Career Profile

Welcome!

I'm Keith Kain, a Lead Platform Engineer with 10+ years of experience building and scaling cloud-native infrastructure, driving site reliability practices, and improving developer experience across regulated financial environments. I lead cross-functional efforts spanning the full platform lifecycle, from designing resilient architectures to automating away operational toil, while also writing and shipping production software in Go, Python, TypeScript, and Java.

My focus is on making infrastructure a competitive advantage: self-healing systems that reduce incident burden, CI/CD pipelines that ship safely and fast, and infrastructure-as-code that teams can reason about and extend. I've led modernization efforts across application ecosystems, reduced compliance findings by 80% through automation, and personally developed and maintained 20+ containerized services running in production.

This resume itself demonstrates my cloud expertise: it's built as a serverless application hosted on AWS using S3, Lambda, DynamoDB, and CloudFront, deployed through a CI/CD pipeline with GitHub Actions and secured following AWS best practices.

See the complete architecture and implementation details in my technical blog post.

Skills & Proficiency

Cloud Infrastructure & Architecture

Design and operate resilient, cost-optimized cloud architectures on AWS — from serverless workloads (Lambda, API Gateway, DynamoDB) to containerized platforms (ECS/EKS). Define infrastructure-as-code with Terraform and CloudFormation, embedding compliance-as-code practices that keep environments auditable and reproducible.

Hold the AWS Certified DevOps Engineer - Professional, AWS Certified Solutions Architect - Associate, and AWS Certified AI Practitioner certifications, backed by hands-on experience in production environments.

Development & Engineering

Build backend services, middleware, and automation in Go, Python, TypeScript, and Java. Design RESTful APIs, microservices, and serverless functions that support critical business operations. Develop data processing pipelines — including AWS Glue-based migrations — that connect and modernize diverse systems. Apply test-driven development, code review, and continuous refactoring as standard practice.

Technical Leadership & Collaboration

Lead cross-functional engineering efforts spanning platform, application, and data teams. Drive architectural decisions, mentor engineers on infrastructure best practices, and champion operational excellence. Leverage AI-assisted workflows (Claude, Windsurf) to accelerate delivery and standardize practices. Translate business requirements into scalable platform strategies and communicate technical tradeoffs to stakeholders at all levels.

Container Orchestration & Microservices

Manage production Kubernetes (EKS) and ECS clusters supporting high-traffic financial applications across multiple environments. Implement service mesh architectures, automated scaling policies, and container security hardening. Led decomposition of monolithic applications into containerized microservices, improving deployment velocity and fault isolation.

CI/CD & Developer Experience

Build and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI) with integrated security scanning, infrastructure validation, and automated testing, enabling teams to ship with confidence. Created self-service tooling that cut infrastructure provisioning from days to minutes. Practiced GitOps workflows and modern deployment strategies, including blue/green and canary releases, using Ansible, AWS SAM, and Terraform.

Site Reliability & Observability

Drive SRE culture through SLOs/SLIs, error budgeting, blameless postmortems, and automated remediation. Build observability stacks using Grafana, Prometheus, New Relic, Datadog, and CloudWatch with PagerDuty alerting. Implement distributed tracing (AWS X-Ray) to surface bottlenecks. Lead disaster recovery exercises and author runbooks that keep mission-critical financial systems highly available.

Security & Compliance

Apply defense-in-depth security: least-privilege IAM, network segmentation, automated vulnerability scanning, and auto-remediation workflows that reduced compliance findings by 80%. Manage multi-account governance with AWS Organizations, Control Tower, and Security Hub. Operate within SOC 2 and PCI-DSS frameworks in regulated financial environments.

Experience

Lead Platform Engineer - Manager

2026-Present
Capital One - Commercial Money Movement/Bank Tech

- Reduced vulnerability and compliance findings by 80% by designing and implementing automated remediation workflows, eliminating manual triage and accelerating time-to-resolution across the application portfolio.

- Led a cross-team effort to build AWS Glue pipelines for a critical, complex data migration from legacy systems to modern platforms, enabling retirement of costly legacy infrastructure with zero data loss.

- Spearheaded migration and modernization initiatives across the application ecosystem, driving adoption of containerized architectures and cloud-native patterns to improve reliability, reduce operational overhead, and position teams for long-term scalability.

- Established SRE practices including SLOs/SLIs, error budgeting, and blameless postmortems. Built comprehensive observability solutions using Grafana, New Relic, Datadog, and distributed tracing that sped up incident response in a regulated financial environment.

- Implemented AI-assisted engineering workflows using Claude and Windsurf to accelerate development velocity, improve code quality, and standardize engineering practices across the team.

Senior Platform Engineer (Cloud/DevOps) - Principal Associate

2022-2026
Capital One - Commercial Money Movement/Bank Tech

- Developed and maintained 20+ containerized application components, written in Java, Go, and Python and deployed across multiple Kubernetes and ECS clusters, supporting the Intellix money movement platform serving thousands of commercial banking clients.

- Architected secure CI/CD pipelines with automated testing, security scanning, and compliance validation for banking applications processing sensitive financial data. Implemented infrastructure-as-code using Terraform and CloudFormation with automated drift detection.

- Optimized cloud infrastructure costs by right-sizing resources, purchasing reserved instances, and adopting serverless architectures, delivering measurable savings while maintaining performance SLAs.

- Improved incident recovery by developing self-healing automation using EventBridge, Lambda, and Step Functions. Authored comprehensive runbooks and implemented auto-remediation for common failure scenarios in mission-critical financial systems.

- Led migration of legacy applications to modern cloud architecture, converting monolithic applications to microservices using containers (ECS/Fargate) and serverless technologies (Lambda, API Gateway), significantly reducing deployment time while improving scalability and reliability.

Systems/Software Engineer

2022
Carnegie Mellon University - College of Engineering

- Implemented DevOps practices for the research computing infrastructure by introducing Git and GitLab CI, shortening deployment cycles while improving quality and reliability.

- Implemented a self-populating inventory system for hardware and software assets using automated discovery tools integrated with the machine build pipeline.

- Modernized middleware code and processes by refactoring legacy components, implementing version control, and establishing automated testing, reducing technical debt and improving system reliability.

- Created infrastructure-as-code templates using Ansible and PowerShell DSC that standardized environment creation and ensured consistent configurations across development, testing, and production environments.

- Evaluated and documented monitoring requirements for research computing infrastructure. Collaborated with stakeholders to identify critical metrics and developed a monitoring strategy aligned with department objectives and resource constraints.

Systems Administrator

2019-2022
Carnegie Mellon University - College of Engineering

- Managed enterprise Windows infrastructure using industry-standard toolsets including MDT, SCCM, WSUS, and PowerShell for a large educational environment, ensuring reliable and secure computing resources.

- Orchestrated software deployment and licensing for complex engineering software packages, streamlining distribution of specialized applications to academic departments.

- Administered Linux servers and endpoints running RHEL using Red Hat Satellite and Puppet for configuration management and automated updates.

- Implemented a macOS management solution using Jamf Pro with an Automated Device Enrollment workflow, creating a unified device management strategy across multiple platforms.

- Developed automation solutions using PowerShell and Python that eliminated repetitive tasks and improved system reliability across the university's computing environment.

IT Generalist/Systems Administrator

2017-2019
Slippery Rock University

- Managed large-scale endpoint environment consisting of 3000+ client systems across Windows and Mac platforms using advanced RMM/MDM solutions for centralized system management.

- Led cross-functional migration project to upgrade 2000+ endpoints from Windows 7 to Windows 10, coordinating with academic departments to minimize disruption and ensure compatibility with instructional applications.

- Developed custom automation solutions using PowerShell and Python that addressed specific institutional challenges and streamlined IT support processes across the university.

Assistant to the Director of Educational Technology

2011-2017
Reynolds School District

- Modernized system deployment processes by implementing an automated Windows build workflow with MDT, WDS, and PowerShell, replacing legacy imaging methods and significantly improving deployment efficiency.

- Managed identity and access systems for student and staff accounts across multiple educational platforms, ensuring appropriate access while maintaining security and compliance with educational privacy requirements.

- Provided technical support and troubleshooting for classroom IT and AV systems, ensuring reliable technology resources for educational instruction in a K-12 environment.

Projects

Serverless Resume Platform

Designed and implemented this serverless resume website using AWS S3, CloudFront, Lambda, and API Gateway. Configured CI/CD pipeline with GitHub Actions for automated testing and deployment. Infrastructure managed as code with Terraform.

Streaming Data Pipeline for Log Analysis

Architected and implemented a real-time log analysis solution using AWS Kinesis Data Streams and Lambda functions. Designed the pipeline to process and analyze application logs at scale, enabling proactive monitoring and alerting. Enhanced observability with custom dashboards for visualizing system health metrics and performance trends.
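As an illustration of the Kinesis-to-Lambda pattern described above, here is a minimal sketch, not the project's actual code. It assumes each Kinesis record carries one JSON log entry with a "level" field, and the names `decode_records` and `ERROR_THRESHOLD` are hypothetical. Lambda delivers Kinesis payloads base64-encoded under `Records[i]["kinesis"]["data"]`, so the handler decodes each record, tallies log levels, and flags when errors cross a threshold.

```python
import base64
import json
from collections import Counter

# Hypothetical alerting threshold; not taken from the original project.
ERROR_THRESHOLD = 5


def decode_records(event):
    """Yield each Kinesis record payload, base64-decoded and parsed as JSON."""
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        try:
            parsed = json.loads(raw)
        except ValueError:
            # Assumption: malformed lines are skipped rather than failing the batch.
            continue
        if isinstance(parsed, dict):
            yield parsed


def handler(event, context):
    """Hypothetical Lambda entry point: count entries per log level and flag errors."""
    levels = Counter(entry.get("level", "UNKNOWN") for entry in decode_records(event))
    return {
        "levels": dict(levels),
        "alert": levels.get("ERROR", 0) >= ERROR_THRESHOLD,
    }
```

Skipping malformed records keeps one bad log line from forcing Lambda to retry the whole shard batch. A production pipeline might instead route such records to a dead-letter destination.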

Multi-Region Disaster Recovery Solution

Designed and implemented a cross-region disaster recovery strategy, taking into account the unique requirements of a regulated financial environment. Utilized AWS Route 53 health checks, S3 cross-region replication, and DynamoDB global tables to ensure data consistency. Created automated failover procedures with AWS Lambda that reduced manual intervention during recovery scenarios.
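To illustrate how Lambda-driven failover can reduce manual intervention, here is a minimal sketch of the decision step only, not the actual procedure used in this project. It assumes a CloudWatch alarm watches the primary region's Route 53 health check and that EventBridge forwards "CloudWatch Alarm State Change" events to the function. The region names and alarm name are hypothetical placeholders, and the sketch returns a plan instead of calling any AWS APIs.

```python
# Hypothetical configuration; not taken from the original project.
PRIMARY_REGION = "us-east-1"
SECONDARY_REGION = "us-west-2"
WATCHED_ALARM = "primary-health-check-alarm"


def plan_failover(event):
    """Decide on a failover action from an EventBridge alarm state-change event."""
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return {"action": "ignore", "reason": "unexpected event type"}
    detail = event.get("detail", {})
    if detail.get("alarmName") != WATCHED_ALARM:
        return {"action": "ignore", "reason": "unrelated alarm"}
    state = detail.get("state", {}).get("value")
    if state == "ALARM":
        return {"action": "failover", "from": PRIMARY_REGION, "to": SECONDARY_REGION}
    if state == "OK":
        return {"action": "none", "reason": "primary healthy"}
    # e.g. INSUFFICIENT_DATA: avoid flapping by waiting for a definitive signal.
    return {"action": "hold", "reason": f"state {state}"}


def handler(event, context):
    """Hypothetical Lambda entry point. A real deployment would act on a
    "failover" plan (for example, promoting replicas in the secondary region);
    this sketch only returns the plan."""
    return plan_failover(event)
```

Keeping the decision logic pure makes it easy to unit-test against recorded events before wiring it to the APIs that perform the failover.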