Senior Cloud-Native & AIOps Engineer with 10+ years architecting, automating, and scaling distributed ML/AI systems across AWS, Azure, and GCP. Currently at Apple (AiML Infrastructure), building production-grade ML platforms serving millions of users.
Uptime: 99.95% | MTTR: ↓73% | Cost: ↓25% | Scale: Petabyte+ text
- Build scalable ML infrastructure with Kubernetes (EKS/GKE)
- Design observability systems (OpenTelemetry, Datadog, Grafana)
- Optimize cloud costs through automation & right-sizing
- Implement Zero Trust security & compliance (SOC2, HIPAA, GDPR)
- Mentor teams on MLOps, SRE, and production best practices
AI/ML Infrastructure
- Multi-cloud Kubernetes (EKS/GKE/AKS) • GPU/TPU optimization • Model serving • Feature stores • Real-time inference
Observability & SRE
- OpenTelemetry • SLI/SLO monitoring • 73% MTTR reduction • Incident response • Chaos engineering
Cost Optimization
- 25% cost reduction • Cluster right-sizing • FinOps • Resource automation • Multi-cloud management
Security & Compliance
- Zero Trust • IAM/RBAC • Vault • SOC2/HIPAA/GDPR • Policy-as-code
| Metric | Details |
| --- | --- |
| 99.95% Uptime | Maintained across distributed AI/ML platforms |
| 73% MTTR Reduction | Through unified observability & automation |
| 25% Cost Savings | Via intelligent optimization & right-sizing |
| Zero-Downtime | Petabyte-scale migrations with no interruption |
| 60+ Incidents | Led critical production incident resolutions |
- AWS Cloud Practitioner
- Kubernetes Application Developer (CKAD)
- HashiCorp Terraform Associate
- AI & Machine Learning for Business
- AWS: Design and Implement Systems
I'm passionate about MLOps architecture, multi-cloud Kubernetes, observability, and production ML systems. Always happy to discuss SRE best practices, cost optimization strategies, and building reliable infrastructure at scale.

