Hritik Chaudhary

Senior DevOps/SRE Engineer

+91 8859820935

hritikchaudhary016@gmail.com

Bulandshahr, 203401, India

LinkedIn Profile

GitHub Profile

Senior DevOps/SRE Engineer with over 4 years of experience designing, implementing, and managing cloud-native solutions. Proven expertise in optimizing CI/CD pipelines, orchestrating containerized applications, and automating infrastructure. Focused on system reliability, security, and scalability through observability, incident management, and continuous improvement. Delivered projects for enterprise clients that improved system uptime, reduced operational costs, and accelerated deployment cycles.

Core Competencies

Cloud Platforms

AWS, Azure, GCP, DigitalOcean

Containerization & Orchestration

Docker, Kubernetes, Helm, Podman, Istio

CI/CD

Jenkins, GitHub Actions, GitLab CI/CD, ArgoCD, Flux, Spinnaker

Observability & Monitoring

Prometheus, Grafana, Loki, ELK Stack, Jaeger, Datadog, New Relic

Infrastructure as Code

Terraform, Ansible, CloudFormation, Pulumi, CDK

Programming

Python, Node.js, Java, Go, Bash, TypeScript

Version Control

Git, GitHub, GitLab, Bitbucket

Security

OWASP, Vault, CIS Benchmarks, SAST/DAST, Snyk, Aqua

SRE Practices

SLOs/SLIs, Error Budgets, Chaos Engineering, Incident Management, Blameless Postmortems

Professional Experience

Senior DevOps/SRE Engineer, Eptura

March 2024 - Present (Remote)

  • Architected and implemented multi-cloud solutions across AWS, Azure, and GCP, improving system reliability and meeting a 99.99% uptime SLA.
  • Streamlined deployment processes using ArgoCD and Flux, reducing deployment times by 75% and increasing deployment frequency 3x.
  • Implemented robust disaster recovery strategies using Velero for Kubernetes clusters, achieving an RTO of 15 minutes and an RPO of 5 minutes, a 50% improvement over previous targets.
  • Optimized performance and scalability for Node.js and Java applications in cloud-native environments, leading to a 40% reduction in resource utilization and a 30% improvement in application response times.
  • Introduced advanced observability systems using Prometheus, Grafana, and Jaeger, reducing MTTD and MTTR by over 60% for critical issues.
  • Implemented SLO-based alerting and error budgets, improving incident response times by 50% and reducing false-positive alerts by 80%.
  • Conducted chaos engineering experiments to identify and address system vulnerabilities, enhancing overall system resilience and reducing unplanned downtime by 75%.

DevOps/SRE Engineer, Rapidinnovation

August 2020 - March 2024 (Remote)

  • Spearheaded the adoption of Kubernetes, improving resource utilization by 40% and application scalability by 5x.
  • Designed and implemented a microservices architecture, reducing system complexity, improving maintainability, and accelerating feature delivery by 50%.
  • Automated deployment processes, cutting deployment time by 80% and ensuring high consistency across all environments.
  • Optimized cloud infrastructure costs, achieving an annual reduction of over $100,000 in operational expenses while maintaining performance SLAs.
  • Implemented comprehensive observability solutions using Prometheus, Grafana, and ELK stack, reducing MTTR for critical issues by 70%.
  • Led the adoption of Infrastructure as Code practices using Terraform, increasing deployment accuracy by 90% and reducing provisioning time from days to hours.
  • Designed and implemented a robust backup and disaster recovery strategy, ensuring high data integrity and minimizing data loss windows to under 1 hour.
  • Conducted security audits and implemented best practices, reducing the risk of security breaches and achieving compliance with industry standards such as SOC 2 and ISO 27001.
  • Mentored junior team members on DevOps and SRE practices, fostering a culture of continuous learning and improving team productivity by 30%.
  • Implemented auto-scaling for cloud resources, maintaining 99.9% uptime during peak loads and cutting infrastructure costs by 25% by scaling down during low-traffic periods.
  • Developed custom scripts and tools to automate routine tasks, saving over 100 hours per month in manual operations across the team and reducing human errors by 95%.
  • Implemented a comprehensive incident management process, reducing the average time to mitigate major incidents by 60% and improving post-incident learning through blameless postmortems.

Notable Projects

Flush - Blockchain-based Web App Deployment

Deployed a blockchain-based web application on AWS EKS, securing its highly confidential blockchain keys and other sensitive information with AWS Secrets Manager.

TheWearableInternet - Blockchain-based Wearable App Platform

Developed a blockchain-based wearable app platform and deployed it on AWS ECS alongside supporting AWS services, with a focus on scalability and reliability.

Ioffice - Workplace Management Application

Deployed a workplace management application on Azure AKS using supporting Azure services. Implemented cost optimization strategies that reduced costs by $100,000 per year while maintaining performance.

Multi-Cloud Kubernetes Deployment Platform

Designed and implemented a unified deployment platform for Kubernetes applications across AWS, Azure, and GCP, enabling seamless multi-cloud deployments and improving overall system resilience. Leveraged Terraform for infrastructure provisioning, ArgoCD for GitOps-based deployments, and Prometheus/Grafana for observability.

Serverless Application Migration

Led the migration of a monolithic Java application to a serverless architecture using AWS Lambda, API Gateway, and DynamoDB. Implemented CI/CD pipelines using AWS CodePipeline and CodeBuild, significantly reducing operational overhead and improving scalability. Achieved a 60% reduction in infrastructure costs and improved application performance by 40%.

Chaos Engineering Framework

Developed a chaos engineering framework using Chaos Mesh and Litmus to systematically test and improve the resilience of a large-scale microservices application deployed on Kubernetes. Conducted experiments simulating various failure scenarios, identifying and addressing critical vulnerabilities. Improved system stability and reduced unplanned downtime by 80%.

Education

B.Tech. in Computer Science and Engineering

ABES Institute of Technology, Ghaziabad (U.P.)

8.2 GPA

Languages

Hindi: Native, English: Proficient