Site Reliability Engineer - Cloud Platforms @ Agivant Technologies

Job Description

We are looking for a highly skilled Site Reliability Engineer (SRE) with strong engineering and architectural expertise to design, implement, and manage large-scale, mission-critical infrastructure across multiple data centers and cloud providers. As an SRE, you will be responsible for architecting and optimizing our global infrastructure, enabling development teams to roll out new features efficiently while maintaining high availability and reliability. You will be hands-on with automation, performance tuning, infrastructure scalability, and cloud-native technologies to ensure a seamless user experience for millions of customers.

Key Responsibilities:

1. Architect and implement highly scalable, fault-tolerant, and distributed systems across multi-cloud (OCI, AWS, GCP) and on-premise environments using modern DevOps and SRE principles.
2. Design and deploy next-generation cloud infrastructure with a strong focus on automation, self-healing systems, and performance optimization. Develop and maintain infrastructure-as-code (IaC) using Terraform and configuration management tools such as Ansible and Puppet for automated provisioning and orchestration.
3. Build and optimize containerized environments using Kubernetes and Docker for seamless deployment and scaling.
4. Drive performance, scalability, and security improvements across our cloud and on-prem infrastructure, ensuring high availability and disaster recovery capabilities. Monitor, troubleshoot, and resolve complex system issues by implementing advanced observability solutions, logging, and real-time monitoring frameworks.
5. Develop and enforce SRE best practices, including SLI/SLO definition, capacity planning, and incident management strategies.
6. Eliminate toil and automate repetitive tasks using scripting languages such as Python, Golang, or Shell scripting to improve operational efficiency.
7. Collaborate closely with engineering, architecture, and security teams to improve system resiliency, optimize application performance, and streamline CI/CD workflows. Lead the transition of legacy systems to modern, cloud-native architectures, advocating for DevOps and infrastructure automation.
8. Participate in 24/7 on-call rotations, ensuring rapid response to critical incidents and driving post-mortem analysis for continuous improvement.

Requirements:

1. 7+ years of hands-on experience in a Site Reliability Engineering (SRE) role, with a strong focus on designing, implementing, and managing cloud-native infrastructure. Proficient with any cloud platform (preferably OCI): not just operational experience but actual design and implementation expertise.
2. Proven experience in building, deploying, and optimizing infrastructure-as-code (IaC) using Terraform.
3. Strong automation mindset with proficiency in Ansible, Puppet, or other configuration management tools.
4. Hands-on experience with container orchestration using Kubernetes, Docker, and microservices architecture.
5. Advanced scripting and automation skills in Python, Golang, or Shell scripting to eliminate manual operations.
6. Working knowledge of load balancing technologies (HAProxy, Nginx, F5, Varnish, dnsdist) and web servers (Apache, Nginx).
7. Strong understanding of networking, distributed systems, and observability tools (Prometheus, Grafana, ELK stack, Datadog).
8. Experience in designing and implementing highly available, scalable, and secure architectures across cloud and hybrid environments.
9. AWS and/or GCP certifications are a plus but not required.
10. This is not a support-focused role; we are looking for engineers who have built, deployed, and optimized complex distributed systems from the ground up.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Agivant Technologies
Location(s): Pune


Key skills: Site Reliability Engineering, DevOps, Azure, Site Reliability, Docker, Cloud Services, Shell Scripting, Ansible, Oracle Integration Cloud, AWS, IaC, Terraform, Kubernetes

Salary: ₹ Not Disclosed
