Site Reliability Engineer @ Qentelli

Job Description

Job Summary

We are seeking an experienced Site Reliability Engineer (SRE) to ensure the availability, reliability, scalability, security, and performance of cloud-native platforms running on IBM Cloud and Google Cloud Platform (GCP).
The role focuses on Kubernetes operations, observability and KPIs, PostgreSQL reliability, event-driven systems (MQTT), security best practices, automation, and incident management.

This position requires a strong SRE mindset, production ownership, and close collaboration with development, QA, DevOps, and platform teams.


Roles and Responsibilities

  • Own the reliability, availability, and performance of mission-critical production systems across IBM Cloud and GCP.
  • Operate, monitor, and scale Kubernetes platforms (IKS / GKE), including deployments, upgrades, node pool management, and capacity planning.
  • Design, implement, and maintain monitoring, alerting, logging, and dashboards using cloud-native and open-source observability tools.
  • Define, measure, and continuously improve SLIs, SLOs, error budgets, and service KPIs.
  • Participate in on-call rotations, lead incident response, perform root cause analysis (RCA), and drive post-incident reviews with clear corrective actions.
  • Proactively analyse system performance, traffic patterns, and failure trends to prevent outages and reduce MTTR.
  • Manage and support PostgreSQL databases in production, including backups, restores, replication, failover, upgrades, and performance tuning.
  • Support event-driven architectures and MQTT-based messaging systems, ensuring message reliability, scalability, and low latency.
  • Implement and enforce cloud and Kubernetes security best practices, including IAM, RBAC, secrets management, certificate lifecycle, and network security.
  • Automate operational, reliability, and maintenance tasks using Python and Shell scripting.
  • Support CI/CD pipelines, enabling safe release strategies such as blue-green and canary deployments.
  • Troubleshoot build, deployment, application, and infrastructure failures and drive long-term reliability improvements.
  • Monitor infrastructure utilization and cloud costs, and recommend performance and cost-optimization measures.
  • Collaborate with development, QA, DevOps, and platform teams to improve delivery velocity and operational excellence.
  • Maintain clear runbooks, SOPs, and operational documentation for incident handling and platform operations.

Required Skills & Qualifications

  • Strong hands-on experience operating Kubernetes in production environments
  • Proven expertise in monitoring, alerting, observability, and SRE KPIs
  • Hands-on experience supporting PostgreSQL databases in production
  • Knowledge of event-driven architectures and MQTT
  • Solid understanding of cloud security principles and best practices
  • Strong automation and scripting skills (Python, Shell)
  • Experience working with IBM Cloud and/or Google Cloud Platform (GCP)
  • Ability to handle production incidents, perform RCA, and operate in a reliability-focused environment

Preferred Skills

  • Experience with Prometheus, Grafana, OpenTelemetry, or similar observability platforms
  • Infrastructure as Code or GitOps experience (Terraform, Kustomize, Argo CD)
  • Kubernetes, Cloud, or SRE-related certifications
  • Experience with cloud cost optimization (FinOps) practices
  • Exposure to multi-cloud or hybrid cloud environments

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Project & Program Management
Role Category: Technology / IT
Role: Technology / IT - Other
Employment Type: Full time

Contact Details:

Company: Qentelli
Location(s): Hyderabad


Key Skills: Site Reliability Engineering, CI/CD, Kubernetes, IBM Cloud, GCP, Shell Scripting, SRE, Python

Salary: Not Disclosed

Qentelli

Qentelli at a Glance: Qentelli is a technology company that accelerates digital transformation and cloud transformation journeys through DevOps, Automation, Agile transformation, AI and Deep learning. Forrester recently recognized our efforts at using AI and ML in the augme...