
Site Reliability Engineer @ Cybage



Job Description

Key Responsibilities

  • Build and scale observability systems: Design and maintain infrastructure for collecting, aggregating, and analyzing telemetry data (metrics, logs, and traces).
  • Enable actionable insights: Develop dashboards, alerts, and visualizations that turn raw data into clear, meaningful information for engineers, SREs, and business stakeholders.
  • Collaborate across teams: Partner with engineering, operations, and SRE teams to define SLIs/SLOs and improve visibility into system performance and health.
  • Drive best practices: Advocate for and support consistent instrumentation, effective alerting, and strong observability practices across engineering teams.
  • Optimize systems and tools: Continuously assess performance, usage, and cost of observability tools, identifying opportunities for improvement and efficiency.
  • Automate: Build capabilities that drive the adoption of SRE principles and best practices across everything deployed in the Nexxen environment.
  • Improve: Work with engineering teams to develop plans that improve the reliability of applications and infrastructure, and help those teams engineer these improvements.
  • Support incident response: Participate in and help improve the incident response process, reduce MTTR, and contribute to post-incident reviews and root cause analysis.

What We're Looking For

Technical Skills

  • Programming experience in languages such as Go, Python, Java, or Node.js, with the ability to contribute tooling and advise on application-level instrumentation improvements.
  • Observability tooling expertise with the following tools:
      • LGTM (Loki, Grafana, Tempo, Mimir)
      • Datadog
      • CloudWatch
      • Prometheus
      • PagerDuty
      • ClickStack
      • VictoriaMetrics
      • Groundcover
      • Libre
      • Zabbix
  • Cloud experience: AWS, including services such as EC2, EKS, ECS, and VPC networking.
  • Containers & orchestration: Familiarity with Docker and Kubernetes.
  • Infrastructure as Code & automation: Experience with tools like Terraform, Ansible, Chef, or SCCM to manage observability infrastructure at scale.
  • Linux systems knowledge: Strong understanding of Linux, shell scripting, and the storage/networking stack.
  • Tracing: Deep understanding of distributed tracing technology and OpenTelemetry.
  • SRE practices: SLIs, SLOs, error budgets, and failure domains.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Cybage
Location(s): Pune



Key Skills: Infrastructure as Code, Site Reliability Engineering, Python, Automation, AWS Cloud, Docker, Linux, Prometheus, Grafana, Kubernetes


₹ Not Disclosed

Similar positions

Senior Staff Engineer

  • Nagarro
  • 10 - 12 years
  • Bengaluru
  • 2 days ago
₹ Not Disclosed

AWS DevOps Engineer EKS (Jaipur Onsite)

  • Toolify
  • 4 - 6 years
  • Jaipur
  • 2 days ago
₹ 12-15 Lacs P.A.

Staff Engineer (Cloud DBA)

  • Nagarro
  • 6 - 10 years
  • Bengaluru
  • 2 days ago
₹ Not Disclosed

DevOps Engineer For Airport Infrastructure/Data Center

  • Three D Integrated
  • 4 - 5 years
  • Delhi, NCR
  • 2 days ago
₹ Not Disclosed

Cybage

About Cybage: Founded in 1995, Cybage Software Pvt. Ltd., a technology consulting organization, is a leader in the hi-tech and outsourced product engineering space. We are a valued partner to technology startups, mid-size companies, and Fortune 500 corporations alike. Our solutions are focused on mode...