Observability & MLOps Engineer
Primary Focus
Observability & ML Lifecycle Management
Core Responsibilities
- Design observability stack
- Implement distributed tracing
- Build Grafana dashboards & alerts
- Integrate telemetry across clouds
Core Skills
- Metrics, logs, traces
- Grafana & alerting
- MLOps engineering
- Python/Scripting
Good-to-Have
- Airflow basics
- Multi-cloud observability
Overlap
- Python/Scripting
- Cloud familiarity
Senior Observability Specialist
Location: Chennai, Pune, Bangalore
Employment Type: Full-time
Experience Required: 15-18 years
Job Summary:
We are seeking a highly skilled Senior Observability Specialist to design, implement, and manage end-to-end observability strategies across cloud and on-premises environments. The role requires expertise in modern monitoring, logging, and tracing tools, with a focus on system reliability, performance optimization, and proactive incident detection. The ideal candidate will have experience with Dynatrace, Datadog, and open-source solutions including Grafana, Loki, Tempo, Mimir, and Prometheus.
Key Responsibilities:
Monitoring & Dashboarding Architecture:
Centralized Logging & Distributed Tracing:
Observability Strategy & Automation:
Scripting & API Integrations:
DevOps & CI/CD Integration:
Cloud-Native Observability & DevOps Alignment:
Qualifications & Skills:
Architectural Focus:
- Strong expertise in designing observability frameworks across Dynatrace, Datadog, Grafana, Loki, Tempo, Mimir, and Prometheus.
- Proficiency in observability architecture, ensuring scalable and reliable monitoring solutions.
- Advanced experience in scripting with Python or Go for custom API integrations.
- Deep understanding of DevOps methodologies, CI/CD best practices, and cloud-native observability tools.
- Experience in microservices architecture and distributed-systems monitoring.
- Ability to troubleshoot bottlenecks, optimize performance, and implement predictive observability insights.
Preferred Certifications (Optional):
- Certified Kubernetes Administrator (CKA)
- AWS Certified DevOps Engineer
- Dynatrace Performance Monitoring Certification
- Prometheus Certified Associate

Key Skills: MLOps, Grafana, AKS, MLflow, Kubeflow, TensorFlow, Terraform, Vertex AI, AI/MLOps, Keras, Observability, Azure DevOps, Azure
Company Website: https://www.citiustech.com/aboutus
CitiusTech is a specialist provider of healthcare technology services and solutions to healthcare technology companies, providers, payers, and life sciences organizations. With over 4,500 professionals worldwide, CitiusTech enable...