Primary Responsibilities:
Undergraduate degree or equivalent experience.
10-12 years of overall experience in the IT industry across the entire SDLC
Proven work experience as a Site Reliability Engineer or similar role
5+ years of experience in integrating monitoring and alerting into cloud software solutions
3+ years of coding experience with one or more of the following languages: Java, C#, C/C++, Go, Python, Perl, PowerShell, or JavaScript, with a willingness and ability to learn new ones
2+ years of experience building and programmatically consuming REST APIs
3+ years of experience with Splunk, Dynatrace, Datadog, Grafana, telemetry, or similar monitoring tools
Experience interacting programmatically with a relational database (SQL Server, MySQL, or PostgreSQL)
Experience planning and supporting 99.999% availability against critical applications in production
Solid understanding of engineering fundamentals: unit testing, performance testing, code reviews, telemetry, agile and DevOps
Define and set up industry best practices for alerting and monitoring across the line of business, and design efficient monitoring dashboards on Splunk/Dynatrace/Grafana that are common to all applications and products across the line of business
Experience with any database.
Knowledge of any scripting or programming language.
Experience in operations support for any application.
ServiceNow experience.
Participate in the 5-9 program and other peak-season readiness initiatives, and collaborate with application teams to evaluate applications from a resiliency, availability, and reliability perspective
Act as a gatekeeper for changes rolling into production
Embrace continuous learning of engineering practices to ensure industry best practices and technology adoption, including DevOps, Cloud and Agile thinking
Tech debt reduction and tech transformation, including open source/inner source adoption, Cloud adoption, and HCP assessment and adoption
Improve processes and runbooks, and lead efforts to automate manual support tasks to cut down on toil
Participate in on-call rotation
Improve operational tooling and frameworks, and perform chaos engineering activities
Respond to platform emergencies, alerts, and escalations from Customer Support

Key skills: Site Reliability Engineering, Disaster Recovery, Dynatrace, Splunk, SLI, Performance Testing, SLA
About: OptumInsight India Pvt Ltd, a UnitedHealth Group company, is a leading health services and innovation company dedicated to helping make the health system work better for everyone. With more than 115,000 people worldwide, Optum combines technology, data and expertise to improve the delivery, ...