Principal Service Reliability Engineer @ Oracle

Job Description

Summary


Own and scale mission-critical ERP/SaaS services while building intelligent, cloud-native capabilities. This role requires an SRE mindset combined with AI/ML expertise and strong application engineering skills across public and private cloud environments.

Key Responsibilities


- End-to-end service ownership: design for telemetry, security, resiliency, scalability, and performance; lead sizing/architecture; drive service health reviews and process simplification.


- Incident management and prevention: lead postmortems/RCAs, coordinate fixes, define repair items, and implement data-driven prevention and continuous improvement.


- AI/ML and GenAI delivery: design and integrate solutions with LLMs, RAG, agentic workflows, and conversational AI; build low-latency model serving and retraining pipelines.


- Application engineering: develop performant microservices for distributed, containerized, cloud-native systems.


- Automation: eliminate toil by automating operational workflows, recovery procedures, code delivery, and configuration management; build internal tools and reusable scripts/services to accelerate delivery and reduce errors.


- Observability: define and implement monitoring, logging, alerting, and tracing strategies; establish SLOs/SLIs/error budgets; improve diagnostics and performance visibility for rapid triage.


- Cross-functional collaboration: partner with product, operations, and data teams to translate requirements into secure, scalable solutions; communicate effectively with technical and non-technical stakeholders.

Minimum Qualifications


- BS/MS in Computer Science or related field; 10+ years of software engineering in cloud environments.


- Strong skills in distributed systems/microservices using Java/Python; SQL/data modeling; Python for AI/automation.


- SRE/DevOps expertise: systems and networking fundamentals, application security, observability, performance analysis, and incident response.


- Proven SDLC excellence: code quality, reviews, version control, CI/CD, testing, and release engineering.


- Excellent written and verbal communication; English fluency.

Preferred/Technical Skills


- AI/ML/GenAI: experience with foundation models, RAG, and agentic architectures; model deployment, optimization, monitoring, and retraining.


- Cloud and containers: experience with containerization, orchestration, and resilient, fault-tolerant microservices.


- Observability: hands-on experience designing dashboards, alerts, traces, logs, and metrics; defining SLOs/SLIs and error budgets; on-call readiness and runbook quality.


- Operations: performance tuning across Java/Python and SQL for large-scale enterprise applications; strong Linux/Unix expertise; capacity planning and reliability reviews.


- Automation and scripting: proficiency in scripting to automate operational workflows, build tooling, and run CI/CD tasks (e.g., shell scripting, Python, configuration-as-code, task runners).


- Familiarity with enterprise ERP applications and standard DevOps tooling and practices.

Career Level - IC4

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Oracle
Location(s): Hyderabad


Key skills: Unix, ERP, Automation, Linux, Networking, Data modeling, Shell scripting, SDLC, SQL, Python

Salary: ₹ Not Disclosed

Oracle

Client provides information and communications technology (ICT) solutions. It offers a range of computing devices, storage devices, servers, networking systems, electronic devices, and allied products. The company also provides application, business transformation, enterprise and cybersecurity, n...