Job Description
About Client
Hiring for One of the Most Prestigious Multinational Corporations!
Job Description
Job Title : OpenShift SRE
Qualification : BE / B.Tech
Relevant Experience : 8 to 15 Years
Must Have Skills :
- 8+ years of overall experience in roles such as Site Reliability Engineering, DevOps, or Linux Systems Engineering.
- 5+ years of hands-on, intensive experience administering, automating, and troubleshooting Red Hat OpenShift (OCP 4.x preferred) in large-scale production environments.
- Proven experience in a senior or lead engineering role, demonstrating ownership of complex projects and mentorship of others.
Technical Skills
- Expert-Level OpenShift: Deep, authoritative knowledge of OCP installation (IPI/UPI), upgrades, cluster administration, node management, and disaster recovery.
- Kubernetes Mastery: A fundamental and deep understanding of Kubernetes architecture and components (etcd, kube-apiserver, scheduler, etc.) and Operators (OLM).
- Infrastructure as Code (IaC): Strong proficiency with Ansible and Terraform for automating infrastructure provisioning and configuration management.
- Programming/Scripting: Advanced scripting and software development skills in Python or Go, as well as Bash.
- Observability: Hands-on experience building and managing monitoring and logging solutions (e.g., Prometheus, Grafana, Thanos, Alertmanager, ELK Stack, Splunk, Fluentd/Vector/OTEL).
- CI/CD & GitOps: Expertise with CI/CD tooling (e.g., Tekton, Jenkins, GitLab CI, ArgoCD, GitHub Actions).
- Core Infrastructure: Strong proficiency in Linux/RHEL administration, networking (SDN, OVS, routing, firewalls, load balancers), and storage (Ceph, NFS, block, and object storage).
Good to Have Skills :
- Analytical Mindset: Exceptional problem-solving skills with the ability to diagnose complex technical issues across multiple platform layers.
- Ownership and Accountability: A strong sense of ownership and the drive to see issues through to resolution.
- Communication: Excellent communication and interpersonal skills, capable of explaining complex topics to both technical and non-technical audiences.
- Composure: Ability to remain calm and effective under pressure during critical incidents.
On-Call
- Willingness to participate in a 24x7 on-call rotation to handle critical platform incidents.
Roles and Responsibilities :
- Define and Uphold Reliability Standards: Establish and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for the OpenShift platform and its core services.
- Automate Everything: Design, build, and maintain robust automation to handle the full lifecycle of OpenShift clusters, including provisioning, upgrades, patching, scaling, and disaster recovery.
- Reduce Toil: Proactively identify and eliminate manual, repetitive operational work by developing and maintaining automation scripts (Python, Go, Bash) and Ansible playbooks.
- Incident Response and Root Cause Analysis: Lead high-severity incident response and conduct deep, blameless post-mortems to identify and implement permanent solutions to prevent recurrence.
- Proactive Health Management: Develop and implement automated health checks and self-healing capabilities to ensure cluster and application resilience.
- Subject Matter Expertise: Serve as the top-tier technical authority for OpenShift Container Platform architecture, networking (OVN-Kubernetes, SDN), load balancing, cross-cluster management, storage (OpenShift Data Foundation/Ceph), and security.
- Observability: Architect and manage a comprehensive observability stack (e.g., Prometheus, Grafana, ELK/Fluentd) to provide deep insights into platform and application performance.
- CI/CD and GitOps: Engineer and optimize CI/CD pipelines for both platform components and tenant applications, championing GitOps principles for declarative configuration management.
- Capacity and Performance: Conduct advanced performance tuning, load testing, and capacity planning to ensure the platform can meet future demand.
Location : Bangalore/Hyderabad/Chennai/Mumbai/Pune/Delhi
CTC Range : As per market standards
Notice period : Immediate to 90 days
Shift Timing : General Shift
Mode of Interview : Virtual
Mode of Work : Work from office
Bhuvaneshwari S
Senior Specialist
Black and White outsourcing Pvt Ltd
Bangalore, Karnataka, India.
bh**********i@bl*******e.in | www.blackwhite.in
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: IT & Information Security
Role Category: IT Infrastructure Services
Role: Configuration and Deployment Management
Employment Type: Full Time
Contact Details:
Company: Black white Business
Location(s): Hyderabad
Keyskills:
Terraform
Docker
SRE
OpenShift
DevOps
Jenkins
Shell Scripting
Ansible
Site Reliability Engineering
Bash
YAML
Kubernetes