
Mainframe-SRE-Z/os @ Cognizant



Job Description

Role & responsibilities

Job Title: Mainframe Site Reliability Engineer (SRE)

Location: Pune/Hyderabad

Employment Type: Full-Time

---

About the Role

We are seeking a visionary Mainframe Site Reliability Engineer (SRE) to redefine the reliability, automation, and efficiency of our mission-critical z/OS systems. This role combines deep mainframe expertise with cutting-edge SRE practices, focusing on innovations in observability, AI-driven operations, and DevOps integration to transform legacy workflows into modern, self-healing systems. You will drive initiatives to eliminate manual toil, optimize performance, and ensure the platform's resilience aligns with business-critical service level objectives (SLOs).

---

Key Responsibilities

1. SRE-Centric Innovation & Automation

- Automation Engineering:

- Design and deploy Infrastructure-as-Code (IaC) solutions using Ansible, Zowe CLI, and z/OSMF workflows to automate system provisioning, configuration management, and recovery processes.

- Develop self-healing workflows for critical subsystems (CICS, Db2, IMS) to auto-resolve incidents like JVM failures or transaction bottlenecks.

- Convert legacy operational scripts (REXX, NCL) into modern, version-controlled pipelines integrated with Git and CI/CD tools like Jenkins.

- AI-Driven Observability:

- Implement predictive analytics tools (e.g., IBM Watson AIOps, Splunk ITSI) to detect anomalies in system metrics, logs, and message queues.

- Build dashboards using Grafana and Prometheus to visualize the Four Golden Signals (latency, traffic, errors, saturation) across mainframe workloads.

- Centralize alert management to reduce noise and prioritize actionable alerts using AI-driven correlation.

2. DevOps Integration & Modernization

- CI/CD for Mainframe:

- Streamline software delivery pipelines for COBOL/PL/I applications using IBM Dependency-Based Build (DBB) and UrbanCode Deploy (UCD).

- Integrate mainframe SDLC processes with enterprise Git repositories (GitHub, GitLab) to enable collaborative development and audit trails.

- Enable automated testing and phased rollouts for z/OS middleware updates.

- Performance & Capacity Engineering:

- Optimize CPU/MIPS utilization through runtime tuning (e.g., CICS Threadsafe, AT-TLS offloading) to reduce software licensing costs.

- Forecast capacity demands using historical SMF/RMF data and propose dynamic hardware scaling strategies.

- Conduct load testing for batch and OLTP workloads to validate system limits and error budgets.

3. Incident Management & Reliability

- Lead blameless postmortems for critical incidents, focusing on root cause analysis (RCA) and preventive actions (e.g., monitoring gaps, automation fixes).

- Reduce MTTR by implementing automated incident response playbooks (e.g., auto-restart failed subsystems, reroute traffic).

- Maintain 24/7 operational readiness through on-call rotations and cross-training in z/OS, CICS, Db2, and storage management.

4. Platform Hardening & Knowledge Sharing

- Enforce security best practices (RACF, TLS) and vulnerability remediation for z/OS and middleware.

- Develop reusable workbooks and runbooks to document system configurations, troubleshooting steps, and automation workflows.

- Mentor teams on SRE principles, fostering a T-shaped skill model (deep mainframe + DevOps/Agile practices).

5. Batch Optimization & Resource Management

- Design dynamic resource allocation strategies (e.g., WLM policies, enclaves) to prioritize critical batch jobs and minimize contention for CPU, memory, and I/O resources.

- Implement parallel processing (e.g., multi-task JCL, SYSAFF routing) to reduce runtime and avoid bottlenecks in long-running batch cycles.

- Streamline job dependencies using graph-based scheduling tools (e.g., IWS, CA7, Control-M) to eliminate idle wait times between interdependent jobs.

6. Proactive Batch Health Monitoring

- Develop automated checks for batch job SLAs, including real-time alerts for delays, resource starvation, or dataset contention.

- Integrate predictive analytics (e.g., historical SMF data analysis) to forecast and mitigate delays caused by seasonal peaks or data volume spikes.

---

Required Skills

- Technical Expertise:

- xx+ years in z/OS system programming, performance tuning, or infrastructure support.

- Proficiency in JCL, REXX, Python, and mainframe automation tools (IBM Z System Automation, Broadcom OPS/MVS).

- Hands-on experience with Zowe, Ansible, Git, and CI/CD pipelines.

- Mastery of SRE tenets: SLOs/SLIs, error budgets, and Infrastructure-as-Code (IaC).

- Innovation Focus:

- Proven track record in implementing AI/ML-driven monitoring or auto-remediation for mainframe environments.

- Experience modernizing legacy workflows (e.g., replacing CA Endevor with Git-based SDLC).

- Soft Skills:

- Ability to lead cross-functional teams during high-severity incidents.

- Strong communication to align technical execution with business objectives.

- Education:

- Bachelor's degree in Computer Science, Engineering, or a related field.

---

Preferred Qualifications

- Experience with AI-driven automation platforms (e.g., AMELIA AIOps) to standardize and migrate legacy workflows, integrate with event management systems (e.g., BigPanda), and orchestrate ITIL processes (incident, change) via ServiceNow.

- Certifications: IBM z/OS System Programming, Broadcom Mainframe SRE, or HashiCorp Terraform.

- Familiarity with Zowe Desktop for modern IDE-driven development or Dynatrace APM for CICS/Db2 monitoring.

- Knowledge of mainframe open-source ecosystems (Zowe, Feilong) or hybrid-cloud integrations.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: IT & Information Security
Role Category: IT Infrastructure Services
Role: IT Infrastructure Services - Other
Employment Type: Full-Time

Contact Details:

Company: Cognizant
Location(s): Hyderabad



Key skills: Site Reliability Engineering, Mainframes, z/OS


Salary: Not disclosed

Company Details: Cognizant Technologies Ltd