Overview
The Distributed Platform Operations team is looking for a Site Reliability Engineer who can help us solve problems, implement automation, and leverage best practices.
Are you a born problem solver who loves to figure out how something works?
Are you a detail-oriented individual who enjoys complex problem solving?
Do you love determining the correct actions required to fix a problem?
Do you have a low tolerance for manual work and look to automate everything you can?
The Site Reliability Engineer (SRE) will be responsible for ensuring the reliability, scalability, and performance of IT infrastructure supporting VMware virtualization and Oracle Linux environments. This role combines operational excellence with automation and engineering practices to reduce toil, improve system resilience, and deliver a seamless experience for internal and external customers.
Key Responsibilities
Infrastructure Reliability & Performance
Monitor, maintain, and optimize VMware clusters, ESXi hosts, and Oracle Linux servers
Ensure high availability and disaster recovery readiness for virtualized environments
Troubleshoot and resolve incidents impacting virtualization and Linux platforms
Automation & Tooling
Design and implement automation for patching, configuration management, and routine operational tasks using tools like Chef, Ansible, Jenkins, and Python
Develop scripts and pipelines to reduce manual effort and improve operational agility
Capacity & Configuration Management
Manage resource allocation across VMware clusters and Oracle Linux systems
Implement standardization and compliance for OS configurations and security baselines
Monitoring & Alerting
Configure and maintain monitoring solutions (e.g., vROps, Splunk, Prometheus) for proactive issue detection
Optimize alerting thresholds to reduce noise and improve incident response times
Incident & Problem Management
Lead root cause analysis for critical incidents and implement permanent fixes
Collaborate with cross-functional teams to resolve complex infrastructure issues
Security & Compliance
Ensure timely patching of VMware and Oracle Linux environments to address vulnerabilities
Maintain compliance with enterprise security standards and regulatory requirements
All About You and Required Skills & Qualifications

Key skills: site reliability engineering (SRE), Kubernetes, Docker, Python, Bash, Oracle, VMware, Linux, Chef, Ansible, Jenkins, configuration management, alerting, incident response, compliance, Splunk, Prometheus, AWS, Azure
Who is Mastercard?
Mastercard is a global technology company in the payments industry. Our mission is to connect and power an inclusive, digital economy that benefits everyone, everywhere by making transactions safe, simple, smart, and accessible. Using secure data and networks,...