About the Job:
The IT AI Application Platform team is seeking a Principal Senior Site Reliability Engineer (SRE) to design, develop, scale, and operate our AI Application Platform, which is built on Red Hat technologies including Red Hat OpenShift AI (RHOAI) and Red Hat Enterprise Linux AI (RHEL AI). As a Principal SRE, you will help run core AI services at scale by enabling customer self-service, making our monitoring system more sustainable, and eliminating toil through automation.
On the IT AI Application Platform team, you will have the opportunity to lead and influence work on the complex scaling challenges that are unique to Red Hat IT-managed AI platform services, drawing on your skills in coding, operations, and large-scale distributed system design. We develop, deploy, and maintain Red Hat's next-generation AI application deployment environment for custom applications and services across a range of hybrid cloud infrastructures. We are a global team operating on-premises and in the public cloud, using the latest technologies from Red Hat and beyond.
What will you do?
What will you bring?

Key skills: Kubernetes, Java, protocols, TCP, Microsoft Azure, C++, production, Golang, Red Hat Linux, Ansible, GCP, Linux, PaaS, Python, IP, application engineering, DNS, Google, configuration management, Puppet, SaaS, troubleshooting, HTTP, OpenStack, Agile, AWS, Unix
Founded in 1993, Red Hat is the premier Linux and open source provider and the most recognized Linux brand in the world. We serve global enterprises through technology and services made possible by the open source model. Solutions include Red Hat Enterprise Linux operating platforms, sold through a sub...