Participate in on-call SRE support for a production Kubernetes environment.
Develop integrations between the Kubernetes platform and existing enterprise tools via REST APIs
Work with vendors on high-priority issues
Rotational 12x7 on-call support with offshore evening coverage
Define, build, and support the technical infrastructure environment for the Kubernetes platform
Define and document standards and guidelines
Develop and automate repeatable tasks
Consult with development users; determine requirements and recommend solutions
Participate in product evaluations, design review sessions, and data requirement meetings, and consult with application development teams
Contribute to open-source projects that we use to drive continuous improvements
Skills
Must have
3+ years of software development experience with languages such as Python and Golang is a MUST
3+ years of experience with Kubernetes
2+ years of experience with AWS, Azure, or Google Cloud Platform
3+ years of experience working with application development teams
1+ year of experience with GitHub
Willingness to participate in on-call support rotation
Terraform or Ansible
Helm
CI/CD tools such as Jenkins
REST API development
Grafana, Prometheus, Thanos
Nice to have
Anthos on Bare Metal
GKE
GCP
OpenShift
Experience as an individual contributor on a Scrum or Agile team
Splunk
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time