Provide support for our Kubernetes platform in all environments, ensuring continuous service and stability of the platform for our users.
Respond to incidents during on-shift hours, within the documented SLA, for production and non-production environments.
Common support needs include:
Troubleshooting deployments
Validating Kubernetes platform health after other teams' changes
Bouncing pods
Scaling deployments
Synchronizing passwords
Running other kubectl commands, scripts, or playbooks
Perform regular maintenance tasks as requested, such as running scripts, host OS patching, and Kubernetes upgrades.
Provide excellent service to our users by responding to their requests correctly within our defined SLA.
Develop automation to improve the work of the team and our users' experience of our platforms.
Skills
Must have
5+ years of experience with Linux system administration and the Linux command line
3+ years of experience with Kubernetes container orchestration software (kubectl, etc.)
Experience supporting production services including incident response and troubleshooting
Experience with configuration management software (e.g., Ansible)
Experience with source code management and implementing a branching strategy (GitHub)
Knowledge of configuring monitoring solutions and creating dashboards (Splunk, Dynatrace, Prometheus, Grafana)
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Hardware & Networks
Role Category: IT Network
Role: System Administrator / Engineer
Employment Type: Full time