We are looking for an experienced cloud development engineer to work on our HPC - CSM Manageability solution.
The role involves designing, implementing, and maintaining our HPC CSM manageability platform hosted on Kubernetes infrastructure. The position requires in-depth expertise in cloud-native technologies, particularly Kubernetes, along with a strong background in the networking domain and hands-on experience with Kubernetes networking and security.
What you'll do:
Design and implement Kubernetes-hosted microservices to support scalable and resilient cloud-based applications.
Implement infrastructure as code methodologies to automate the provisioning and management of cloud resources.
Apply core networking skills across the OSI and TCP/IP stack.
Work on Kubernetes networking, covering CNI, Ingress and Egress, and security.
Utilize tools such as Terraform or Ansible for declarative infrastructure definition.
Collaborate with cross-functional teams to define and implement best practices for cloud-based services.
Triage issues, drawing on a strong blend of technical depth, investigative skills, and cross-team coordination to quickly assess, prioritize, and resolve complex internal and customer-reported issues.
Apply expertise in container orchestration using Kubernetes, including deploying, scaling, and managing containerized applications.
Develop and maintain automation scripts and tools to streamline deployment, monitoring, and maintenance processes.
Implement CI/CD pipelines to facilitate continuous integration and delivery.
Implement and enforce security best practices within Kubernetes-hosted software environments.
Ensure compliance with industry standards and regulations related to cloud infrastructure.
Provide escalated support for complex technical issues.
Conduct root cause analysis for incidents and implement preventive measures.
Mentor junior team members and actively participate in knowledge-sharing activities.
What you need to bring:
10+ years of experience
BE / B.Tech in CS or equivalent degree
OS: Linux
CNI: Cilium, Weave
Debugging: DNS, CNI troubleshooting
Programming: Go, Python
Container Engines: Docker, Podman
Container Orchestration: Kubernetes
Version Control: GitHub, GitLab
Declarative: Ansible, YAML, HCL
Package Manager: Helm, RPM
CI/CD: Jenkins, GitHub Actions
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time