Develop and maintain CI/CD pipelines using GitHub Actions to streamline the software development lifecycle.
Design, deploy, and manage AWS infrastructure, ensuring high availability and security.
Implement and manage Helm Charts for Kubernetes to automate application deployment.
Use YAML configuration files to define and manage infrastructure and application settings.
Apply SRE principles to enhance system reliability, performance, and capacity through automation and monitoring.
Collaborate with development teams to integrate reliability and scalability into the software development process.
Monitor application and infrastructure performance, troubleshoot issues, and implement solutions to improve system reliability.
Implement infrastructure as code (IaC) using tools like Terraform for efficient resource management.

Required Skills and Qualifications
Proven experience in Site Reliability Engineering (SRE) practices.
Strong expertise in GitHub Actions and Terraform for CI/CD pipeline development.
Strong knowledge of YAML, including its structure and parameterization for configuration management.
Working experience with AWS services, including EC2, S3, Lambda, RDS, and VPC.
Deep understanding of authentication, security, scalability, and parallelization of GitHub Actions jobs across the CI/CD process.
Working experience with Helm Charts for Kubernetes deployment and management.
Proficiency in scripting and automation using languages such as Python or PowerShell.
Understanding of containerization technologies like Docker and orchestration with Kubernetes.
Excellent problem-solving skills and the ability to work collaboratively in a fast-paced environment.
Strong communication and collaboration skills.

Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time