Lead and mentor a team of site reliability engineers, fostering a collaborative and high-performance work environment.
Provide technical guidance, set clear objectives, and conduct regular performance evaluations.
Drive the recruitment, onboarding, and professional development of SRE team members.
Site Reliability Engineering Strategy:
Work with the Senior Director, SRE/DevOps to develop and execute a strategic roadmap for SRE, aligning it with the overall business objectives.
Define and implement best practices, standards, and processes to ensure the reliability and performance of Sophos infrastructure.
Collaborate with DevOps and Engineering teams to integrate SRE practices and enhance system performance and availability.
System Reliability and Incident Management:
Establish robust incident management processes, ensuring timely response and resolution of system issues.
Conduct root cause analysis and implement preventive measures to minimize the occurrence of incidents.
Monitor system performance, implement monitoring tools, and conduct capacity planning to support scalability.
Continuous Improvement and Automation:
Collaborate closely with DevOps and Engineering teams to leverage automation tools and frameworks for seamless integration.
Identify areas for improvement in the SRE domain for NSG cloud services, and drive initiatives to enhance operational efficiency and eliminate toil.
Implement automation strategies to streamline processes, reduce manual intervention, and increase reliability.
Collaboration and Communication:
Partner closely with Engineering teams to understand business requirements and ensure SRE/DevOps solutions align with organizational goals.
Communicate effectively with stakeholders, including executives and technical teams, providing regular updates on SRE initiatives and performance.
What you will bring
11+ years of overall experience in DevOps, with extensive experience in site reliability engineering, infrastructure management, operational efficiency, or a related technical field.
Proven experience in leading and managing teams, fostering a culture of collaboration and continuous improvement.
Strong knowledge of cloud technologies, in particular AWS.
Proficiency in scripting and automation using languages like Python or Bash.
Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes) and associated packaging tools (e.g., Helm).
Experience with monitoring and logging tools (e.g., ELK stack, Prometheus, Grafana) for system performance analysis.
Familiarity with DevOps practices and tools, such as CI/CD pipelines, configuration management, and infrastructure as code (e.g., Ansible, Terraform).
Excellent problem-solving and analytical skills, with a strong focus on delivering reliable and scalable solutions.
Exceptional communication and interpersonal skills, with the ability to collaborate effectively across teams and influence stakeholders.
Bachelor's degree in Computer Science, Information Technology, or a related field. A Master's degree is preferred.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Software Development - Other
Employment Type: Full time