Job Description
Creating and sustaining infrastructure and tools to ensure reliable services and enhance customer experience
Collaborating with teams to enhance observability, automation, deployment, and system reliability
Developing, deploying, and managing scalable, dependable infrastructure solutions to power Zscaler's global cloud services
Collaborating with product, operations, and security teams to smoothly implement features, tools, and updates across the platform
Developing and deploying AI-powered tools to boost operational efficiency and advance engineering excellence
What We're Looking For (Minimum Qualifications)
Drive comprehensive observability for microservices and Kubernetes clusters using tools like OpenTelemetry
Build and manage automation tools to streamline deployment, patching, scaling, and infrastructure management
Build scalable portals for SRE dashboards, SLI/SLO/SLA tracking, error budgets, and executive metrics to enable data-driven decision-making
Proficient in programming and scripting with Java, Python, Go, Shell, or similar languages
Skilled in OpenStack cloud, Linux, Kafka, RabbitMQ, Prometheus, Terraform, Kubernetes, Ansible, MLOps, Generative AI, PostgreSQL, and analytics databases
Familiarity with current AWS solutions; Azure experience also considered
Containerized workloads (Prefer: Helm; Related: AKS & EKS, other K8s distributions, Docker, JFrog)
Logging and monitoring tools (Prefer: Prometheus, Grafana, Datadog, AWS CloudWatch; Related: Azure Monitor, Log Analytics, Fluentd)
Network Security (e.g. AWS Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints)
Proven experience in implementing advanced observability practices and techniques at scale
Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
What Will Make You Stand Out (Preferred Qualifications)
Bachelor's in Computer Science or related field, or equivalent experience, with 4+ years in Cloud-SRE, DevOps, or Systems Engineering
Strong problem-solving capabilities, excellent collaboration and communication skills, and a proactive approach to teamwork
Knowledge of testing tools and frameworks
Job Classification
Industry: Software Product
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time
Contact Details:
Company: Saviynt
Location(s): Bengaluru
Keyskills:
Computer science
Linux
Testing tools
VPN
Infrastructure management
PostgreSQL
Active Directory
Network security
Analytics
Python