Lead Site Reliability Engineer (SRE) @ Saviynt

Job Description

The team comes from diverse technical backgrounds, and the responsibilities offer a variety of challenges. Ideal candidates will have a background in either software engineering or systems engineering with a desire to learn the other, or previous experience building and managing monitoring and alerting systems. We are looking for a systems-thinking Principal Engineer who has helped teams scale through production insights, operational automation, building an observability program, developer guidance, real-time metrics, and automation, automation, automation!
 
WHAT YOU WILL BE DOING
    • Implement monitoring and alerting systems to guarantee high availability and performance, with a dedicated focus on SLA and availability metrics.
    • Collaborate with engineering and operations teams to identify critical components and systems requiring enhanced availability measures.
    • Design and implement strategies, tooling, and processes to enhance system uptime and reliability.
    • Continuously evaluate and recommend improvements to platform infrastructure and processes, enhancing efficiency and reliability.
    • Align the platform with customer needs and business goals by working closely with cross-functional teams.
    • Run the production environment by monitoring availability and taking a holistic view of system health.
    • Build software and systems to monitor platform infrastructure and applications.
    • Monitor and improve the reliability, quality, and time-to-market of our suite of software solutions.
    • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
    • Provide primary operational support and engineering for multiple large-scale distributed software applications.
    • Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault finding.
WHAT YOU BRING
    • Bachelor's degree or higher in a technology-related field (e.g., Engineering, Computer Science) required; Master's degree a plus.
    • 6+ years of professional experience in monitoring and alerting roles on major cloud platforms (AWS, Azure), preferably including project leadership roles.
    • 4+ years of experience in cloud development (AWS, Azure) and observability; experience building and operating highly resilient platforms in AWS cloud environments.
    • 3+ years of experience in software development with Python, NodeJS, or Java with a focus on SDLC and automation
    • Hands-on experience with container orchestration, preferably with Kubernetes
    • Hands-on experience with building observability, monitoring and alerting on large scale distributed systems.
    • Leadership/design of application and/or infrastructure migration projects from on-prem to cloud
    • Cloud architecture design and implementation to solve key business needs and meet team goals.
    • Familiarity with current AWS solutions; Azure experience also considered.
    • Containerized workloads (Prefer: Helm; Related: AKS & EKS, other K8s distributions, Docker, JFrog)
    • Logging and monitoring tools (Prefer: Prometheus, Grafana, Datadog, AWS CloudWatch; Related: Azure Monitor, Log Analytics, Fluentd)
    • Network security (e.g., AWS Policy, Azure Policy, VPN, Active Directory/RBAC, ACLs, NSG rules, private endpoints)
    • Proven experience in implementing advanced observability practices and techniques at scale.
    • Hands-on experience with one or more observability tools (Prometheus, Grafana, ELK/OpenSearch, OpenTelemetry, Datadog, etc.)
    • Experience in instrumentation, with systems skills in building and operating monitoring, logging, and alerting services for distributed systems at scale.
    • Demonstrated ability to use modern monitoring tools (Datadog, Prometheus, etc.)
    • Ability to build monitoring ecosystem with high fidelity alerting.
    • Ability to automate resolution of alerts.
    • Ability to automate with various scripting languages (Python, Golang, shell scripting, etc.)
    • Knowledge of managing systems using infrastructure-as-code tools (IAM, ARM, Terraform, Chef)
    • Solid understanding of Cloud Computing and DevOps concepts.
    • Hands-on Kubernetes skills and knowledge.
    • Proven experience in maintaining the scalability and resiliency of complex environments.
    • Ability to triage, execute root cause analysis, and be decisive under pressure
    • Experience managing and interpreting large datasets using query languages and visualization tools
    • Proficient communication skills, with the ability to reach both technical and non-technical audiences
    • Ability to learn new software, methods, and practices and bring them to our developers
    • Ability to work with a variety of individuals and groups, both in person and virtually, in a constructive and collaborative manner, and to build and maintain effective relationships

Job Classification

Industry: Software Product
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Saviynt
Location(s): Bengaluru


Keyskills: Performance tuning, Cloud computing, Automation, VPN, Shell scripting, Active Directory, Network security, SDLC, Analytics, Python

₹ Not Disclosed

Saviynt

Saviynt is an identity authority platform built to power and protect the world at work. In a world of digital transformation, where organizations face increasing cyber risk but cannot afford defensive measures that slow down progress, Saviynt's Enterprise Identity Cloud gives customers unp...