Site Reliability Engineer @ Grid Dynamics

Site Reliability Engineer

Grid Dynamics
5 - 12 years
Hyderabad
3 months ago

Job Description

Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, distributed, fault-tolerant systems that enable online ordering for thousands of restaurants across multiple brands. SRE ensures that Inspire Digital Platform (IDP) services have reliability and uptime appropriate to users' needs, along with a fast rate of improvement. Additionally, SREs will keep an ever-watchful eye on our systems' capacity and performance and will perform regular capacity planning exercises. Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation.

Essential functions

Review current workload patterns, understand the business case and prioritize areas of weakness within the platform through log and metric investigation as well as application profiling.

Work with senior engineering and testing team members to build tools and recommend testing strategies for problem prevention and detection.

Employ deep troubleshooting skills to improve availability, performance, and security, ensuring services are designed for 24/7 availability with operational readiness and rigor.

Perform in-depth postmortems on production incidents to assess their business impact and help Engineering learn from them.

Create dashboards and alerts for monitoring the IDP platform, define key metrics and service level indicators (SLIs), and ensure relevant metric data is collected to create actionable alerts for SRE and the Network Operations Center.

Participate in the 24/7 on-call rotation.

Automate toil by building software and automation for seamless application deployment and third-party tool integration.

Ensure the platform maintains a high degree of reliability: at least three nines (99.9% availability).

Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.

Own technically intricate issues that span DevOps, databases, networking, code, infrastructure, and people, and drive them to satisfactory completion.

Provide recommendations and feedback in design reviews and review sessions.

Qualifications

AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, Service Bus, Azure Functions, New Relic, Splunk, Prometheus, Grafana, Java, TypeScript, Python.

Would be a plus

Requirements:

Bachelor's degree in computer science, a related field, or equivalent practical experience.

Minimum 5 years of experience as a Software, Platform, SRE, or DevOps Engineer supporting large-scale SaaS production B2C or B2B cloud platforms.

Development skills in Java, TypeScript, and Python, along with OOP expertise, are a must.

Hands-on Azure Cloud experience, particularly with AKS, API Management, Azure Cache for Redis, Azure Blob Storage, Cosmos DB, Service Bus, and Azure Functions.

Proficiency with monitoring, APM, and profiling tools such as New Relic, Splunk, Prometheus, and Grafana.

Working experience with containers, Kubernetes and Helm.

Functional knowledge of cloud networking, firewalls, ingress and egress controllers, and service mesh, plus experience with Auth0, secret management, and Cloudflare (CDN, Load Balancer, Cache, Firewall, and Workers features).

Experience with ArgoCD, GitLab, CI/CD, Terraform, and Infrastructure as Code.

Strong communication skills and ability to explain technical concepts clearly and simply

A willingness to dive into understanding, debugging, and improving any layer of the stack

We offer

Opportunity to work on bleeding-edge projects
Work with a highly motivated and dedicated team
Competitive salary
Flexible schedule
Benefits package - medical insurance, sports
Corporate social events
Professional development opportunities
Well-equipped office

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: Grid Dynamics
Location(s): Hyderabad

Key skills: Automation, Debugging, Splunk, Troubleshooting, Distributed Systems, Monitoring, Python, Firewall, Capacity Planning

₹ Not Disclosed

Similar positions

Observability Engineer

Hexaware Technologies

6 - 11 years

Bengaluru

16 hours ago

₹ Not Disclosed

Custom Software Engineer

Accenture

3 - 8 years

Bengaluru

4 days ago

₹ Not Disclosed

Senior Cloud Engineer

Cognizant

8 - 10 years

Chennai

15 days ago

₹ Not Disclosed

Cloud Platform Engineer

Accenture

15 - 20 years

Pune

21 days ago

₹ Not Disclosed

Grid Dynamics

About Us: Grid Dynamics (NASDAQ: GDYN) is a leading provider of technology consulting, platform and product engineering, and advanced analytics services. Fusing technical vision with business acumen, we enable positive business outcomes for enterprise companies undergoing business tran...