Site Reliability Architect - Configuration Management @ Nomiso

Site Reliability Architect - Configuration Management

Nomiso
15 - 17 years
Singapore
1 month ago

Job Description

Experience Level : Senior Level

SRE Architect

Position Overview :

We are looking for an SRE Architect who will work with technology experts to design optimal solutions to our customers' requirements.

This is achieved through interactive requirements gathering, determining best-fit solutions using problem-solving approaches, integrated solution design across multiple technology types, and a strong ability to present and articulate solutions to senior members of customer teams.

Roles and Responsibilities :

- Own the infrastructure and APM, and work with developers and systems engineers to build, release, monitor and run services reliably, exceeding the agreed SLAs.
- Write software to automate API-driven tasks at scale and contribute to the product codebase in Java, JS, React, Node, Go and Python.
- Write automation to reduce toil and eliminate repeatable manual tasks.
- Work with Ansible, Puppet, Chef, Terraform or another config management / orchestration suite, know where it is broken, work towards fixing it and explore new alternatives.
- Define and accelerate implementation of support processes, tools and best practices.
- Maintain services once they are live by measuring and monitoring availability, latency and overall system reliability.
- Handle cross-team performance issues, from identifying the cause and determining the areas of improvement to driving those actions to closure.
- Baseline the performance and maturity of systems, tool maturity and coverage, metrics, technology and engineering practices.
- Define, measure and improve reliability metrics (SLO/SLI), observability (monitoring, logging and tracing solutions) and ops processes (incident and problem management), and streamline and automate release management.
- Build dashboards to provide visibility into the performance of the applications.
- Create chaos in the production environment purposefully and in a controlled manner to validate the reliability of systems.
- Mentor and coach other SREs in the organization.
- Provide written and verbal updates to executives and the application's stakeholders in the organization.
- Understand the current process and system setup, and propose the process and technology improvements needed so that the application exceeds the desired Service Level Objective.
- Be a strong believer in automation as a way to bring sustained continuous improvement: automating toil and runbooks, and improving the ability of applications to auto-heal, leading to improved reliability.

Must Have Skills :

The successful candidate will have the following attributes/qualifications :

- 15+ years of experience in development and operations of applications/services in production with uptime over 99.9%.
- 8+ years of experience as an SRE handling web-scale applications.
- Strong hands-on coding experience in one or more programming languages such as Python, Golang, Java, Bash, etc.
- Good understanding of observability (monitoring, logging, tracing, metrics) and chaos engineering concepts.
- Proficiency in using observability tools (for example New Relic, Datadog, etc.) for monitoring, logging and tracing.
- Expert-level hands-on knowledge of the public cloud platforms AWS and/or Google Cloud Platform.
- A professional-level certificate on one of the public clouds is highly desirable.
- Must have hands-on experience in using configuration management systems such as Ansible or SaltStack and infrastructure automation tools like Terraform or CloudFormation.
- Should have used alerting systems such as PagerDuty.
- Should have implemented solutions around Service Level Indicators (SLIs) and Service Level Objectives (SLOs) for services.
- Measurement should have been within a system and across systems in distributed systems.
- Should have supported Production Incidents (PIs) on critical applications of a company. Troubleshoot, debug, and diagnose operational issues and drive them to closure.
- Understanding of software delivery life cycles, particularly Agile/Lean & DevOps.
- Proven experience in handling large-scale and growing infrastructure across data centers and heterogeneous cloud platforms.
- Experience as a service owner in managing a large, geographically diverse group of stakeholders.
- Ability to work with a creative, fast-growing engineering team and motivate them to deliver their best work.
- History of driving innovation.

Good to Have Skills :

- Familiarity with handling :
  o Containerization: Kubernetes, Docker, Rancher, etc.
  o Kafka, YARN, Elasticsearch, etc.
  o Source code management and implementation of security best practices.
  o Tech stack: Python, Falcon, Elasticsearch, MongoDB, AWS (SQS, S3), MapReduce.
- Networking knowledge.
- Understanding of software delivery life cycles, particularly Agile/Lean & DevOps.
- Contribution to the open source community.

Qualification : Master's or Bachelor's degree in Computer Science Engineering, or a related technical degree.

Location : Singapore

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: DevOps Consultant / Architect
Employment Type: Full time

Contact Details:

Company: Nomiso
Location(s): Singapore

Keyskills: Configuration Management, Log Management Tools, Site Reliability, Docker, Ansible, Observability, Services, SLA, Puppet, Monitoring Tools, IT Automation, Kubernetes

₹ Not Disclosed
