Job Description
Responsibilities:
- Monitor alerts, metrics, and logs to detect incidents and events, and correlate them to find the root cause of outages.
- Conduct Post-Incident Reviews with developers, infrastructure engineers, product owners, system owners, and information security to identify the cause of each incident and a solution, using automation to improve the agility and performance of the system.
- Work with other SREs to drive standards and consistency around best practices.
- Create and modify runbooks and knowledge base content that other engineers can follow to resolve incidents quickly.
- Identify opportunities and implement the automation needed to address and prevent operational issues.
- Understand and modify existing code and scripts used for automation to build applications and infrastructure.
- Identify and enable new alerts and monitors for critical services impacting system reliability.
- Drive increased efficiency across teams by eliminating duplication and leveraging common DevOps processes, tools, and technology.
- Collaborate with the team in defining architecture; identify potential risks to successful implementation.
- Work closely with business partners and software development teams in a matrix organization structure.
- Automate tasks to reduce manual work, reduce outages, and enhance customer and employee experience.
- Communicate and resolve complex production issues and implement preventative measures.
- Implement and tune monitoring, metric collection, and alerting.

Required Skills:
- Solid hands-on experience in setting up and correlating SIEM monitoring tools, including but not limited to Azure Sentinel, Azure Log Analytics, Azure Monitor, Application Insights, Splunk, Moogsoft, CA APM/Wily Introscope, etc.
(OR)
- Senior software developer experience building applications with tools such as Java, Spring Boot, Spring Framework, .NET Core, Angular, React, and Vue.js.
- Hands-on experience with a variety of database technologies, including relational databases such as Azure SQL, SQL Server, MySQL, and PostgreSQL, or NoSQL databases such as Azure Cosmos DB and MongoDB.
- Hands-on experience integrating systems with REST APIs, databases (RDBMS), LDAP, Active Directory, Azure Active Directory, RabbitMQ, Redis Cache, and Azure Functions (serverless).
- Hands-on experience deploying applications to production through automated CI/CD pipelines or automated scripts using tools such as Maven, Gradle, Docker, Git, JUnit, MSTest, Tomcat, SonarQube, Fortify, Selenium, Cucumber, Contrast Security, etc.
- Understanding of and experience delivering Twelve-Factor cloud-native applications.
- Understanding of and experience with microservices architecture.
- Knowledge of and experience using ticketing systems for catalogs and change management, such as ServiceNow, HP ITSM, and BMC Remedy.
- Excellent communication and coordination skills for interacting with both technical and non-technical stakeholders.

Preferred Skills:
- Knowledge and experience of DevOps and Agile methodologies.
- Experience with Microsoft Azure technologies.
- Experience with Tanzu Application/Container Services (TAS/TKS) (previously Pivotal Cloud Foundry) or equivalent container-based platforms and products such as OpenShift, Azure Kubernetes Services, Google Container Services, etc.
- Experience using ServiceNow ITOM and ITSM to create catalogs or to automate processes by integrating with other systems.
- We highly encourage SREs, DevOps engineers, application developers, system developers, and system engineers who understand how software is built and managed to apply.
Employment Category:
Employment Type: Full time
Industry: Others
Role Category: Application Programming / Maintenance
Functional Area: Not Applicable
Role/Responsibilities: Cloud Site Reliability Engineer (Azure)
Keyskills:
Java
Spring Boot
Spring Framework
Angular
SQL Server
MySQL
MongoDB
Databases
LDAP
Active Directory
RabbitMQ
Maven
Gradle
Docker
Git
JUnit
MSTest
Tomcat
SonarQube
Fortify
Selenium
Cucumber
ServiceNow
BMC Remedy
DevOps
Agile Methodologies
OpenShift
SIEM Monitoring Tools
.NET Core
React
Vue.js
Azure SQL
Azure Cosmos DB
PostgreSQL
REST APIs
Azure Active Directory
Redis Cache
Azure Functions
Contrast Security
Twelve-Factor cloud-native applications
Microservices architecture
HP ITSM
Microsoft Azure Technologies
Tanzu Application/Container Services
Azure Kubernetes Services
Google Container Services
ServiceNow ITOM
ServiceNow ITSM