Responsible, as part of the team, for all stages of design, development, and operation of the solution Observability stack, which monitors and alerts on the Customer Experience Platform, a complex set of products and platforms hosted on an internal cloud. This includes solution design, analysis, coding, testing, and integration.
Collaborates with multiple project teams and with internal and outsourced development partners.
Reviews and evaluates designs and code for compliance with systems design and development guidelines and standards, with emphasis on solution reliability; provides tangible feedback to improve product quality and mitigate failure risk.
Troubleshoots infrastructure and application issues with our partner vendors.
Drives innovation and integration of new technologies into projects and activities in the software systems design organization.
Provides guidance and mentoring to less-experienced staff members.
Should be able to take on-call duty (on a rotation basis among team members).
Requirements:
Bachelor's or master's degree in Computer Science, Information Systems, or equivalent knowledge/experience.
Typically 5-10 years of experience.
Domain Experience / Knowledge:
Proven experience with Python desirable
Proven experience with Linux operating systems and shell scripting
Proven experience developing and maintaining containerized applications; working with Docker/Docker Swarm/Kubernetes desirable
Proven experience with monitoring, alerting, and logging technologies, e.g. Elastic, Prometheus, Grafana
Proven experience debugging complex issues, performing root cause analysis, and supporting large-scale application architectures
Be highly motivated and have the ability to self-learn new technologies and processes quickly.
Proven experience writing production-level code in at least one programming language
Proven experience with Go (Golang) is an added advantage
Proven experience with GitHub, Jenkins and/or other CI/CD tools with a strong focus on automation
Proven experience with multiple software systems, applications, and design tools
Experience in the overall architecture of software systems for products and solutions
Experience designing and integrating software systems running on multiple platform types into an overall architecture
Proven ability to identify and implement solutions to technical problems, working independently and leading more junior engineers where appropriate
Knowledge of Agile-based development methodologies (SAFe framework advantageous)
Excellent written and verbal communication skills; mastery of English and the local language.
Ability to effectively communicate product architectures, design proposals and negotiate options at management levels.
Personal skills and attributes:
Ability to communicate effectively with management, peers, and team members
Ability to work and deliver in global, cross-functional, and virtual teams
Demonstrate a strong combination of analytical skills, intellectual curiosity and reporting acumen
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time