Job Description
Python Developer - Service Reliability Engineers
Programming and Scripting for Data Applications
o SQL: Advanced SQL skills for writing complex queries, stored procedures, and optimizing database performance.
o Python: Experience in developing data pipelines, integrations, and analytics scripts using Python.
o R: Knowledge of statistical analysis and data modeling using R for advanced analytics.
o JavaScript: Familiarity with JavaScript for embedding analytics into web applications.
Qualifications
Service Reliability: Experience managing and maintaining highly available systems, including cloud-based infrastructure.
Programming: Proficiency in programming to automate repetitive tasks (toil), reducing manual effort and human error.
Monitoring & Observability: Solid understanding of monitoring tools, incident management platforms, and metrics analysis.
Technical Depth: Deep knowledge of system performance optimization and troubleshooting methodologies. Experience with cloud platforms, databases, CI/CD, distributed systems, and security best practices.
Communication & Collaboration: Strong communication skills (written and verbal) to effectively collaborate across cross-functional teams.
Problem Solving: Ability to thrive in high-pressure situations and demonstrate a calm, methodical approach to problem-solving. Analytical mindset for interpreting data, metrics, and patterns to make informed decisions and predict future issues.
Systemic Thinking: Ability to view interconnected systems holistically, anticipating the broader impact of changes and designing for resilience.
Ownership and Proactiveness: Takes responsibility for the reliability and performance of services. Proactively identifies potential problems, performance bottlenecks, and areas for improvement before they impact users.
Key Responsibilities
Infrastructure and Operations: Ensure the reliability and scalability of critical systems by designing and managing robust infrastructure solutions.
System Monitoring: Proactively monitor system health, using performance metrics and automated tools to detect potential issues before they impact users.
Incident Management: Lead response efforts during service disruptions, ensuring swift resolution and minimal downtime.
Problem Solving: Analyze root causes of system failures and implement long-term fixes to enhance system reliability.
Automation: Develop scripts and tools to automate repetitive tasks, improving operational efficiency and reducing manual interventions.
Collaboration: Partner with development teams to align on reliability goals and implement best practices into software design and deployment.
Documentation: Maintain comprehensive system documentation to support consistent and efficient troubleshooting and knowledge sharing.
Continuous Improvement: Drive innovation by identifying areas for enhancement and applying cutting-edge technologies and operational practices.
Education & Experience
Education: Bachelor's or Master's degree in Information Systems, Computer Science, Computer Engineering, or equivalent.
Experience: 6-10 years of experience
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time
Contact Details:
Company: Zensar
Location(s): Kolkata
Keyskills:
Computer science
Automation
Software design
Data modeling
JavaScript
Incident management
Stored procedures
Troubleshooting
SQL
Python