POSITION SUMMARY :
In this role, you will play a crucial part in shaping the firm's infrastructure reliability and efficiency by implementing robust Site Reliability Engineering (SRE) practices. Your contribution will be pivotal in ensuring the availability, scalability, and performance of our systems and applications. Drawing on your strong technical skills and expertise in DevOps principles, you will work to improve the reliability of our infrastructure and minimize downtime, enabling the organization to deliver high-quality software with maximum efficiency.
EXPERIENCE AND REQUIRED SKILL SETS :
- Ensure 24/7 uptime and stability of production systems
- Investigate and troubleshoot production issues
- Collaborate with developers to optimize system performance
- Participate in on-call rotation to provide 24/7 support for critical systems
- Work on automation and enhancements to reduce manual processes / intervention.
- 5+ years of relevant experience in an SRE or Production/Product Support role, with a track record of implementing SRE practices
- Basic understanding of cloud solutions from providers such as AWS or Azure.
- Basic to intermediate knowledge of scripting in Bash, Python, or PowerShell.
- Good presentation, communication and interpersonal skills with the ability to collaborate effectively with cross-functional teams and stakeholders across different countries and cultures.
- Good problem-solving and troubleshooting skills
- Continuous learning mindset and willingness to adapt to new technologies and industry trends.
- Good understanding of Linux operating system commands and SQL (ability to write and analyze queries, and to derive or build the information a requirement calls for)
- In-depth knowledge of Trading Life Cycle:
The candidate should possess a comprehensive understanding of the trading life cycle, including order management, trade execution, settlement, and post-trade processes. Familiarity with financial products such as equities, derivatives, commodities, and currencies (FX) is a plus.
- Incident and Problem Management Expertise:
The candidate must demonstrate strong problem-solving skills and the ability to manage incidents promptly and efficiently within a fast-paced trading environment. This includes identifying, analyzing, and resolving issues related to trading systems and processes, as well as collaborating with cross-functional teams to implement long-term solutions and improve operational efficiency.
- Good Understanding of Tools:
(a) Orchestration: Autosys, Airflow, or Cron
(b) Monitoring & Logging: PagerDuty, Prometheus & Grafana or Datadog, Splunk
(c) Project Management / ITSM: ServiceNow (basic ability to navigate and to create change tickets and incidents), Jira (basic ability to create tickets and filter your work)
EDUCATION :
- Bachelor's or Master's degree in Computer Science, Engineering, Software Engineering, or a related field
Keyskills: DevOps, Log Management Tools, Azure, Site Reliability, Bash Scripting, System Reliability, Observability Services, Performance Tuning, System Scalability, AWS, Monitoring Tools
Gemini Solutions is a global IT firm and a leading offshore outsourcing company with a specialized focus on financial services. Gemini offers several management services and can combine its range of services to suit a diverse range of needs. Having well equipped automated corporate office in Gur...