Optimize existing processes, identify areas for improvement, and implement automated solutions to enhance the efficiency and reliability of Toast systems
Utilize, configure, and support tools such as JIRA, FireHydrant, and Backstage for tracking events, incidents, and changes, and maintain the Service Catalog
Enable low-risk, compliant releases with rapid rollback capability to maintain platform reliability
Implement automation for risk mitigation strategies to minimize the impact of changes and releases on Toast customers
Work closely with leadership, third-party vendors, and relevant stakeholders to drive work to completion
Do you have the right ingredients*? (Requirements)
At least 2 years of industry engineering experience with a focus on SRE
Bachelor's degree in Computer Science, Engineering, or a related field
Working knowledge of complex cloud environments (AWS, GCP, Azure, etc.)
Experience scripting automation (Python, Go, etc.)
Experience with Infrastructure as Code (Terraform, etc.)
Experience participating in Incident Response
Strong written and verbal communication skills
Strong problem-solving skills and the ability to think strategically and analytically
Experience working with a diverse global team across multiple regions and time zones
Working knowledge of best-practice frameworks, including ITIL, ITSM, Agile/Scrum, and change management, is a plus
Experience with Incident and Change processes and tools (JIRA, OpsGenie, FireHydrant, DX, etc.) is a plus
Job Classification
Industry: Software Product
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time