Site Reliability Engineer
Posted 2 months ago Expired
About This Role
This Site Reliability Engineer will improve time to resolution, advance overall site reliability and scalability, and lead post-incident investigations to identify root causes and preventive measures.
Responsibilities
- Lead post-incident investigations for the Site Reliability team
- Conduct in-depth post-incident analyses to identify root causes and develop preventive strategies
- Draft clear and insightful RCAs for customer delivery
- Cross-train colleagues on how to best leverage observability tools during incident and performance investigations
- Provide visibility to all stakeholders throughout the entire Site Reliability process
- Collaborate with cross-functional teams to implement system enhancements that improve scalability and stability
- Develop client-focused dashboards/alerts to proactively identify performance challenges
- Monitor and continuously improve our time to resolution metrics
- Maintain and configure core observability tools to ensure optimal performance and the availability of key metrics and data
- Contribute to the development of automation tools to streamline incident response
Requirements
- 5+ years of proven experience in a Site Reliability Engineering role
- Strong knowledge of SRE best practices and incident management protocols
- Deep experience using and/or configuring New Relic, Datadog, Sumo Logic, or similar observability tools
- Proficiency in reading and writing code (e.g., JavaScript, .NET, SQL)
- Familiarity with cloud platforms (e.g., AWS, Azure) and architectural patterns
- Excellent problem-solving skills and a data-driven approach to incident analysis
- Prior experience operating within a Public Cloud environment (AWS strongly preferred)
- Experience troubleshooting C#/.NET-based web applications to identify bugs and performance challenges
- Solid knowledge of SaaS operations
- Advanced written and verbal communication skills
Qualifications
- Bachelor's degree in Computer Science or related field (or equivalent experience)
Nice to Have
- Windows and SQL Server troubleshooting skills
- Knowledge of Continuous Integration and Continuous Delivery (CI/CD) pipelines
- Experience working in an Infrastructure as Code (IaC) environment
- Previous experience as a Software Engineer and/or System Administrator
About Origami Risk
Origami Risk provides integrated SaaS solutions to organizations across the risk and insurance ecosystem, delivering risk management and insurance core system solutions from a cloud-based platform.