Senior Site Reliability Engineer
Full Time
Senior Level
4+ years
Posted 3 weeks ago
About This Role
This role involves building and operating cloud infrastructure, managing Kubernetes clusters, and designing intelligent observability systems. The engineer will leverage AI-powered tools to transform monitoring and troubleshooting of production systems, ensuring high availability and performance.
Responsibilities
- Leverage AI-powered tools and automation to transform how we monitor, troubleshoot, and maintain production systems
- Build and operate cloud infrastructure on AWS, using Terraform to codify and version-control our entire environment
- Manage and scale Kubernetes clusters that power BetterUp's platform, ensuring high availability and performance
- Design intelligent alerting and observability systems
- Collaborate with engineering teams to embed reliability into the development lifecycle, shifting left on operational concerns
- Automate incident response workflows and build self-healing infrastructure
- Experiment with and adopt emerging AI tools for log analysis, anomaly detection, and predictive maintenance
- Drive continuous improvement through data-driven retrospectives and reliability metrics
Requirements
- 4+ years of experience in SRE or infrastructure roles
- Genuine excitement about AI tooling, such as copilots, AI assistants, or LLM-based tools
- Deep experience with AWS
- Hands-on Kubernetes experience: deploying, scaling, debugging, and securing clusters
- Strong Terraform skills with experience managing complex, multi-environment infrastructure
- Familiarity with modern observability stacks (Datadog, Prometheus, OpenTelemetry)
- Strong debugging instincts and comfort navigating distributed systems
- Clear communication skills
- A builder's mindset: you see manual processes as opportunities for automation
Skills
- AWS *
- Kubernetes *
- Terraform *
- Prometheus *
- OpenTelemetry *
- Datadog *
- AI tooling *

* Required skills
Benefits
Dental Insurance
Flexible paid time off
BetterUp coaching
Company-wide Summer & Winter breaks
Vision Insurance
Volunteer Days
401(k) self-contribution
BetterUp Inner Workdays
Medical Insurance
Learning and Development stipend
Federal/statutory holidays
Charitable contribution
About BetterUp
BetterUp focuses on human transformation through coaching and development, and operates a hybrid work model.
Professional Services