Senior Site Reliability Engineer

BetterUp | Chicago, IL | $164,000 - $205,000
Full-time | Senior level | 4+ years of experience

Posted 3 weeks ago

About This Role

This role involves building and operating cloud infrastructure, managing Kubernetes clusters, and designing intelligent observability systems. The engineer will leverage AI-powered tools to transform monitoring and troubleshooting of production systems, ensuring high availability and performance.

Responsibilities

  • Leverage AI-powered tools and automation to transform how we monitor, troubleshoot, and maintain production systems
  • Build and operate cloud infrastructure on AWS, using Terraform to codify and version-control our entire environment
  • Manage and scale Kubernetes clusters that power BetterUp's platform, ensuring high availability and performance
  • Design intelligent alerting and observability systems
  • Collaborate with engineering teams to embed reliability into the development lifecycle, shifting left on operational concerns
  • Automate incident response workflows and build self-healing infrastructure
  • Experiment with and adopt emerging AI tools for log analysis, anomaly detection, and predictive maintenance
  • Drive continuous improvement through data-driven retrospectives and reliability metrics

Requirements

  • 4+ years of experience in SRE or infrastructure roles
  • Genuine enthusiasm for AI tooling, such as copilots, AI assistants, or LLM-based tools
  • Deep experience with AWS
  • Hands-on Kubernetes experience: deploying, scaling, debugging, and securing clusters
  • Strong Terraform skills with experience managing complex, multi-environment infrastructure
  • Familiarity with modern observability stacks (Datadog, Prometheus, OpenTelemetry)
  • Strong debugging instincts and comfort navigating distributed systems
  • Clear communication skills
  • A builder's mindset: you see manual processes as opportunities for automation

Skills

Required skills: AWS, Kubernetes, Terraform, Prometheus, OpenTelemetry, Datadog, AI tooling

Benefits

  • Dental insurance
  • Flexible paid time off
  • BetterUp coaching
  • Company-wide summer and winter breaks
  • Vision insurance
  • Volunteer days
  • 401(k) self-contribution
  • BetterUp Inner Workdays
  • Medical insurance
  • Learning and development stipend
  • Federal/statutory holidays
  • Charitable contribution

About BetterUp

BetterUp is a company focused on human transformation through coaching and development. It operates on a hybrid work model.

Professional Services