Site Reliability Engineer
Contract
Mid Level
4+ years
Posted 2 weeks ago
About This Role
Support and maintain production-grade cloud infrastructure and Kubernetes-based platforms for a client, ensuring high availability, performance, and reliability.
Responsibilities
- Support production-grade cloud infrastructure in major cloud providers (AWS, GCP, or Azure)
- Operate and maintain Kubernetes-based platforms in production environments
- Implement or support monitoring, alerting, and observability solutions (metrics, logs, traces)
- Troubleshoot distributed systems, including performance, availability, and reliability issues
- Participate in on-call rotations, incident response, and root cause analysis
Requirements
- 4+ years of relevant technology experience
- Hands-on experience supporting production-grade cloud infrastructure in AWS, GCP, or Azure
- Practical experience operating and maintaining Kubernetes-based platforms in production environments
- Experience with Infrastructure as Code (IaC) tools such as Terraform, Helm, or CloudFormation
- Working knowledge of CI/CD and GitOps practices, including automated testing and deployment pipelines
- Experience implementing or supporting monitoring, alerting, and observability solutions
- Strong troubleshooting skills across distributed systems
- Proficiency in at least one scripting or programming language (e.g., Python, Go, Bash)
- Experience participating in on-call rotations, incident response, and root cause analysis
- Authorised to work in the US (USC/GC/GC-EAD/H4-EAD/L2S Only)
- Must be local to Irvine, California, with a local driver's license (DL) and a local project in CA
Qualifications
- BS degree in Computer Science or a related field, or an equivalent combination of education and experience
- 4+ years of relevant technology experience or equivalent
Nice to Have
- Experience operating multi-cloud environments (AWS, GCP, Azure)
- Experience with event streaming platforms such as Apache Kafka, Kafka Connect, or Amazon MSK
- Familiarity with service mesh technologies (e.g., Istio)
- Exposure to stream processing frameworks (e.g., Apache Flink) and CDC tools such as Debezium
- Experience supporting MLOps or AI infrastructure
- Familiarity with observability standards and practices such as OpenTelemetry and the Golden Signals
- Experience working in regulated environments and supporting compliance frameworks (HIPAA, SOC 2, ISO 27001)
- Experience implementing security best practices for cloud-native platforms (IAM, secrets management, RBAC)
- Prior experience in platform engineering or internal developer platforms
- Exposure to cost optimization and FinOps practices in cloud environments
Skills
- Python
- AWS
- Azure
- Kubernetes
- CloudFormation
- CI/CD
- Terraform
- Go
- Apache Kafka
- GCP
- OpenTelemetry
- Bash
- Helm
- MLOps
- GitOps
- Istio
- Apache Flink
- Kafka Connect
- Amazon MSK
- Debezium