Senior Software Engineer, Platform Reliability Operations
Remote
Contract
Senior Level
5+ years
Posted 1 week ago
About This Role
This role is for a Senior Software Engineer focusing on Platform Reliability Operations. The position involves improving system design, maintaining robust observability systems, and collaborating with development partners to enhance system reliability, performance, efficiency, and scalability.
Responsibilities
- Analyze and improve system design to reduce failure modes and promote self-healing systems
- Establish and maintain robust systems that facilitate observability, encompassing logging, monitoring, distributed tracing, alerting, and offline test tools
- Work with development partners to shape the architecture, design, and implementations of new and existing systems to enhance their reliability, performance, efficiency, and scalability
- Collaborate with service engineers to establish Service Level Agreements (SLAs) and Service Level Objectives (SLOs) for backend services
- Identify the signals that show how well an application is performing, and know how to improve or repair its performance
- Assess options and propose solutions when information is limited or unclear
- Respond to emerging incidents, solve critical issues, and follow through with a plan for resolution or future mitigation
- Act as an SME on the Engineering Operations team, partnering with backend services teams and application teams to overcome challenges across all platforms
Requirements
- 5+ years’ experience in software development
- Solid engineering and coding skills, data structure knowledge, and ability to write high-performance production-quality code
- Experience building service-oriented APIs and cloud services
- Experience designing, implementing, and deploying microservices
- Deep, hands-on technical experience with server software
- Proficient in Golang and JavaScript
- Experience in the Linux environment and a good understanding of its fundamentals and internals
- A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems
- Working knowledge of the TCP/IP stack, internet routing, and load balancing
Qualifications
- Degree in Computer Science or a related field, or equivalent work experience
- 5+ years’ experience in software development
Nice to Have
- Golang
- TypeScript
- Kubernetes
- Terraform
- OpenTelemetry
- eBPF
- Datadog
- Helm Charts
- HLS video transcoding, distribution & playback
- Experience designing, implementing, and running services in high-demand, high-traffic environments
- Experience with high-availability services
Skills
Kubernetes *
JavaScript *
Terraform *
Linux *
Golang *
TypeScript *
Datadog *
Helm charts *
OpenTelemetry *
eBPF *
* Required skills