Senior Software Engineer, Platform Reliability Operations

Remote
Contract, Senior Level, 5+ years

Posted 1 week ago

About This Role

This role is for a Senior Software Engineer focusing on Platform Reliability Operations. The position involves improving system design, maintaining robust observability systems, and collaborating with development partners to enhance system reliability, performance, efficiency, and scalability.

Responsibilities

  • Analyze and improve system design to reduce failure modes and promote self-healing systems
  • Establish and maintain robust systems that facilitate observability, encompassing logging, monitoring, distributed tracing, alerting, and offline test tools
  • Work with development partners to shape the architecture, design, and implementations of new and existing systems to enhance their reliability, performance, efficiency, and scalability
  • Collaborate with service engineers to establish Service Level Agreements (SLAs) and Service Level Objectives (SLOs) for backend services
  • Identify the signals that show how well an application is performing, and know how to improve or repair its performance
  • Assess options and propose solutions in uncertain situations where information is limited or unclear
  • Respond to emerging incidents, solve critical issues, and follow through with a plan for resolution or future mitigation
  • Act as a subject-matter expert (SME) on the Engineering Operations team, partnering with backend services teams and application teams to overcome challenges across all platforms

Requirements

  • 5+ years’ experience in software development
  • Solid engineering and coding skills, data structure knowledge, and ability to write high-performance production-quality code
  • Experience building service-oriented APIs and cloud services
  • Experience designing, implementing, and deploying microservices
  • Deep, hands-on technical experience with server software
  • Proficient in Golang and JavaScript
  • Experience in the Linux environment and a good understanding of its fundamentals and internals
  • A good understanding of large-scale distributed systems in practice, including multi-tier architectures, application security, monitoring, and storage systems
  • Working knowledge of the TCP/IP stack, internet routing, and load balancing

Qualifications

  • Degree in Computer Science or a related field, or equivalent work experience

Nice to Have

  • Golang
  • TypeScript
  • Kubernetes
  • Terraform
  • OpenTelemetry
  • eBPF
  • Datadog
  • Helm Charts
  • HLS video transcoding, distribution & playback
  • Experience designing, implementing, and running services in high-demand, high-traffic environments
  • Experience with high-availability services

Skills

Kubernetes * JavaScript * Terraform * Linux * Golang * TypeScript * Datadog * Helm charts * OpenTelemetry * eBPF *

* Required skills

About GeorgiaTEK Systems Inc.

Technology