Azure Platform Reliability Engineer IV

Ahold Delhaize USA | Salisbury, NC | $108,880 - $187,800
Full Time | Senior Level | 8+ years

Posted 1 week ago

About This Role

This role ensures service availability, automates manual processes, and bridges gaps between product development and operations for Azure services. The engineer will implement operational improvements in areas such as availability, performance, and incident response, with a focus on continuous improvement and efficiency through code, modern tools, and better processes.

Responsibilities

  • Build, manage, and operate Azure Core Services with automation and infrastructure as code
  • Manage and operate continuous delivery framework and tools, and automate lifecycle of platform components
  • Leverage cloud architecture, applying site reliability principles and full-stack troubleshooting skills
  • Reason about system and application architecture, and review code to improve reliability
  • Identify automation opportunities to improve patching, service health, manageability, reliability, and telemetry
  • Own, triage, investigate, and resolve service issues with a focus on communication, learning & teaching
  • Design process or technology solutions to monitor, identify, and resolve system and deployment issues
  • Drive security and compliance for services in accordance with Azure compliance requirements
  • Engage in service capacity planning, forecasting, and cost optimization
  • Create and document runbooks, operational procedures, and standards in Confluence
  • Manage, support, and troubleshoot Linux servers and Linux-based workloads running in cloud environments
  • Implement and automate Linux system operations, patching, performance tuning, and hardening

Requirements

  • 8+ years IT infrastructure experience (server, storage, network, security, identity)
  • 3+ years hands-on experience with IaC and automation tools: ADO, ARM, Terraform, Ansible, PowerShell, Python, azcli, GitHub
  • 4+ years of hands-on operational experience supporting Azure Virtual Network, VWAN, ExpressRoute, Load Balancers, routing (BGP), firewall concepts
  • 4+ years of hands-on operational experience supporting Azure Identity: AAD, PIM, Conditional Access, MFA, AD Connect, Defender, Key Vault
  • 4+ years of hands-on operational experience supporting Azure Governance, Security, Monitoring, Workbooks, Compliance, Cost
  • 4+ years of hands-on operational experience supporting Azure VMs, Containers, Kubernetes/OpenShift (infrastructure side)
  • 4+ years of hands-on operational experience supporting Azure Storage, Backup, Site Recovery, Data Lake
  • Production experience in Cloud technologies: Azure IaaS, PaaS, networking, Azure Functions, Automation, Runbooks, Workbooks, Insights, Security Center, Azure Monitor, Log Analytics
  • Strong experience working in a Linux-based environment
  • Ability to design and script telemetry, alerting, and self-healing service capabilities
  • Ability to work in an Extreme Programming environment with pair programming/operations
  • Able to facilitate diverse teams, multitask, and perform under pressure
  • Experience in capacity planning, forecasting, performance analysis, and system tuning
  • Technical and operational expertise in Windows/Linux/VMware/Hyper-V/AKS, SQL/NoSQL DBs, IaaS, PaaS, FaaS, Data, BCDR, Security, Storage, Networking, Monitoring, Identity
  • Strong background in Linux system administration, cloud-based Linux operations, automation, scripting, patching, troubleshooting, and tuning
  • Experience managing code repos, build systems, and CI/CD pipelines
  • Experience with infrastructure/configuration as code and auto-scaling services
  • Experience working in DevOps and Agile environments, with a blend of development and SRE mindsets (i.e., software and infrastructure)
  • Strong system troubleshooting and problem-solving skills with a sense of ownership
  • Participation in on-call rotation and retrospectives

Qualifications

  • Bachelor's Degree in Computer Science or related field (or equivalent work experience)

Nice to Have

  • Certification in Azure DevOps
  • Certification in Azure Solutions Architect

Skills

Python*, Azure*, PowerShell*, Kubernetes*, DevOps*, Agile*, Confluence*, CI/CD*, Terraform*, Linux*, Ansible*, GitHub*, OpenShift*, ARM*, TDD*, ADO*, azcli*

* Required skills

Certifications

Azure Administrator Certification (Required)

About Ahold Delhaize USA

Ahold Delhaize USA is a division of global food retailer Ahold Delhaize and is part of the U.S. family of brands, which includes five leading omnichannel grocery brands: Food Lion, Giant Food, The GIANT Company, Hannaford, and Stop & Shop.

Retail