Infrastructure Services Director
Posted 1 month ago Expired
About This Role
Lead the establishment and operation of a high-performance AI R&D Lab/Data Center (TALON), delivering high-quality, self-service infrastructure for AI R&D teams. This role involves strategic planning, deep technical expertise, and an unyielding commitment to CMMC compliance and DevOps principles.
Responsibilities
- Lead data center hardware and software acquisition, finalizing labor needs and coordinating with OEMs, VARs, software vendors, and partners to build compute and transport infrastructure in the TALON lab
- Operationalize the data center, overseeing delivery, receipt, installation, racking/stacking, configuration, and integration, and making the infrastructure available for service
- Manage the TALON Data Center in Dulles, VA, applying DevOps principles to operate and manage configurations for all assets across the TALON on-premises data center, remote nodes, and cloud environments
- Attain CMMC accreditation for TALON environments, establishing and driving the accreditation plan while future-proofing the infrastructure strategy
- Design, implement, and operate network segments and associated infrastructure to securely meet the unique needs of TALON AI cyber projects
- Serve as technical lead and administrator for the TALON Data Center and TALON lab IT infrastructure
- Maintain the data center, audiovisual (AV), Wi-Fi, software, and all other lab IT infrastructure
- Plan, provision, and optimize AWS/Azure/GCP resources (compute, networking, IAM, cost control); enforce guardrails and landing zones
- Design and secure LAN/WAN/SD-WAN/Wi-Fi networks and firewalls, drawing on experience managing NIPR and SIPR networks and high-level knowledge of JWICS
- Implement zero-trust controls, patching, identity management, logging/SIEM, and audit readiness aligned with NIST/ISO and CMMC standards
- Own the service catalog, SLAs, capacity planning, vendor contracts, and budget
- Coordinate with facilities on power, cooling, UPS/generators, and physical security for server rooms
Requirements
- 10+ years of experience with core infrastructure operations (Windows/Linux, virtualization, storage, backups, and disaster recovery)
- HPC cluster system administration, preferably in rapid AI and cyber solution prototyping environments
- Experience with state-of-the-art GPU technologies and their integration into HPC environments (driver management, software stack tools, monitoring, workload scheduling)
- Experience with InfiniBand, NVLink, NVQLink, and Spectrum-X (driver management, software stack tools, monitoring)
- Experience with container platforms (e.g., Apptainer, Docker, OpenShift, Kubernetes, EKS)
- Familiarity and prior work experience with technologies such as Ansible, Git, Slurm, Zabbix, Prometheus, Grafana, and Docker
- Slurm or other cluster scheduling, configuration, and management solutions
- NFS, SMB, and distributed object, file, and block storage management and configuration
- High-performance parallel filesystem management and configuration
- Experience installing and repairing servers and associated cluster hardware
- Experience devising a CMMC strategy and successfully attaining CMMC Level 3 accreditation for an AI-powered R&D lab serving a government contractor
Qualifications
- 10+ years of experience with core infrastructure operations and HPC cluster system administration.
About Tyto Athene, LLC
Tyto Athene is a trusted leader in IT services and solutions, delivering mission-focused digital transformation that drives measurable success across four core technology domains: Network Modernization, Hybrid Cloud, Cybersecurity, and Enterprise IT.