Seeking a Software Engineer with expertise in GPU infrastructure,
Ubuntu/Linux, and automation of image creation, optimization, and
management across cloud platforms. You will:
Design and automate the lifecycle of GPU bare-metal and VM images.
Optimize images for security, performance, and efficiency.
Collaborate with hardware, GPU, and firmware engineering teams.
________________________________
Key Responsibilities:
Stakeholder Coordination: Coordinate with stakeholders on GPU image
packaging, testing, firmware versioning, and driver compatibility
(NVIDIA H100/H200, AMD MI300X/MI325X).
Automation & Scripting: Use Packer, Ansible, Terraform, Python, and
shell scripting to automate image builds and deployment.
Security & Compliance:
Follow security best practices (e.g., CIS Benchmarks) and regulatory
requirements (e.g., GDPR, HIPAA).
Handle image hardening, patching, and compliance.
________________________________
Additional Responsibilities:
Optimization: Improve machine image performance, boot time, and size
to reduce costs and improve scalability.
CI/CD Integration: Use Jenkins, GitHub Actions, or GitLab CI to
automate image creation pipelines.
Versioning & Documentation: Maintain version control for images and
configurations; document configurations and best practices.
Monitoring & Troubleshooting: Monitor image deployment performance and
troubleshoot cloud runtime issues.
Continuous Improvement: Stay current with cloud technology trends in
image building and containerization.
________________________________
Qualifications (Required):
Experience with Linux OS image building.
Understanding of GPU infrastructure and testing (e.g., NCCL, RCCL).
Hands-on experience with image-building tools (Packer, cloud-init,
Ansible) and scripting (Python, shell).
Familiarity with CI/CD tools: Jenkins, GitHub Actions, GitLab CI.
Knowledge of security practices for image creation (CVE scanning,
secrets management, hardening).
________________________________
Qualifications (Preferred):
Development experience in AI/ML environments.
Knowledge of immutable infrastructure and golden image pipelines.
Experience with Chef or Puppet for config management.