Remote
At CloudGeometry, we're redefining how modern data and AI systems are built. As a leading cloud-native engineering firm, we work with pioneering technology companies to deliver high-impact solutions across infrastructure, machine learning, and intelligent applications.
We are looking for five highly skilled AI Infrastructure Engineers to join our growing team supporting large-scale AI/ML systems. This is a hands-on engineering role focused on building scalable, secure, and production-ready infrastructure that powers ML workflows end-to-end, from experimentation to deployment and monitoring.
What You’ll Do
Design, implement, and maintain robust infrastructure for ML workflows across real-time and batch environments.
Build and support production-grade model lifecycle systems, including registration, versioning, and deployment workflows.
Develop APIs and backend services in TypeScript and Python to support model integration and orchestration.
Manage and optimize infrastructure using AWS and infrastructure-as-code (CDK preferred).
Work with Databricks MLflow for end-to-end model management, including asset bundling and serving pipelines.
Collaborate with cross-functional teams including ML scientists, backend engineers, and DevOps to deliver high-impact features.
Monitor and improve infrastructure reliability, security, and performance across diverse deployment targets.
Contribute to CI/CD workflows, container orchestration (Docker, ECS), and automation for ML pipelines.
Why Join CloudGeometry?
You’ll work alongside top-tier engineers across the US, LATAM, and Europe on cutting-edge projects in AI, cloud, and enterprise SaaS. We value deep technical curiosity, strong collaboration, and a bias for action in solving meaningful problems.
Seniority Level
Mid-Senior level
Industry
Employment Type
Full-time
Job Functions
Engineering
Information Technology
Skills
Large Language Models (LLM)
Software as a Service (SaaS)
Databricks Products
Python (Programming Language)
Infrastructure
TypeScript
MLflow
MLOps
Amazon Web Services (AWS)
Requirements
What We’re Looking For
7+ years in software or infrastructure engineering with proven experience supporting AI/ML systems.
Deep hands-on experience with AWS services and modern IaC practices (Terraform/CDK).
Strong backend programming skills in TypeScript and Python.
Production-level use of MLflow for model management and deployment.
Expertise in containerization (Docker), CI/CD automation, and orchestration tools.
Solid understanding of designing scalable and secure systems in cloud-native environments.
Strong communication skills, able to bridge gaps between engineering and product stakeholders.
Comfortable in fast-paced, collaborative environments working across time zones.
Nice to Have
Exposure to LLM infrastructure and frameworks (e.g., DSPy, LangChain).
Knowledge of LLM performance metrics: latency, cost monitoring, and usage optimization.
Familiarity with semantic search tools and vector stores (e.g., OpenSearch, Pinecone).
Benefits
Remote anywhere
Coworking space financial coverage
Flexible working hours
B2B with multiple benefits
Paid days off annually
Workspace program: $2,500 for work equipment of your choice
Paid courses and certifications, for example AWS, CKA, and ML certifications
Participation in international conferences such as CNCF summits, KubeCon, and others
Submit Resume
Send your application to jobs@cloudgeometry.com, and we’ll contact you shortly.