Florencia Suarez Varady is in direct contact with the company and can answer any questions you may have.
Location: Remote
Duration: 4 weeks (with potential for extension)
Start: ASAP
Platform: AI-native personalized shopping platform
We’re building a new kind of shopping experience – powered by real-time personalization and agentic AI. Our goal is to help users stop searching and start feeling understood by a smart, conversational assistant that knows their taste, mood, and intent. As we prepare for our private beta (Friends & Family launch), we’re investing in infrastructure that can deliver fast, intelligent experiences at scale.
We’re seeking a high-performance ML Engineer with strong experience in:
• Real-time inference optimization
• Vector-based recommendation systems
• Low-latency infrastructure (Pinecone, FAISS, Weaviate, Matching Engine, or similar)
• LLM-based inference workflows (reranking, summarization)
You’ll work closely with our AI Research and Backend teams to help us:
• Audit our current architecture (BigQuery, Vertex AI, Pinecone) and identify latency hotspots and bottlenecks
• Profile and streamline the embedding generation and retrieval pipeline (Vertex AI to Pinecone)
• Optimize our inference stack toward under-100ms end-to-end user interactions
• Test and integrate fast inference engines and APIs (Cerebras, Vertex AI, or comparable LLM endpoints), and support their backend integration
• Test and refine reranking workflows for product recommendations
• Collaborate with the Research team to keep the AI assistant’s responses fast and scalable across recommendations, chat, and product summaries
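To give candidates a concrete feel for the retrieve-then-rerank flow described above, here is a minimal in-memory sketch; plain NumPy stands in for a vector store like Pinecone or FAISS, and the `boost` signal is a hypothetical second-stage score (in production this might be an LLM relevance score or popularity prior):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy catalog: 1,000 product embeddings (e.g., produced by a Vertex AI embedding model).
DIM = 64
catalog = rng.normal(size=(1000, DIM)).astype(np.float32)
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)  # unit-normalize for cosine similarity

def retrieve(query: np.ndarray, k: int = 50) -> np.ndarray:
    """First stage: brute-force cosine top-k, standing in for a vector-store query."""
    q = query / np.linalg.norm(query)
    scores = catalog @ q                    # cosine similarity (unit vectors)
    return np.argpartition(-scores, k)[:k]  # indices of the k best candidates

def rerank(query: np.ndarray, candidates: np.ndarray,
           boost: np.ndarray, top_n: int = 10) -> np.ndarray:
    """Second stage: re-score the candidate set with an extra signal and keep top_n."""
    q = query / np.linalg.norm(query)
    scores = catalog[candidates] @ q + 0.1 * boost[candidates]
    order = np.argsort(-scores)[:top_n]
    return candidates[order]

query = rng.normal(size=DIM).astype(np.float32)
boost = rng.uniform(size=len(catalog)).astype(np.float32)
top = rerank(query, retrieve(query), boost)
print(top.shape)  # (10,)
```

The two-stage split is the point: the cheap first stage narrows 1,000 items to 50, so the more expensive reranking signal only touches a small candidate set.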
We’re looking for someone with:
• Strong background in ML inference optimization and deployment
• Deep understanding of vector search (Pinecone, FAISS, pgvector, etc.)
• Hands-on experience with embedding workflows and model pipelines
• Comfortable integrating with LLM APIs (OpenAI, Vertex AI, Cohere, or Cerebras)
• Experience profiling and improving performance in production ML systems
• Bonus: Familiarity with Vertex AI Matching Engine, BigQuery, or GCP-native tools
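The profiling experience above is central to the sub-100ms goal; a minimal sketch of the kind of latency harness involved (`fake_inference_call` is a stand-in for a real embed + retrieve + rerank request):

```python
import time
import statistics

def fake_inference_call() -> None:
    """Stand-in for a real inference request (embed + retrieve + rerank)."""
    time.sleep(0.002)  # simulate ~2ms of work

def measure_latency(fn, runs: int = 50) -> dict:
    """Time repeated calls and summarize; tail latency (p95) is what a
    <100ms end-to-end budget is usually held against, not the average."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)  # ms
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "max_ms": samples[-1],
    }

stats = measure_latency(fake_inference_call)
print(stats["p95_ms"] < 100.0)  # True for this toy workload
```

Budgeting against p95 rather than the mean matters because a chat-style product feels slow whenever *any* turn is slow.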
Why join:
• Work on a real-time AI-native consumer product – no fluffy infra, just hard, fun problems
• Direct collaboration with founders and research leadership
• Tight feedback loop with actual users in a live product
• Make an outsized impact on a high-stakes early launch