AI Agent Engineer
Founding Role at Maestrotech Inc
Maestrotech Inc is building an agentic OS for mortgage origination, transforming how loans are structured, priced, submitted, and cleared across TPO portals with mortgage specific AI tools and agents. We are hiring a founding AI Agent Engineer to own the design and implementation of intelligent agents that power the platform, combining modern agentic patterns, retrieval augmented generation, and multi tenant infrastructure.
You will architect durable agents with episodic and semantic memory, build mortgage specific MCP servers as tools, design and maintain the hybrid retrieval stack, and orchestrate complex agent workflows that always stay grounded in curated mortgage data.
Why Join Maestrotech
- Join an ambitious founding team building the operating system for mortgage origination using truly AI native agents and workflows
- Work in a well funded startup with strong runway, clear product vision, and a massive real world problem in financial services
- Own the full agent lifecycle: design, memory management, tool orchestration, retrieval integration, and production behavior
- Build mortgage specific MCP servers and Agent Skills that become canonical capabilities for pricing, structuring, submission, and clearing conditions
- Work at the frontier of agentic AI with patterns like code execution with MCP, progressive disclosure with Agent Skills, durable agent memory, and hybrid retrieval
What You Will Do
- Design and implement durable agents with layered memory: episodic memory for interactions and task context, semantic memory for grounded mortgage facts and rules, and procedural memory via Agent Skills and reusable code
- Build custom MCP servers for mortgage domain tools including loan product lookup, pricing workflows, TPO portal submission, conditions tracking, document retrieval, and underwriting rule application
- Implement and maintain the hybrid retrieval stack: fine tuned Embedding Gemma for mortgage embeddings, vector database with tenant aware indexing, BM25 based sparse retrieval, Reciprocal Rank Fusion for score fusion, and ColBERT style late interaction for reranking
- Engineer agents to use code execution with MCP, so they write and run code that calls MCP servers, filter and aggregate large result sets in the execution environment, and persist state and artifacts while keeping token usage low
- Fine tune Embedding Gemma or similar encoder only models on mortgage document pairs using LoRA or sentence transformer style training to improve semantic retrieval for mortgage terminology, without fine tuning large reasoning models
- Define and enforce hallucination guardrails: grounded RAG only, confidence thresholds on retrieval and reranking, in domain question checks, explicit I do not know paths when evidence is missing, and answer generation that always cites retrieved sources
- Design and implement multi tenant retrieval and isolation: ensure all retrieval and tools operate within tenant specific slices using indexed metadata and tenant aware indexing patterns while keeping performance and cost under control
- Build and organize Agent Skills as structured folders of SKILL.md, instructions, scripts, and resources that agents progressively disclose and load on demand for mortgage specific workflows
- Collaborate with frontend and product teams to define how agent state, progress, and decisions are surfaced in the UI, and how humans can review, override, or approve agent actions in mortgage workflows
- Instrument agents with logging and evaluation: track retrieval quality, agent decisions, confidence scores, failure modes, and user feedback to continuously refine skills, tools, and prompts
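To make the guardrail and tenant isolation responsibilities above more concrete, here is a minimal sketch of a grounded answer path with an explicit "I do not know" fallback. Everything in it is hypothetical and for illustration only: the `Passage` type, the `answer_with_guardrails` function, the 0.6 threshold, and the sample sources are assumptions, not our production code. A real agent would pass the surviving evidence to an LLM rather than returning the top passage verbatim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A retrieved passage (hypothetical shape, for illustration only)."""
    tenant_id: str
    source: str
    text: str
    score: float  # reranker confidence, assumed to lie in [0, 1]


def answer_with_guardrails(passages, tenant_id, min_score=0.6):
    """Return a cited answer, or an explicit 'I do not know' when evidence is missing."""
    # Keep only passages owned by the requesting tenant that clear the threshold.
    evidence = [
        p for p in passages
        if p.tenant_id == tenant_id and p.score >= min_score
    ]
    if not evidence:
        return {"answer": "I do not know.", "sources": []}
    # Highest confidence first; the answer always carries its sources.
    evidence.sort(key=lambda p: p.score, reverse=True)
    return {"answer": evidence[0].text, "sources": [p.source for p in evidence]}


passages = [
    Passage("lender_a", "guide_a.pdf#p4", "Minimum FICO for this product is 620.", 0.82),
    Passage("lender_b", "guide_b.pdf#p9", "Minimum FICO for this product is 660.", 0.91),
    Passage("lender_a", "guide_a.pdf#p7", "Reserves may be required.", 0.35),
]
grounded = answer_with_guardrails(passages, tenant_id="lender_a")
unknown = answer_with_guardrails(passages, tenant_id="lender_c")
```

In this sketch the higher scoring passage from `lender_b` never reaches `lender_a`, and a tenant with no qualifying evidence gets the fallback instead of a guess.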
How We Build Agents at Maestrotech
- Agents use MCP servers as the primary interface to tools and systems, rather than ad hoc direct API calls, so behavior is consistent and composable
- Code execution with MCP is used to interact with MCP servers efficiently, loading only the tool definitions and data needed, and performing filtering or transformation in code instead of via long prompt chains
- Retrieval is hybrid by design: dense mortgage aware embeddings for semantic similarity, sparse BM25 for exact mortgage terminology, fused with Reciprocal Rank Fusion and reranked with ColBERT style late interaction for high precision top results
- Agent Skills provide modular, domain specific capabilities by organizing instructions, examples, and scripts into SKILL.md centric folders that agents discover and load via progressive disclosure instead of preloading everything into context
- Memory is explicit and layered: short term context for the current interaction, long term semantic memory backed by retrieval, and procedural memory captured as reusable skills and code rather than brittle one off prompts
- We do not fine tune large reasoning models at this stage; instead, we focus fine tuning on encoder only embedding models for retrieval quality and rely on strong frontier LLMs for reasoning with tight guardrails and grounding
- Multi tenant correctness is non negotiable: every retrieval, tool call, and skill invocation is scoped to the correct tenant using metadata filters and index design so no data ever crosses tenant boundaries
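As a rough illustration of the fusion step in the hybrid retrieval approach above, Reciprocal Rank Fusion can be sketched in a few lines. The function name, the document ids, and the constant `k=60` (a commonly used default) are assumptions for this example, not a description of our actual stack. Each input list is assumed to be ordered best first and already scoped to a single tenant.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1 for the top result.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the dense (embedding) and sparse (BM25) retrievers.
dense = ["guideline_12", "guideline_7", "guideline_3"]
sparse = ["guideline_7", "guideline_9", "guideline_12"]
fused = reciprocal_rank_fusion([dense, sparse])
```

Documents that both retrievers rank well rise to the top, and the fused list is what a late interaction reranker would then reorder for the final results.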
What We Are Looking For
- Experience designing and implementing agentic systems: reasoning and planning loops, tool selection strategies, error recovery, and long running workflows
- Strong Python engineering skills and familiarity with modern LLM and embedding ecosystems, including using PyTorch or sentence transformers for encoder only model fine tuning
- Practical experience building RAG or retrieval heavy systems: working with vector databases, BM25 or similar sparse retrievers, hybrid retrieval, and reranking
- Familiarity with MCP concepts and experience designing or consuming tool like abstractions for agents, plus comfort working with file based Agent Skills as described in SKILL.md based patterns
- Understanding of when to use small models for embeddings and classification and when to rely on larger hosted LLMs for reasoning, with a clear view on cost, latency, and quality trade offs
- Strong product sense and empathy for mortgage domain users, and the ability to translate complex origination workflows into robust agent behaviors with clear boundaries
Technical Skills
- Agentic patterns: ReAct and Chain of Thought reasoning loops, tool composition and routing, state management across multi step workflows
- Python: core language for agent development, MCP server implementation, and fine tuning workflows
- Embedding models: Sentence Transformers, fine tuning Embedding Gemma using LoRA via the peft library, understanding contrastive learning objectives
- Retrieval systems: Qdrant Python client for vector search with indexed metadata filtering, BM25 implementation, ColBERT v2.0 for reranking, Reciprocal Rank Fusion for score fusion
- MCP servers: Claude SDK for MCP server construction, tool definition specification, parameter handling, and response formatting
- PyTorch: model loading and inference, understanding Transformer architecture, inference optimization and quantization
- Code execution: Python script generation within agent loops, file based state persistence, understanding security and sandboxing
- Vector databases: Qdrant (or Pinecone, Milvus) with multi tenant partitioning, indexed metadata filtering, and performance optimization
Nice To Have
- Experience with Agent Skills frameworks or file based agent capability systems
- Experience with knowledge graph construction or schema guided extraction, even if you are skeptical of knowledge graphs in production
- Familiarity with mortgage origination, underwriting rules, investor guidelines, or TPO workflows
- Contributions or experiments with MCP servers or open source agent frameworks
- Previous work on multi tenant SaaS systems with data isolation and retrieval at scale
The Opportunity
You will define how AI agents will originate mortgages at scale, combining MCP servers, Agent Skills, hybrid retrieval, and durable memory into a system that is grounded, controllable, and safe. You will work at the edge of what is possible with agentic AI, while staying disciplined about where to fine tune, where to rely on strong foundation models, and how to keep answers factual and in domain. If this resonates, we would love to talk.