Personal details

Abubakar O. - Remote data scientist

Machine Learning Engineer
Based in: 🇺🇸 United States
Timezone: Central Time (US & Canada) (UTC-5)

About

Analytical AI/ML Engineer with more than 4 years of experience building production agentic AI systems for enterprise clients in Agile environments, specializing in multi-agent orchestration, LLM pipelines, and RAG architectures with LangChain, CrewAI, and MCP. Delivered a 75% reduction in processing time across fully automated pipelines deployed on AWS and Databricks.

Work Experience

Machine Learning Engineer
Unique Computing LLC | Nov 2024 - Present
Machine Learning
Computer Vision
Data Science
Presentations
Modelling
Deep Learning
AWS
Prompt Engineering
Generative AI
LLM
LLaMA
Retrieval-Augmented Generation
Retention

Consulting ML Engineer delivering production AI systems for Optum Serve, a federal healthcare client.

  • Multi-Agent Analytics Platform: Architected a multi-agent platform with an LLM-based intent classifier coordinating two specialized agents: a RAG retrieval agent over pre-generated reports and a DuckDB code-execution agent, enabling business users to query operational data conversationally without SQL or code.
  • SAS-to-Python Modernization Agent: Engineered an agentic SAS-to-Python pipeline using Claude Code CLI with a custom MCP server for documentation upload, GitHub push, and Databricks job execution, generating complete Python packages with tests and docs in a single invocation.
  • SQL-RAG Schema Classifier: Built a BERT-based multilabel classifier identifying schema columns from natural language queries. Synthesized 200k+ labeled examples via Azure OpenAI, tracked via MLflow on SageMaker, achieving 86% average per-label accuracy.
  • Automated Thematic Analysis: Designed a dual-stage metadata-driven LLM pipeline using fine-tuned GPT-4o on Azure OpenAI with multi-level parallelism, reducing survey processing time by 75% (from 30+ minutes to under 7 minutes).
  • Form Type Classifier: Developed a CPU-only, training-free classifier for scanned federal medical forms using one-shot ORB feature matching and RANSAC alignment, achieving 95–98% confidence across 21 form types with a single exemplar per class.
  • VA Claims ACE Pipeline: Delivered a 4-phase multi-agent pipeline on AWS Bedrock: a preprocessing agent indexes PDFs into a tagged Knowledge Base; five domain-specialized agents perform tag-filtered retrieval with PydanticAI outputs; a Conflict Detection Agent flags inconsistencies; a rules-based ACE Determination Agent routes to human review, approval, or C&P exam.

Data Scientist
Unique Computing LLC | Oct 2021 - Nov 2024
Machine Learning
Computer Vision
Data Science
Presentations
Modelling
Deep Learning
AWS
Prompt Engineering
Generative AI
LLM
LLaMA
Retrieval-Augmented Generation
Retention

Consulting role providing data science solutions across pharmaceutical, government, and enterprise clients.

Client: Merck Pharmaceuticals

  • Cell Viability Prediction: Developed a real-time cell viability monitoring system using custom YOLO object detection models in collaboration with laboratory scientists, achieving 92% accuracy and automating previously manual video analysis.

Client: US Census Bureau

  • Multi-Label Text Classification: Trained a custom BERT-based multilabel token classification model for race coding across a dataset of 2M+ entries, automating 95% of manual data coding tasks.

Internal Projects

  • GenNet AI: Launched an AI-powered physician platform for RAG-based querying over consultation transcripts and patient data with native OpenEMR integration, deployed fully on vLLM and h2oGPT. Advised physicians and clinical stakeholders on a GenAI adoption roadmap and self-hosted LLM deployment strategy.
  • Churn Prediction Platform: Brought a modular churn prediction pipeline from prototype to production on AWS Step Functions with ECS jobs, serving multiple clients simultaneously. Used PyCaret for automated model selection and SHAP for explainability, achieving a 25% reduction in churn rates.

Projects

structx
LLM
structx is a powerful Python library that extracts structured data from text using Large Language Models (LLMs). It dynamically generates type-safe data models and provides consistent, structured extraction with support for complex nested data structures. Whether you're analyzing incident reports, processing documents, or extracting metrics from unstructured text, structx provides a simple, consistent interface with powerful capabilities.
Scrapy-LLM
LLM
Fully automates web scraping by leveraging a Large Language Model (LLM). Use any existing LLM or bring your own.

Education

International Islamic University Malaysia
Bachelor's degree・Computer Science
Jan 2017 - Jun 2021

Certifications & Awards

AI Engineering Professional Certificate (V2)
IBM | Aug 2024
Machine Learning
Stanford University | May 2024