About the Project
NūR Scientific is developing a backend-heavy prototype for a smart health data system designed to ingest, categorize, and analyze diverse domain-specific data (initially health data) using a modular, ontology-agnostic pipeline. This initial MVP will serve as the foundation for future AI-driven capabilities and a full production platform. The architecture must remain adaptable so that additional product modes can be supported later without backend refactoring.
You’ll work closely with NūR’s technical lead to deliver a functional modular backend foundation layer with minimal UI for internal demo purposes. The goal is to complete a 2–8 week prototype that supports modular expansion, reliability, and clean integration points for future AI and analytics layers. All domain-specific rules, thresholds, and schemas must be isolated in configuration files rather than embedded directly into core logic.
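To illustrate what keeping thresholds in configuration rather than in core logic could look like, here is a minimal sketch. The config layout (the `op`, `value`, `unit`, and `flag` keys) and the function names are assumptions made for illustration, not a prescribed format; the creatinine rule is the example given under Responsibilities. The engine itself contains no health-specific knowledge.

```python
import json
import operator

# Hypothetical rule config. In the project this would live in a JSON/YAML
# file; it is inlined here so the sketch is self-contained.
RULES_JSON = """
{
  "creatinine": [
    {"op": ">", "value": 1.4, "unit": "mg/dL", "flag": "high"}
  ]
}
"""

# Comparison operators the generic engine understands.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def load_rules(raw: str) -> dict:
    """Parse the rule config; no domain knowledge lives in this function."""
    return json.loads(raw)


def evaluate(rules: dict, indicator: str, value: float) -> list[str]:
    """Return the flags of every configured rule the measurement satisfies."""
    flags = []
    for rule in rules.get(indicator, []):
        compare = OPERATORS[rule["op"]]
        if compare(value, rule["value"]):
            flags.append(rule["flag"])
    return flags


rules = load_rules(RULES_JSON)
print(evaluate(rules, "creatinine", 1.6))  # satisfies the "> 1.4" rule
print(evaluate(rules, "creatinine", 1.0))  # satisfies no rule
```

Supporting a non-health domain would then mean supplying a different config file, with `evaluate` left unchanged.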
Responsibilities
- Design and build a modular backend prototype using Python (FastAPI), SQLAlchemy, and PostgreSQL (or SQLite). Core modules must be domain-agnostic and accept interchangeable ontologies.
- Implement data ingestion pipelines for PDFs, images, CSVs, and URLs, with local storage and metadata tracking.
- Implement automatic file creation and metadata enrichment, including generating and storing thumbnails for uploaded images/PDFs and assigning initial categories/tags for later search and filtering.
- Develop and manage database models for profiles (avatars), documents, and structured entries (initially health entries, with expandability for other domains).
- Build CRUD APIs for core data operations and keyword search with synonym mapping, including basic NLP-powered synonym expansion (e.g., “kidney” → creatinine, eGFR, cystatin C, and related terms, and vice versa). This may be implemented using either a lightweight rules/config layer (e.g., JSON/YAML synonym lists) or a call to an external LLM/NLP API (no custom model training required), so that semantically related terms surface the same underlying data and files.
- Implement rule-based flagging via JSON/YAML config files for domain-specific thresholds (initially health indicators, e.g., creatinine > 1.4 mg/dL → “high”).
- Create data export features (CSV, PDF, ZIP) per user or dataset.
- Add event hooks for post-action triggers (e.g., data updates or re-indexing). Event hooks must remain generic and able to support additional event categories.
- Develop a lightweight internal demo UI using FastAPI + Jinja2 + HTMX + Tailwind for interactive testing (upload, search, export).
- Ensure all code is clean, maintainable, and modular, with a clear setup via uvicorn and .env configuration. No domain-specific logic should be embedded inside core ingestion, structuring, event, or comparison modules.
- Implement a minimal “reasoning hook” endpoint that collects relevant structured data for a user (e.g., recent labs and events) and sends it as context in a single call to an external LLM (e.g., GPT) to generate a plain-language explanation or summary. No custom model training is required; this is a simple, well-structured API callout.
- Push deliverables to NūR’s GitHub repository and support final testing and handover.
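As an illustration of the config-based option for synonym mapping described above, the following sketch expands a query in both directions ("kidney" finds creatinine entries, and "eGFR" finds kidney-related entries). The config shape, the function names, and the sample documents are hypothetical; a real implementation would query the database rather than scan an in-memory dict.

```python
import json

# Hypothetical synonym config; the group mirrors the "kidney" example above.
SYNONYMS_JSON = """
{
  "kidney": ["creatinine", "eGFR", "cystatin C"]
}
"""


def build_index(raw: str) -> dict[str, set[str]]:
    """Map every term (lowercased) to the full synonym group it belongs to."""
    index: dict[str, set[str]] = {}
    for head, related in json.loads(raw).items():
        group = {head.lower(), *(term.lower() for term in related)}
        for term in group:
            index.setdefault(term, set()).update(group)
    return index


def expand(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the query term plus all configured synonyms, lowercased."""
    term = query.strip().lower()
    return index.get(term, set()) | {term}


def search(index: dict[str, set[str]], documents: dict[str, str], query: str) -> list[str]:
    """Return sorted IDs of documents containing any expanded term."""
    terms = expand(index, query)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if any(term in text.lower() for term in terms)
    )


index = build_index(SYNONYMS_JSON)
documents = {  # illustrative sample data only
    "lab-1": "Creatinine 1.6 mg/dL",
    "lab-2": "eGFR 58",
    "note-1": "Knee X-ray report",
}
print(search(index, documents, "kidney"))
```

Because every member of a group maps to the whole group, the "vice versa" requirement is covered without listing each direction separately in the config.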
Technical Environment
- Languages & Frameworks: Python 3.x, FastAPI, SQLAlchemy
- Database: PostgreSQL (or SQLite for local testing)
- Frontend (Demo UI): FastAPI + Jinja2 + HTMX + Tailwind (CDN)
- Tools: GitHub, Uvicorn, JSON/YAML configs for search/flagging logic. Configuration files will also hold all ontology definitions for future expandability.
- Architecture: Modular design with future AI integration hooks. Architecture must be fully ontology-agnostic to allow future non-health modules. The MVP must include a simple, pluggable LLM integration used only for basic explanation/summary generation (single-call, prompt-based; no fine-tuning or model hosting).
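One way to keep the LLM integration pluggable is to code the reasoning hook against a small interface and inject the provider. The sketch below is an assumption-laden illustration: `LLMClient`, `EchoClient`, `build_prompt`, and the entry fields are hypothetical names, and the stub client stands in for a real provider wrapper so that nothing here makes a network call.

```python
from typing import Protocol


class LLMClient(Protocol):
    """Hypothetical interface; any provider wrapper with this method fits."""

    def complete(self, prompt: str) -> str: ...


class EchoClient:
    """Stand-in client for local testing; makes no network call."""

    def complete(self, prompt: str) -> str:
        return f"[stub summary of {len(prompt.splitlines())} prompt lines]"


def build_prompt(entries: list[dict]) -> str:
    """Turn structured entries into a single plain-text prompt."""
    lines = ["Explain these results in plain language:"]
    for entry in entries:
        lines.append(
            f"- {entry['name']}: {entry['value']} {entry['unit']} ({entry['flag']})"
        )
    return "\n".join(lines)


def explain(client: LLMClient, entries: list[dict]) -> str:
    """Single-call reasoning hook: one prompt in, one explanation out."""
    return client.complete(build_prompt(entries))


entries = [{"name": "creatinine", "value": 1.6, "unit": "mg/dL", "flag": "high"}]
print(explain(EchoClient(), entries))
```

Swapping providers would then mean writing another class with a `complete` method; the endpoint that calls `explain` would not need to change.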
Scope Exclusions: Authentication, dashboards, and complex SPA frontends (React/Vue).
Total estimated duration: 2–8 weeks, depending on developer availability (open to full-time or part-time).
Ideal Candidate
- 5+ years of experience in backend development (Python, FastAPI, or equivalent frameworks).
- Strong knowledge of database modeling, query optimization, and API design. Experience designing extensible schemas is strongly preferred.
- Experience with data ingestion from multiple file types and real-time data processing.
- Familiarity with health data systems, data categorization, or reliability analytics is a plus. Experience with modular architectures or domain-agnostic data pipelines is a plus.
- Comfortable building quick prototypes with attention to scalability and maintainability.
- Self-starter with clear communication skills; able to work with minimal oversight.
Feature Summary:
MVP Behavior
✔ Ingestion for PDFs/images/CSVs
✔ Auto-file creation
✔ Thumbnail extraction
✔ Manual/Config-based categorization
✔ JSON/YAML synonym lists
✔ Simple keyword-based search + synonyms
✔ Rule-based flags
✔ CRUD + models
✔ Event hooks
✔ Basic graph support
✔ Minimal UI
✔ One LLM reasoning endpoint
✔ Modular architecture for both products
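For the event hooks item in the list above, a generic registry keyed by plain string names is one possible shape. This is a minimal sketch: the `EventBus` class, the `document.updated` event name, and the payload fields are illustrative assumptions, not part of the brief.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Generic registry: event names are plain strings, so adding a new
    event category requires no change to this class."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler to run whenever the event is emitted."""
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: dict[str, Any]) -> int:
        """Call every handler for the event; return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
reindexed: list[str] = []

# Hypothetical post-update hook: record which documents need re-indexing.
bus.on("document.updated", lambda payload: reindexed.append(payload["doc_id"]))
bus.emit("document.updated", {"doc_id": "lab-1"})
print(reindexed)
```

Handlers run synchronously in registration order here; a production version might queue them as background tasks instead.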