Sole is in direct contact with the company and can answer any questions you may have.
We’re looking for experienced Data Scientists and AI/ML Engineers to help build an MVP of the most advanced and interconnected market intelligence B2B SaaS platform. You will design and develop data pipelines, knowledge graphs, and NLP models focused on financial data and regulatory filings.
• Parsing and building hybrid databases: Using predefined taxonomies and ontologies (XBRL, SBRM, FIBO)
and NLP, parse XML, HTML, and PDF files to segregate and tag structured and unstructured data from financial
documents such as SEC filings, Call Transcripts, Investor Presentations, and News Feeds. Link structured
data to semantic graphs and unstructured data to a vector database.
• Real-time data pipelines: Build pipelines that ingest, tag, and place new entities in real time.
Detect new tags and adjust or extend the schema, taxonomy, or ontologies as needed.
• LLM Query System: Implement Hybrid RAG on graphs and vector databases to accurately answer queries.
Establish system accuracy using the FinQA and ConvFinQA datasets.
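To give a feel for the hybrid retrieval idea in the last responsibility, here is a minimal sketch in plain Python. It is not the platform's design: the triples, passages, and function names (`graph_lookup`, `vector_search`, `build_context`) are hypothetical, a list of triples stands in for the semantic graph, and bag-of-words cosine similarity stands in for a real embedding model and vector database.

```python
import math
import re
from collections import Counter

# Hypothetical structured facts, stored as (subject, predicate, object) triples.
TRIPLES = [
    ("ACME Corp", "Revenue_FY2023", "120"),
    ("ACME Corp", "NetIncome_FY2023", "15"),
    ("Globex Inc", "Revenue_FY2023", "80"),
]

# Hypothetical unstructured passages (e.g. snippets from call transcripts).
PASSAGES = [
    "ACME Corp management expects revenue growth to slow next year.",
    "Globex Inc announced a new investor presentation on supply chain costs.",
    "ACME Corp discussed margin pressure during the earnings call.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Bag-of-words term counts: a toy stand-in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def graph_lookup(query):
    """Graph side: return triples whose subject is mentioned in the query."""
    q = query.lower()
    return [t for t in TRIPLES if t[0].lower() in q]


def vector_search(query, k=2):
    """Vector side: return the k passages most similar to the query."""
    q_vec = embed(query)
    ranked = sorted(PASSAGES, key=lambda p: cosine(q_vec, embed(p)), reverse=True)
    return ranked[:k]


def build_context(query, k=2):
    """Combine both retrieval paths into one context for a downstream LLM."""
    facts = [f"{s} | {p} | {o}" for s, p, o in graph_lookup(query)]
    return {"facts": facts, "passages": vector_search(query, k)}


context = build_context("What was ACME Corp revenue and margin outlook?")
```

In a real system the combined context would be passed to an LLM, and answer accuracy would be measured on a benchmark such as FinQA; the sketch stops at retrieval.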
Machine Learning & NLP
Large Language Models (LLMs): Experience with proprietary (e.g., GPT-4, Claude) and/or open-source models
NLP Models: BERT, FinBERT with Named Entity Recognition (NER) and Named Entity Disambiguation (NED)
Graph-Based Machine Learning: Knowledge of GNNs, RDNs, and Knowledge Graph Embedding techniques such as TransE, TransH, and DistMult
Matching Algorithms: Familiarity with exact, fuzzy, and rule-based matching techniques
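As a rough illustration of how exact, rule-based, and fuzzy matching can be layered when resolving company names, consider the sketch below. The canonical list, the suffix rule, the 0.85 threshold, and the `match_entity` helper are all assumptions made for the example; the fuzzy step uses `difflib.SequenceMatcher` from the Python standard library rather than any specific production matcher.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference list of canonical entity names.
CANONICAL = ["Apple Inc.", "Microsoft Corporation", "Alphabet Inc."]

# Rule: drop common legal suffixes (and a trailing period) before comparing.
SUFFIXES = re.compile(r"\b(inc|corp|corporation|co|ltd|llc)\b\.?", re.IGNORECASE)


def normalize(name):
    """Apply the suffix rule, lowercase, and collapse punctuation and spaces."""
    name = SUFFIXES.sub("", name)
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def match_entity(mention, threshold=0.85):
    """Try exact, then rule-based, then fuzzy matching; report which one hit."""
    # 1. Exact: the mention is already a canonical name.
    if mention in CANONICAL:
        return mention, "exact"

    # 2. Rule-based: names are equal after normalization.
    norm = normalize(mention)
    for candidate in CANONICAL:
        if normalize(candidate) == norm:
            return candidate, "rule"

    # 3. Fuzzy: accept the closest candidate if it clears the threshold.
    best = max(CANONICAL, key=lambda c: similarity(norm, normalize(c)))
    if similarity(norm, normalize(best)) >= threshold:
        return best, "fuzzy"

    return None, "none"


results = [match_entity(m) for m in ["Apple Inc.", "APPLE INC", "Microsft Corp", "Tesla"]]
```

Ordering the steps from strictest to loosest keeps the cheap, high-precision checks first and reserves fuzzy matching for mentions that the rules cannot resolve.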
Parsing & Data Extraction
XBRL Parsing: Arelle
XML Parsing: pandas + openpyxl, Apache POI
PDF Parsing: pypdfium2, PyMuPDF, OCR-based parsers
HTML Parsing: Beautiful Soup, lxml (Python); jsoup (Java); cheerio (JavaScript)
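The kind of extraction these tools support can be sketched with the standard library alone. The example below assumes a made-up filing snippet and a hypothetical `InlineXBRLFacts` class; it uses Python's built-in `html.parser` as a stand-in for Beautiful Soup or lxml, pulls values tagged with inline XBRL `ix:nonFraction` elements into structured facts, and keeps the surrounding narrative as unstructured text.

```python
from html.parser import HTMLParser

# Made-up snippet in the style of an inline XBRL filing.
SAMPLE = (
    '<p>Revenue was <ix:nonFraction name="us-gaap:Revenues" '
    'contextRef="FY2023" unitRef="USD">1,200</ix:nonFraction> '
    "million, driven by strong demand.</p>"
)


class InlineXBRLFacts(HTMLParser):
    """Separate tagged numeric facts from the surrounding narrative text."""

    def __init__(self):
        super().__init__()
        self.facts = []       # structured data: one dict per tagged value
        self.text = []        # unstructured data: narrative fragments
        self._current = None  # attributes of the ix:nonFraction being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names.
        if tag == "ix:nonfraction":
            self._current = dict(attrs)
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)
        elif data.strip():
            self.text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "ix:nonfraction" and self._current is not None:
            self.facts.append({
                "concept": self._current.get("name"),
                "context": self._current.get("contextref"),
                "value": "".join(self._buffer).strip(),
            })
            self._current = None


parser = InlineXBRLFacts()
parser.feed(SAMPLE)
parser.close()
```

Here `parser.facts` would feed the structured side (the semantic graph) and `parser.text` the unstructured side (the vector database). A production pipeline would more likely rely on Arelle for full XBRL semantics.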
ETL / Data Processing
Real-Time Pipelines & External API Integration: Apache NiFi, Kafka, Druid
Ontology & Semantic Web
Ontology extension and management tools: Protégé, HermiT, RDF4J, Jena, Alignment API
Workflow Orchestration
Experience with orchestration tools: Apache Airflow, Prefect, Dagster
Optional Skills (UI/UX)
SaaS Front-end Development: React, HTML, CSS, JavaScript
Graph Visualization Tools
Neo4j Bloom, Cytoscape, Graphistry, GraphXR
• Familiarity with SEC filings, US GAAP Taxonomy (XBRL), and financial statements.
• Understanding of financial ontologies and familiarity with financial/macroeconomic datasets.
• Basic familiarity with full-stack development, including UI development skills.