Role Overview
We are seeking an accomplished Senior Data Scientist with expertise in semantic and graph technologies, particularly RDF, OWL, SPARQL, and other W3C standards, as well as ontology design. The role involves parsing US companies’ SEC financial filings (XBRL/XML) into RDF to construct a semantic graph. You will also extend existing taxonomies and define a unified ontology, based on XBRL and other well-established financial taxonomies and standards (e.g., OMG frameworks, OpenFIGI, SBRM, FIBO, ISO standards), that can represent complex financial data structures.
Responsibilities
- US SEC Filing Data Ingestion: Build a comprehensive database of SEC financial filings (10-K, 10-Q, 8-K) for US-listed companies. These filings are reported in XBRL (an XML-based format) and need to be parsed into RDF to build semantic graphs. Begin with a single company (e.g., Amazon) and expand to a broader universe.
- Ontology Development: Leverage various pre-defined taxonomies (XBRL, SRT, FIBO, etc.) to develop a unified ontology that will serve as the “golden source of truth” for a Defining Knowledge Graph (DKG).
- Query Graph and Generate XML Output: Use SPARQL or other query languages to query the DKG and generate structured, linked financial statements as XML output, optionally triggered via basic LLM prompts.
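To give candidates a feel for the ingestion step, here is a minimal sketch of turning XBRL facts into RDF triples (serialized as N-Triples). It is illustrative only and makes several assumptions: the XBRL fragment is hand-written rather than taken from a real filing, the `http://example.org/dkg/` namespace and the `concept`, `context`, `unit`, and `value` predicates are hypothetical placeholders for whatever the unified ontology ends up defining, and only numeric facts are handled. A production pipeline would more likely use a dedicated RDF library such as rdflib and a full XBRL processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written XBRL instance fragment (not real filing data).
SAMPLE_XBRL = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <context id="FY2023">
    <entity><identifier scheme="http://www.sec.gov/CIK">0000000000</identifier></entity>
    <period><startDate>2023-01-01</startDate><endDate>2023-12-31</endDate></period>
  </context>
  <unit id="USD"><measure>iso4217:USD</measure></unit>
  <us-gaap:Revenues contextRef="FY2023" unitRef="USD" decimals="0">1000</us-gaap:Revenues>
</xbrl>"""

# Assumed namespace for graph resources; a real ontology would define its own IRIs.
EX = "http://example.org/dkg/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def facts_to_ntriples(xml_text):
    """Convert numeric XBRL facts into a list of N-Triples lines."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root:
        context_ref = elem.get("contextRef")
        unit_ref = elem.get("unitRef")
        # Contexts and units carry no contextRef; non-numeric facts carry no unitRef.
        if context_ref is None or unit_ref is None:
            continue
        # ElementTree encodes qualified names as "{namespace}localName".
        namespace, _, local_name = elem.tag[1:].partition("}")
        fact = f"<{EX}fact/{context_ref}/{local_name}>"
        triples.append(f"{fact} <{EX}concept> <{namespace}#{local_name}> .")
        triples.append(f"{fact} <{EX}context> <{EX}context/{context_ref}> .")
        triples.append(f"{fact} <{EX}unit> <{EX}unit/{unit_ref}> .")
        triples.append(f'{fact} <{EX}value> "{elem.text.strip()}"^^<{XSD_DECIMAL}> .')
    return triples


for line in facts_to_ntriples(SAMPLE_XBRL):
    print(line)
```

Each fact becomes a node linked to its taxonomy concept, reporting context, and unit, which is the shape a SPARQL query over the DKG would then traverse.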
Nice to Have
- Experience parsing unstructured data with NLP and Named Entity Disambiguation (NED), and building vector databases
- Ability to build real-time data pipelines that align with existing taxonomies or ontologies, and extend them as needed
- Familiarity with the finance domain, including reading SEC filings and understanding financial terminology