Role Overview
Join our innovative team as a Data Engineer, where you'll play a pivotal role in shaping and maintaining our data infrastructure. You'll be responsible for developing and managing the data layer that powers our machine learning pipeline, ensuring data integrity from ingestion through transformation to delivery. This position is ideal for a meticulous professional passionate about building reliable, production-grade systems and eager to grow with a dynamic, focused team.
Responsibilities
- Data Ingestion: Develop and manage pipelines that retrieve data from diverse sources, including crypto exchanges and traditional market feeds, handling rate limits, API quirks, and authentication challenges (see the first sketch after this list).
- Data Validation: Implement rigorous checks for data completeness, consistency, and correctness, including schema validation and freshness monitoring (second sketch below).
- Transformation Layer: Convert raw data into clean, analysis-ready formats, handling time series alignment and data gaps (third sketch below).
- Storage and Access: Design schemas optimized for efficient data storage and retrieval, with effective lifecycle and retention management (fourth sketch below).
- Monitoring and Alerting: Establish observability into pipeline health to identify and address issues preemptively (the freshness check in the second sketch is one small example).
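To make the ingestion bullet concrete, here is a minimal Python sketch of retrying politely against a rate-limited REST endpoint. The endpoint, response shape, and backoff parameters are illustrative assumptions, not a description of our actual feeds.

```python
# Minimal sketch: polite retries against a rate-limited REST API.
# The URL and backoff parameters are hypothetical; real exchanges
# publish their own limits and headers.
import time

import requests


def fetch_with_backoff(url: str, max_retries: int = 5) -> dict:
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.get(url, timeout=10)
        if resp.status_code == 429:
            # Rate limited: honor Retry-After when present, else back off.
            delay = float(resp.headers.get("Retry-After", delay))
            time.sleep(delay)
            delay *= 2  # exponential backoff between attempts
            continue
        resp.raise_for_status()
        return resp.json()
    raise RuntimeError(f"gave up after {max_retries} attempts: {url}")
```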
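The validation and monitoring bullets overlap in practice: the same checks that gate a batch can also feed alerts. A minimal pandas sketch of completeness, consistency, and freshness checks, where the column names (ts, symbol, price) and thresholds are assumptions:

```python
# Minimal sketch: batch-level quality checks on a pandas DataFrame.
# Assumes a tz-aware UTC `ts` column; all names are illustrative.
import pandas as pd


def check_batch(df: pd.DataFrame, max_age: pd.Timedelta) -> list[str]:
    problems = []
    # Completeness: required columns must have no nulls.
    for col in ("ts", "symbol", "price"):
        if df[col].isna().any():
            problems.append(f"nulls in {col}")
    # Consistency: prices must be strictly positive.
    if (df["price"] <= 0).any():
        problems.append("non-positive prices")
    # Freshness: the newest row should be recent enough.
    age = pd.Timestamp.now(tz="UTC") - df["ts"].max()
    if age > max_age:
        problems.append(f"stale batch: newest row is {age} old")
    return problems  # an empty list means the batch passes
```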
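For the transformation layer, one recurring task is aligning irregular ticks onto a regular grid while distinguishing short gaps (safe to fill) from longer outages (worth flagging). A minimal pandas sketch under those assumptions:

```python
# Minimal sketch: align raw ticks to a 1-minute grid, forward-filling
# gaps up to `limit` minutes and leaving longer outages as NaN so a
# downstream check can flag them. Column names are illustrative.
import pandas as pd


def align_to_minutes(df: pd.DataFrame, limit: int = 5) -> pd.DataFrame:
    bars = df.set_index("ts")["price"].resample("1min").last()
    return bars.ffill(limit=limit).to_frame("price")
```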
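On the storage side, time-partitioned tables are one common PostgreSQL pattern for combining fast range queries with cheap retention (old partitions are detached or dropped instead of bulk-deleted). The DDL below is a hypothetical sketch, not our actual schema:

```python
# Minimal sketch: a time-partitioned PostgreSQL table for market data.
# Table, column, and partition names are illustrative assumptions;
# run via psycopg or a migration tool.
DDL = """
CREATE TABLE IF NOT EXISTS market_ticks (
    ts      timestamptz NOT NULL,
    symbol  text        NOT NULL,
    price   numeric     NOT NULL,
    PRIMARY KEY (symbol, ts)
) PARTITION BY RANGE (ts);

-- One partition per month; retention becomes a cheap DROP/DETACH.
CREATE TABLE IF NOT EXISTS market_ticks_2024_01
    PARTITION OF market_ticks
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
"""
```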
Required Skills
- Python: Professional-level proficiency with a focus on writing clean, maintainable code for data pipelines. Experience with Pandas and NumPy is essential.
- SQL: Advanced skills in SQL, including complex queries, optimization, and schema design, with a preference for PostgreSQL expertise.
- Data Pipeline Design: Experience building resilient pipelines that handle real-world data challenges, along with hands-on use of workflow orchestration tools such as Airflow or Prefect (a minimal flow is sketched after this list).
- Data Quality: Strong focus on maintaining data quality through validation checks and anomaly detection.
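As a hedged illustration of the orchestration experience we look for, here is a minimal Prefect 2 flow wiring extract, validate, and load steps with retries. The task bodies are stubs and every name is an assumption; an equivalent Airflow DAG would serve just as well.

```python
# Minimal sketch of a Prefect 2 flow; task bodies are stubs.
from prefect import flow, task


@task(retries=3, retry_delay_seconds=60)
def extract(source: str) -> list[dict]:
    # Placeholder: pull raw rows from an exchange or market feed.
    return [{"symbol": "BTC-USD", "price": 1.0}]


@task
def validate(rows: list[dict]) -> list[dict]:
    # Keep only rows with the required fields and a positive price.
    return [r for r in rows if {"symbol", "price"} <= r.keys() and r["price"] > 0]


@task
def load(rows: list[dict]) -> int:
    # Stand-in for a PostgreSQL upsert; returns a count for logging.
    return len(rows)


@flow(name="daily-market-ingest")
def ingest(source: str = "example-exchange") -> None:
    load(validate(extract(source)))


if __name__ == "__main__":
    ingest()
```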
Nice to Have
- Experience with financial or crypto data, understanding market data conventions and exchange-specific nuances.
- Familiarity with managing time series data at scale, including temporal joins and windowed computations (a point-in-time join is sketched after this list).
- Experience with high-dimensional feature stores and managing large feature sets.
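For the time series point, a point-in-time ("as-of") join is a common building block: attach to each trade the most recent quote for the same symbol without ever looking into the future. A sketch using pandas.merge_asof, with an assumed trades/quotes layout:

```python
# Minimal sketch: as-of join of trades onto the latest prior quote
# per symbol. Frame and column names are illustrative assumptions.
import pandas as pd


def asof_join(trades: pd.DataFrame, quotes: pd.DataFrame) -> pd.DataFrame:
    return pd.merge_asof(
        trades.sort_values("ts"),
        quotes.sort_values("ts"),
        on="ts",
        by="symbol",                   # match only within each symbol
        tolerance=pd.Timedelta("2s"),  # no match if the quote is >2s old
        direction="backward",          # never use a future quote
    )
```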
Clarifications
- Technical Environment: We primarily use PostgreSQL, Python, and cloud infrastructure (GCP preferred), with flexibility in workflow orchestration tools.
- Education: A degree in a quantitative or technical field is preferred, though equivalent experience is also considered.
Note
Information regarding compensation, contract terms, and the interview process has been intentionally omitted from this posting.