Staff Data Engineer (8+ years of experience)
About the role
As the first Staff Data Engineer, you’ll take the lead in shaping a modern data foundation that supports everything from BI reporting and product analytics to real-time machine learning. You’ll collaborate closely with the Head of Data to architect and deliver the core systems, pipelines, and data layers that enable an AI-native organization.
This role blends software engineering and data engineering in equal measure. You're comfortable reasoning about distributed computation, data modeling, pipeline design, and ML enablement all at once. Your work will prioritize reliability, scalability, and clarity so that every downstream user, from analysts to ML engineers, can move quickly and confidently.
This is a unique chance to build a contemporary data stack from the ground up while working with high-volume IoT data and defining how intelligence moves through the entire platform.
What you’ll do
● Design and build the central data platform that powers analytics, intelligence, and decision-making.
● Develop scalable batch and streaming pipelines across a wide range of sources, including IoT devices, operational systems, and external integrations.
● Work with high-performance database technologies, including time-series and relational systems, to store, query, and serve large telemetry datasets efficiently.
● Create and maintain foundational data models and interfaces that support analytics, product development, and ML workloads.
● Partner with AI, ML, Product, and Software Engineering teams to ensure data is readily usable for real-time and strategic insights.
● Establish best practices around data quality, monitoring, lineage, and governance.
● Assess, adopt, and integrate modern data tools and technologies, such as warehouse, storage, compute, orchestration, and streaming systems, to continuously improve platform capabilities.
● Work alongside the Head of Data to define long-term architectural direction and contribute to a unified, ML-ready data environment.
● Mentor engineers across the organization on data engineering concepts and help raise engineering standards.
Ideal candidate credentials
● 8 or more years of experience as a data or software engineer with responsibility for large-scale data systems used for analytics or machine learning.
● Strong experience building and supporting ETL and data pipeline frameworks using tools such as Python, Spark, Airflow, dbt, or similar technologies.
● Deep familiarity with modern data infrastructure, including data lakes, warehouses, streaming platforms, and distributed compute environments.
● Expert-level SQL and experience creating efficient, maintainable data models.
● Solid understanding of CI/CD practices, infrastructure-as-code, and observability tooling.
● Experience supporting ML workflows and a clear understanding of data requirements across the model lifecycle.
● Comfortable with cloud-native data ecosystems, particularly in environments centered on storage, compute, orchestration, and ML services.
● Strong software engineering fundamentals: version control, testing, documentation, and code review.
● A systems-oriented thinker who prioritizes reliability, correctness, and maintainability.
● Strong communication skills and a collaborative approach to solving complex data challenges.
Location: Remote within the United States, with occasional travel (once per quarter) to Houston, TX.
Keywords: Data engineering, Data platform, Distributed systems, Data modeling, ETL, Data pipelines, Batch processing, Streaming processing, IoT data, Telemetry data, Real-time analytics, Machine learning enablement, Data quality, Data observability, Data governance, Data lineage, High-volume data, Time-series data, Data architecture, ML workflows, Data-driven decision-making, Cloud-native data systems, Scalability, Reliability, Maintainability, CI/CD, Infrastructure-as-code, Python, SQL, Spark, Airflow, dbt, Kafka, Time-series databases, Relational databases, Data warehouses, Data lakes, Orchestration tools, Distributed computing frameworks, Cloud storage services, Serverless compute tools, ML platforms, Version control, Monitoring tools.