Position Summary
We are seeking a Data Scientist / Data Analytics Engineer to design, build, and operationalize advanced analytics solutions that drive measurable outcomes across our transportation and logistics operations. This role is responsible for delivering both predictive analytics (forecasting, classification, anomaly detection, optimization) and point-in-time analytics (operational dashboards, KPI reporting, ad-hoc investigations) for internal stakeholders and customers. The successful candidate blends deep statistical and modeling expertise with hands-on data engineering skills on AWS, and brings domain fluency in trucking, freight, brokerage, payment factoring, fleet operations, and broader supply chain workflows.
The role sits at the intersection of data science, data engineering, and product delivery. You will own problems end-to-end: framing the question with stakeholders, selecting and curating data, building robust pipelines, training and validating models, productionizing them on AWS, and ensuring the outputs are trusted, explainable, and actionable.
Key Responsibilities
Predictive Analytics & Modeling
- Design, train, validate, and deploy predictive models (regression, classification, time-series forecasting, survival analysis, clustering, anomaly detection, and gradient-boosted / deep learning approaches as appropriate to the problem).
- Lead model selection, hyperparameter tuning, cross-validation, and rigorous performance evaluation using metrics aligned to business objectives (precision/recall trade-offs, MAPE, RMSE, lift, calibration, etc.).
- Develop data products in areas relevant to transportation, including operational metrics, fraud signals, pricing analytics, and industry trends.
- Establish model monitoring, drift detection, retraining cadence, and explainability practices (SHAP, feature importance, partial dependence) to keep production models trustworthy and operationally self-sustaining.
Point-in-Time & Operational Analytics
- Produce point-in-time analytics, KPI scorecards, and exception reporting to support daily operational decisions across dispatch, fleet, customer success, finance, and product teams.
- Partner with business stakeholders to translate questions into well-scoped analyses; deliver clear, defensible insights with documented assumptions and data lineage.
- Build and maintain reusable analytical datasets, semantic layers, and certified metrics so the organization works from a consistent source of truth.
Data Engineering & Platform
- Build and maintain data pipelines (batch and streaming) on AWS using services such as Redshift, S3, Glue, Lambda, Step Functions, Kinesis / MSK, EMR, Athena, and SageMaker.
- Implement medallion (bronze / silver / gold) architecture patterns to progressively refine raw operational data into analytics-ready and ML-ready datasets.
- Apply star-schema (dimensional) modeling and related techniques to build performant, business-friendly data models in Redshift and the broader warehouse layer.
- Drive data selection, curation, profiling, and quality enforcement: define source-of-truth datasets, document lineage, and codify data contracts and validation tests.
- Collaborate with data engineering and platform teams on CI/CD for data and ML assets, infrastructure-as-code (e.g., Terraform / CloudFormation), and cost-aware design on AWS.
Customer-Facing Analytics Products
- Take customer-facing analytics features and products from idea to implementation — partnering with product management, design, and engineering to turn ambiguous business questions into shipped capabilities embedded in customer-facing applications.
- Contribute to product discovery: customer interviews, opportunity sizing, prototyping, and rapid iteration on analytical concepts before committing to full build-out.
- Own the analytical correctness of customer-facing metrics, models, and visualizations — including definitions, edge cases, performance under real-world data conditions, and how results are explained to non-technical end users.
- Define and instrument success metrics for shipped analytics features (adoption, engagement, accuracy in production, customer outcomes) and drive iterative improvements post-launch.
Collaboration & Communication
- Translate complex analytical results into clear narratives, visualizations, and recommendations for both technical and non-technical audiences, including executive leadership and customers.
- Partner cross-functionally with product, engineering, operations, and commercial teams to embed analytics into workflows, applications, and customer-facing products.
- Mentor analysts and engineers on statistical rigor, modeling best practices, and modern data architecture.
Required Qualifications
- Bachelor's degree in Statistics, Mathematics, or Supply Chain Management; a degree in Computer Science is also acceptable. Master's degree preferred but not required.
- Demonstrated professional experience in the transportation, trucking, freight, logistics, or broader supply chain industry, with working knowledge of the underlying operational data (loads, stops, shipments, ELD/telematics, TMS, dispatch, billing, etc.).
- Proven track record of taking customer-facing analytics products or features from idea through implementation and launch — including product discovery, scoping, model and metric design, partnering with product/engineering, and supporting the feature in production with real customers. Candidates should be prepared to walk through at least one concrete example end-to-end.
- Strong applied experience building advanced analytical models end-to-end, including problem framing, data selection and curation, feature engineering, model training and validation, and deployment.
- Hands-on experience with AWS PaaS / analytics tooling, including Amazon Redshift and other relevant services such as S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, and SageMaker.
- Proficiency in SQL (advanced window functions, performance tuning on Redshift or comparable MPP warehouses) and at least one analytics-grade programming language — Python strongly preferred — with libraries such as pandas, scikit-learn, statsmodels, XGBoost/LightGBM, and PyTorch or TensorFlow as appropriate.
- Experience designing and operating production data pipelines, with a clear understanding of orchestration, idempotency, observability, and data quality.
- Solid grounding in statistical methods: hypothesis testing, experimental design, regression, time-series, and uncertainty quantification.
Preferred Qualifications
- Master's degree in Statistics, Mathematics, Operations Research, Supply Chain, Computer Science, or a closely related quantitative field.
- Experience implementing medallion architecture (bronze / silver / gold) in a cloud data lakehouse or warehouse environment.
- Experience designing star-schema dimensional models for analytics consumption.
- Experience with streaming and event-driven data (Kinesis, Kafka/MSK) for near-real-time analytics on transportation events.
- Experience deploying and monitoring ML models in production using SageMaker, MLflow, or equivalent MLOps tooling.
- Familiarity with BI / visualization tools (e.g., QuickSight, Power BI, Looker) and semantic layer / metrics layer concepts.
- Exposure to optimization and operations research techniques (linear / mixed-integer programming, routing, network flow) applied to transportation problems.
- Experience working with ELD/HOS data, telematics feeds, geospatial data, TMS / dispatch system data, or brokerage data, and a general understanding of transportation back-office operations and business processes.
Core Competencies
- Analytical rigor — comfortable defending methodology, assumptions, and uncertainty to a skeptical audience.
- Business pragmatism — chooses the simplest model that solves the problem and ships value quickly.
- Product mindset — thinks beyond the model to the end-user experience; comfortable iterating on customer-facing analytics features alongside product and engineering partners.
- Engineering discipline — writes clean, version-controlled, testable code; values reproducibility and lineage.
- Stakeholder partnership — listens well, scopes tightly, and communicates trade-offs clearly.
- Curiosity and ownership — investigates anomalies, challenges data quality, and drives issues to root cause.
Representative Tech Environment
- Cloud & Data Platform: AWS (Redshift, S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, SageMaker).
- Modeling & Analysis: Python (pandas, scikit-learn, statsmodels, XGBoost/LightGBM, PyTorch/TensorFlow), SQL, Jupyter.
- Data Architecture: Medallion (bronze/silver/gold), star-schema / dimensional models, data contracts, lineage tooling.
- Orchestration & DevOps: Airflow / Step Functions, Git, CI/CD, Terraform or CloudFormation.
- Visualization: QuickSight, Power BI, or Looker (as applicable).