About the Role
Serve as the primary owner of the health, reliability, and observability of a national fleet of hundreds of distributed Linux-based IoT devices. You will architect and build robust backend systems, monitoring pipelines, and internal tools that maximize device performance, supporting everything from remote troubleshooting to on-site diagnostics. This full-stack role bridges embedded and cloud engineering, and you will work closely with operations and hardware teams to deliver scalable, measurable impact.
Key Responsibilities
- Design and implement scalable systems to monitor, diagnose, and improve IoT device health
- Build internal tooling for device setup, fleet QA, and continuous observability
- Develop backend services for device calibration, integration, and real-time reliability
- Investigate and resolve fleet-wide issues using metrics, logs, and remote debugging
- Tune devices on-site to optimize camera exposure, connectivity, and detection performance
- Conduct fleet-wide health assessments and recommend firmware and process improvements
- Generate reports on uptime, device performance, and systemic issues for leadership
- Collaborate cross-functionally to advance architecture, tooling, and operational efficiency
- Document playbooks, repair guides, and standard troubleshooting procedures
- Travel periodically (up to 20%) for deployments, QA, and troubleshooting
Qualifications
- 5+ years of professional software engineering experience, including hands-on management of distributed Linux-based hardware or IoT fleets
- Proficiency in Python and SQL, with a strong record of shipping production-quality code
- Experience with monitoring and observability platforms such as Datadog, Prometheus, OpenTelemetry, or Grafana
- Familiarity with messaging protocols and systems such as NATS JetStream or MQTT
- Expertise in Linux system debugging (dmesg, journalctl, systemd, ip, etc.)
- Background in wireless connectivity technologies such as 4G LTE, 5G, and Wi-Fi
- Experience building internal tooling, monitoring, or reliability platforms
- Excellent written and verbal communication skills for technical reporting and documentation
- Willingness to travel up to 20% of the time for field deployments and troubleshooting; the role is otherwise remote
- Bonus: experience with C++ and knowledge of distributed cloud backend architectures