Personal details

Josiah B.

Timezone: Central Time (US & Canada) (UTC-5)

Summary

I specialize in Big Data technologies, especially Hadoop technologies such as Apache Spark, Flume, HBase, HDFS, Hive LLAP, and Impala. This career has led me to develop applications that implement Machine Learning models, predictive algorithms, and NLP algorithms, and that ingest large datasets. I'm well versed in concurrent and parallel programming and am proficient in both object-oriented and functional programming approaches.

I really love teaching people and sharing my knowledge. I promise that in the time that I spend mentoring you, I will pour into you as much of my knowledge as I can to give you the best chance possible in the industry.

Technical skills

Ruby・4 yrs Java・8 yrs Scala・4 yrs Python・8 yrs Haskell・2 yrs

Work Experience

Senior Data Engineer

Pinsight Media | Apr 2018 - Present

Scala

Linux

Shell

Pandas

Apache Spark

Apache Kafka

Apache Hadoop

Apache Airflow

I mostly write Spark processing pipelines over very large datasets (half a petabyte or more).

Hadoop Architect

Triple-I Corporation | AMC Theatres | May 2017 - Apr 2018

Java

Scala

Pandas

Machine Learning

NLP (Natural Language Processing)

Apache Spark

Apache Hadoop

Apache Flume

Python 2

I played a lead role in getting AMC Theatres' Big Data initiative off the ground.

Responsibilities and Accomplishments:
- Extended a Spark Sentiment Analyzer written in Scala using Stanford CoreNLP to analyze complex customer feedback.
- Wrote a custom Flume Source plugin in Java and Scala + Cats for ingesting a vendor's real-time HTTPS event stream.
- Used Scala, Akka, Scalatra, and Cats to develop an HTTP-based custom Flume client.
- Co-administered a CDH 5 (Cloudera) cluster.
- Trained peers and engineers in Hadoop software development and Scala programming.
- Advised on development process and workflow.
- Conducted exploratory research and generated project ideas.
- Developed new solutions and apps leveraging Hadoop technologies including Flume, Spark, Impala, Hive, and HBase.
- Deployed new Hadoop apps and plugins to a Kerberized CDH 5 cluster.
- Rigged applications to execute through Sysvinit, Upstart, or Systemd.
- Rigged system-initiated applications to auto-authenticate to Kerberos using keytabs.
- Served as automation engineer and advisor.
- Practiced Haskell-style functional programming in Scala using Cats.
- Imported deeply nested JSON files into Hive and Impala and flattened them into a traditional SQL table structure.
- Wrote apps for real-time data ingestion to HDFS using Linux Shell scripting, Python, Java, and Scala.
- Created a Docker CDH 5 development sandbox for prototyping.

Personal Projects

Overnight Website Challenge

2014

HTML/CSS

Ruby on Rails

PostgreSQL

Heroku

JavaScript

Built a new website for KVC Health Systems in 24 hours.

Overnight Website Challenge

2017

HTML/CSS

Ruby on Rails

PostgreSQL

Heroku

Continuous Integration

Docker

React

JavaScript

Continuous Deployment

Redux

Built a website for PrincipalsConnect in 24 hours.