Senior Data Engineer
Pinsight Media | Apr 2018 - Present
Scala
Linux
Shell
Pandas
Apache Spark
Apache Kafka
Apache Hadoop
Apache Airflow
Primarily write Spark processing pipelines for very large datasets (half a petabyte or more).
Hadoop Architect
Triple-I Corporation | AMC Theatres | May 2017 - Apr 2018
Java
Scala
Pandas
Machine Learning
NLP (Natural Language Processing)
Apache Spark
Apache Hadoop
Apache Flume
Python 2
Played a lead role in getting AMC Theatres' Big Data initiative off the ground.
Responsibilities and Accomplishments:
- Extended a Scala-based Spark sentiment analyzer built on Stanford CoreNLP to analyze complex customer feedback.
- Wrote a custom Flume Source plugin in Java and Scala (with Cats) to ingest a vendor's real-time HTTPS event stream.
- Used Scala, Akka, Scalatra, and Cats to develop an HTTP-based custom Flume client.
- Co-administered a CDH 5 (Cloudera) cluster.
- Trained peers and engineers in Hadoop software development and Scala programming.
- Advised on development processes and workflows.
- Conducted exploratory research and generated new project ideas.
- Developed new solutions and applications using Hadoop technologies, including Flume, Spark, Impala, Hive, and HBase.
- Deployed new Hadoop applications and plugins to a Kerberized CDH 5 cluster.
- Configured applications to run as services under SysVinit, Upstart, or systemd.
- Configured system-launched applications to authenticate to Kerberos automatically using keytabs.
- Served as automation engineer and advisor.
- Applied Haskell-style functional programming in Scala using Cats.
- Imported deeply nested JSON files into Hive and Impala and flattened them into traditional SQL table structures.
- Wrote real-time HDFS data-ingestion applications using Linux shell scripting, Python, Java, and Scala.
- Created a Docker CDH 5 development sandbox for prototyping.