I've always loved words (reading them, writing them, learning about them), so I appreciate being able to apply that love of words to highly analytical tasks. Much of programming and machine learning focuses on math and numbers, but NLP and data science allow me to work on interesting problems without losing the human voice.
● Deployed an ETL pipeline using Airflow, PSQL, and Python to examine NLP outputs and surface anomalous false positives.
● Developed flexible annotation pipelines that extract and process data from a production database. Used LLMs for rapid annotation, shortening time to modeling.
● Developed a comprehensive Python framework for evaluating NLP performance against a gold standard.
● Developed a Python library to extract and structure dates and numerical observations from text, improving NER precision.
● Maintained, monitored, and extended a live NLP pipeline consisting of a Java codebase that called two internal Python microservices. Followed clean code practices, including local and UAT testing, incremental version rollouts, graceful error handling, and code review.
● Maintained and curated an in-memory HSQL database of medical terms.
● Led a team of four in triaging client issues. Implemented solutions for production and followed up with stakeholders. Created a dashboard to quantify team velocity.
● Managed data cleaning and database input and output using tools such as AWS, Hive, Spark, and SQL.
● Developed and maintained CRF models to predict sensitive information in text across six languages.
● Managed a team of annotators providing gold standard annotations for multiple modeling projects in different languages.
● Evaluated NLP projects across Expedia for viability in new domains.
● Developed and maintained models to detect specific keywords and key phrases in vacation rental reviews.
● Developed classification models using deep learning tools such as BERT, and applied DistilBERT to improve performance and extend coverage to multiple languages.