About Us:
The Burning Glass Institute (BGI) is a leading data laboratory and labor market research center dedicated to advancing economic opportunity and mobility for all. We analyze unique, large-scale datasets and construct proprietary models that uncover powerful new insights across the education, labor market, and corporate landscapes, leading to research that has been featured in the New York Times, Wall Street Journal, and other major outlets. The Institute has also built out a scaled impact laboratory, focused on designing powerful and scalable new tools to support policymakers and other key decision makers, as well as informing and sparking public debate. Our fast-growing team is seeking an intellectually curious Data Scientist to join us in leading cutting-edge research on the future of work and learning.
Please read the application instructions at the bottom of the job description.
Position Overview:
As a Data Scientist at the Burning Glass Institute, you will play a crucial role in building our organization’s data infrastructure. The data team at BGI works to clean, extract, and classify hundreds of millions of job postings and worker profiles from across the globe. In this role, you will use your data engineering, NLP, and programming skills to help expand our capabilities to build large-scale pipelines that process text data, extract new elements, and build features used by the research team. Additionally, you will have the opportunity to act as a lead Data Scientist in consulting projects for companies, universities, and workforce agencies and to develop impactful industry research publications and product solutions. You will work in a small team of data scientists and engineers but will also collaborate closely with our economists and product teams. Extensive experience applying NLP methods to large collections of text data is a must.
Key Responsibilities:
Data Infrastructure: The Burning Glass Institute is a growing organization, and you will play a key role in shaping our in-house data infrastructure. You will be responsible for data cleaning, classification, and feature engineering using large-scale economic and labor market datasets. You should be proficient in SQL and Python, and experienced in NLP, machine learning, and LLM methods to process and analyze hundreds of millions of documents. You will also collaborate closely with economists to support research initiatives and develop data-driven product solutions.
Project Coordination and Communication: Your core team will be a small group of data scientists and engineers with the ambitious goal of developing and continuously improving state-of-the-art datasets for understanding the labor market. This requires strong collaboration, the ability to work independently in developing advanced data models, and coordination of outputs into cohesive databases. Equally important is your ability to clearly communicate results, methods, strengths, and weaknesses of data models to both technical and non-technical audiences, including economists, the product team, and external stakeholders, through visualizations and research reports. Balancing technical project management with clear, insightful communication will be essential to your success.
Required Qualifications:
· Master's degree in computer science, mathematics, or a similarly quantitative subject AND 3+ years of data-science experience
o OR Doctorate in one of these subjects AND 1+ years of data-science experience
· Significant experience in using NLP methods to clean and categorize raw data is a must.
· Experience using Python and SQL to clean and manipulate data of many formats, including unstructured and semi-structured
· Proficiency in working with larger-than-memory datasets using Python and parallel computing libraries (multiprocessing, Dask, PySpark, etc.)
· Expertise in supervised and unsupervised machine learning and deep learning methods, including common toolkits like Scikit-Learn and PyTorch
· Comfort with applying NLP techniques to large corpora of text, including but not limited to regex, semantic search, clustering, and NER
· Deep understanding of the neural architecture of LLMs and other neural sequence models
· Ability to provide and explain analyses and visualizations for technical and non-technical audiences
· Experience working in AWS or other cloud computing platforms
· Detail-oriented approach with a strong commitment to data accuracy.
· Effective time management skills to prioritize tasks, work efficiently, and meet project deadlines while maintaining the quality of work.
· Ability to maintain significant overlap with Eastern Time (ET) working hours for effective collaboration.
Preferred Qualifications:
· Familiarity with structured data representation and knowledge graph technologies.
· Strong knowledge of data structures, algorithms, and design patterns.
· Experience building data dashboards.
· The ability to attend in-person meetings in various US locations and working sessions in our New York office is a plus.
Application Process:
To apply, please:
Please note that we won’t be able to review your application otherwise. We look forward to learning more about how your skills and interests align with the work of the Burning Glass Institute!