Sr. Data Scientist - (Only W2 or 1099)
100% Remote
Long Term Contract
Job Description:
The Senior Data Scientist will focus on search and be dedicated to the creation of next-generation AI and Machine Learning techniques and strategies for Client in their global expansion. This candidate will assist with deploying ethical, powerful generative AI solutions with a flexible, multi-model approach that prioritizes using the best model for each individual legal use case. This approach includes working with large language models like Anthropic’s Claude 2, hosted on Amazon Bedrock from Amazon Web Services (AWS), and OpenAI’s GPT-4 and ChatGPT, hosted on Microsoft Azure.
Top Skills Details:
- 10+ years of experience in AI and machine learning, model building, and strong coding skills in Python.
- 2+ years of working knowledge of applying recent LLMs, including ChatGPT, GPT-3.5, OPT, BLOOM, etc., using RAG.
- Experience working directly with large language models and Transformer-based architectures, including BERT, RoBERTa, T5, etc.
- Experience with conversational search / semantic search, reinforcement learning, prompt engineering, and hallucination mitigation.
- Experience with DevOps repos: debugging, building APIs, and managing the algorithm flow across multiple workstreams in one repo.
- Senior-level experience deploying models in the cloud (AWS primary, Azure secondary).
Core Technical Skills:
Python Proficiency:
- Expert-level Python, with experience writing efficient, clean, and modular code.
- Ability to debug and test new code thoroughly.
RAG Systems:
- Experience and deep understanding of Retrieval-Augmented Generation (RAG), including concepts like embedding-based search, document retrieval, and combining retrieved information with LLMs.
- Hands-on experience with advanced RAG platform development and maintenance.
- Familiarity with knowledge base creation, indexing, and retrieval pipelines.
Knowledge of AI Architectures:
- Understanding of the end-to-end architecture of generative AI systems, including pre-processing, retrieval, ranking, and post-processing steps.
Prompt Engineering:
- Expertise in crafting effective prompts for LLMs tailored to specific tasks.
- Experience with techniques like zero-shot and few-shot prompting, prompt tuning, and chain-of-thought prompting.
Content Generation:
- Understanding of generative AI applications in content creation, including best practices for producing accurate, coherent, and domain-specific outputs.
- Ability to fine-tune components for custom use cases.
Debugging and Performance Tuning:
- Skills in profiling and optimizing LLM responses for latency and accuracy.
- Experience diagnosing issues in complex multi-component systems.
Monorepo and Collaboration Skills:
Working in Monorepo Environments:
- Experience managing and contributing to large, centralized codebases (monorepos).
- Understanding of version control workflows suited for monorepos (e.g., Git-based branching strategies).
Collaboration Tools and Practices:
- Proficient with CI/CD pipelines and tools like Jenkins, GitHub Actions, or GitLab CI.
- Ability to work collaboratively with cross-functional teams in Agile settings.
- Proficiency with code review practices and tools.
AI and NLP Knowledge:
NLP Expertise:
- Solid understanding of transformers, embeddings, and attention mechanisms.
- Familiarity with techniques for handling domain-specific language models.
Complementary Skills:
Documentation and Communication:
- Ability to write clear technical documentation for processes, workflows, and API usage.
- Strong communication skills for conveying technical insights to stakeholders.
Preferred Experience:
- Previous experience working in legal tech or domain-specific generative AI use cases.
- Hands-on experience with deploying AI models in production at scale.
- Familiarity with multilingual generative AI and fine-tuning for specific languages such as French.