SEARGIN IS HIRING!
As a dynamic multinational tech company operating in 50 countries, we drive innovation and create projects that shape the future and greatly enhance the quality of life. You will find our solutions in the space industry, in tools that support scientists developing cancer drugs, and in innovative technological solutions for industrial clients worldwide. These are just some of the areas in which we operate!
We are currently looking for a Data Scientist for a new Seargin project:
Key Responsibilities
- Generative AI Application Development: Collaborate with AI engineers, product owners, business analysts and other developers in Agile teams to integrate LLMs into scalable, robust, fair and ethical end-user applications, focusing on user experience, relevance, and real-time performance
- Algorithm Development: Design, develop, customize, optimize, and fine-tune LLM-based and other AI-infused algorithms tailored to specific use cases such as text generation, summarization, information extraction, chatbots, AI agents, code generation, document analysis, sentiment analysis, and data analysis
- Data Curation for LLMs: Design data pipelines to curate, preprocess, and structure datasets that improve the performance of LLM-based algorithms and reduce biases, with a focus on data quality and diversity
- Exploratory Data Analysis (EDA): Perform thorough data exploration to understand dataset characteristics, uncover patterns, detect biases, and identify data quality issues; use statistical and visualization techniques to inform feature engineering, model selection, and optimization of LLM-based applications
- Prompt Engineering Support: Support prompt engineers, business analysts, and subject matter experts in crafting and optimizing prompts that guide LLM outputs and improve performance on specific tasks; participate directly in prompt engineering when necessary
- Experimentation and Validation: Conduct rigorous experimentation, including A/B testing, to evaluate algorithm performance against benchmarks and control groups; use both GenAI-specific metrics and pre-GenAI evaluation techniques, as required
- Software Development: Apply software development best practices, including writing unit tests; contribute to configuring CI/CD pipelines, containerizing applications, setting up APIs, and ensuring robust logging, experiment tracking, and model monitoring
- Continuous Improvement: Work with other developers to monitor deployed algorithms, identify areas for improvement, and implement updates that enhance performance
- Stakeholder Communication: Translate complex technical results into clear, actionable insights for stakeholders, driving data-driven decision-making across the organization
- Ethical AI and Bias Mitigation: Implement techniques to identify and mitigate biases in LLM outputs, ensuring responsible and ethical AI deployment
- Pre-generative AI Application Development: Design and implement classical machine learning and NLP models (e.g., regression, classification, clustering, sequence modeling) when they provide a more efficient, interpretable, or cost-effective solution than LLMs; integrate these models into AI applications as needed
Practical Skills Required
- Experience: 3+ years working with advanced machine learning algorithms
- Language Models: 3+ years of hands-on experience working with language models, especially those based on Transformer architectures (e.g., BERT, T5, RoBERTa), and at least 1 year of experience with generative large language models (e.g., GPT, LLaMA, Claude, Cohere)
- Technical Skills: Advanced proficiency in Python and experience with deep learning frameworks such as PyTorch or TensorFlow; expertise with Transformer architectures; hands-on experience with LangChain or similar LLM frameworks
- RAG Systems: Experience designing end-to-end retrieval-augmented generation (RAG) systems using state-of-the-art orchestration frameworks; hands-on experience fine-tuning LLMs for specific tasks and use cases is an additional advantage
- Cloud: Practical experience designing cloud solutions with AWS services (familiarity with Azure is a plus), including GenAI-specific services such as Azure OpenAI, Amazon Bedrock, and Amazon SageMaker JumpStart
- Data Skills: Strong skills in data manipulation, annotation, and crafting datasets that maximize LLM effectiveness; experience working with data stores such as vector, relational, and NoSQL databases and data lakes through APIs; experience with data augmentation or synthetic data generation for LLMs is a plus
- Prompt Engineering: Hands-on experience with prompt design and with zero-shot and few-shot learning paradigms to optimize LLM performance without extensive training or fine-tuning
- Evaluation Metrics: Deep understanding of evaluation techniques for both generative models and pre-GenAI models
- NLP Expertise: Solid foundation in natural language processing, including tokenization, embeddings, attention mechanisms, and transfer learning specific to LLMs
- Statistical Knowledge: Strong background in statistics, machine learning algorithms, and optimization techniques
- Classical Machine Learning & NLP: Experience with traditional NLP techniques and classical machine learning algorithms (e.g., decision trees, SVMs, random forests, gradient boosting) for text analysis and structured data applications
- Pre-LLM Model Development: Hands-on experience developing and deploying machine learning models for tasks such as classification, clustering, regression, and sequence modeling using frameworks like scikit-learn, XGBoost, or traditional NLP pipelines
- Feature Engineering & Data Preprocessing: Strong skills in feature engineering, dimensionality reduction, text preprocessing, and structured data transformation to improve model performance
- Deployment: Experience deploying LLMs on cloud platforms (AWS, Azure) and machine learning workbenches for robust and scalable productization
- Software Engineering: Proficiency in software engineering best practices
- Problem Solving: Excellent analytical skills and the ability to tackle complex challenges with innovative solutions
- Communication: Strong verbal and written communication skills, with the ability to present complex findings clearly to both technical and non-technical audiences
The successful candidate should also:
- hold a B.Sc., B.Eng., M.Sc., M.Eng., Ph.D., or D.Eng. in Computer Science, Physics, Statistics, Mathematics, or an equivalent degree, and have experience with Artificial Intelligence
- be passionate about AI and stay up to date with the latest developments in LLMs, GenAI, and AI in general
- be team-oriented, proactive, and collaborative
- be an excellent problem solver and analytical thinker
- be detail-oriented and highly organized
- be willing to learn and expand their skill set
- have the ability to work collaboratively in a fast-paced, dynamic environment
- be able to communicate in English at C1 level or higher
- be located in or near the Central European time zone, or be willing to work hours aligned with the Central European time zone