We are looking for a Data Scientist with expertise in Python, Azure Cloud, NLP, Forecasting, and large-scale data processing. The role involves enhancing existing ML models, optimising embeddings, LDA models, RAG architectures, and forecasting models, and migrating data pipelines to Azure Databricks for scalability and efficiency.
Key Responsibilities:
Model Development & Optimisation
• Train and optimise models for new data providers, ensuring seamless integration.
• Enhance models for dynamic input handling.
• Improve LDA model performance to handle a higher number of clusters efficiently.
• Optimise RAG (Retrieval-Augmented Generation) architecture to enhance recommendation accuracy for large datasets.
• Upgrade Retrieval QA architecture for improved chatbot performance on large datasets.
Forecasting & Time Series Modelling
• Develop and optimise forecasting models for marketing, demand prediction, and trend analysis.
• Implement time series models (e.g., ARIMA, Prophet, LSTMs) to improve business decision-making.
• Integrate NLP-based forecasting, leveraging customer sentiment and external data sources (e.g., news, social media).
Data Pipeline & Cloud Migration
• Migrate the existing pipeline from Azure Synapse to Azure Databricks and retrain models accordingly. Note: this is required only for the AUB role(s).
• Address space and time complexity issues in embedding storage and retrieval on Azure Blob Storage.
• Optimise embedding storage and retrieval in Azure Blob Storage for better efficiency.
MLOps & Deployment
• Implement MLOps best practices for model deployment on Azure ML, Azure Kubernetes Service (AKS), and Azure Functions.
• Automate model training, inference pipelines, and API deployments using Azure services.
Experience:
• Experience in Data Science, Machine Learning, Deep Learning and Gen AI.
• Design, architect, and execute end-to-end data science pipelines, including data extraction, data preprocessing, feature engineering, model building, tuning, and deployment.
• Experience leading a team and taking responsibility for project delivery.
• Experience building end-to-end machine learning pipelines, with expertise in developing CI/CD pipelines using Azure Synapse pipelines, Databricks, Google Vertex AI, and AWS.
• Experience developing advanced natural language processing (NLP) systems, specialising in building RAG (Retrieval-Augmented Generation) models using Langchain and deploying them to production.
• Expertise in building machine learning pipelines and deploying a range of models, including forecasting, anomaly detection, market mix, classification, regression, and clustering models.
• Maintaining GitHub repositories and cloud computing resources for effective and efficient version control, development, testing, and production.
• Developing proof-of-concept solutions and assisting in rolling these out to our clients.
Required Skills & Qualifications:
• Hands-on experience with Azure Databricks, Azure ML, Azure Synapse, Azure Blob Storage, and Azure Kubernetes Service (AKS).
• Experience with forecasting models, time series analysis, and predictive analytics.
• Proficiency in Python (NumPy, Pandas, TensorFlow, PyTorch, Statsmodels, Scikit-learn, Hugging Face, FAISS).
• Experience with model deployment, API optimisation, and serverless architectures.
• Hands-on experience with Docker, Kubernetes, and MLflow for tracking and scaling ML models.
• Expertise in optimising time complexity, memory efficiency, and scalability of ML models in a cloud environment.
• Experience with Langchain or an equivalent framework, RAG, and multi-agent generation.