Machine Learning Engineer / Data Scientist
Job description
We're hiring a mid-to-senior Machine Learning Engineer / Data Scientist to build and deploy machine learning solutions that drive measurable business impact. You'll work across the ML lifecycle, from problem framing and data exploration to model development, evaluation, deployment, and monitoring, often in partnership with client stakeholders and internal delivery teams.
- Problem Framing & Stakeholder Partnership
- Translate business questions into ML problem statements (classification, regression, time series forecasting, clustering, anomaly detection, recommendation, etc.).
- Collaborate with stakeholders to define success metrics, evaluation plans, and practical constraints (latency, interpretability, cost, data availability).
- Data Analysis & Feature Engineering
- Use SQL and Python to extract, join, and analyze data from relational databases and data warehouses.
- Perform data profiling, missingness analysis, leakage checks, and exploratory analysis to guide modeling choices.
- Build robust feature pipelines (aggregation, encoding, scaling, embeddings where appropriate) and document assumptions.
- Model Development (Core ML)
- Train and tune supervised learning models for tabular data (e.g., logistic/linear models, tree-based methods, gradient boosting such as XGBoost/LightGBM/CatBoost, and neural nets for structured data).
- Apply strong tabular modeling practices: handling missing data, categorical encoding, leakage prevention, class imbalance strategies, calibration, and robust cross-validation.
- Build time series models (statistical and ML/DL approaches) and validate with proper backtesting.
- Apply clustering and segmentation techniques (k-means, hierarchical, DBSCAN, Gaussian mixtures) and evaluate stability and usefulness.
- Apply statistics in practice (hypothesis testing, confidence intervals, sampling, experiment design) to support inference and decision-making.
- Deep Learning
- Build and train deep learning models using PyTorch or TensorFlow/Keras.
- Use best practices for training (regularization, calibration, class imbalance handling, reproducibility, sound train/val/test design).
- Evaluation, Explainability, and Iteration
- Choose appropriate metrics (AUC/F1/PR, RMSE/MAE/MAPE, calibration, lift, and business KPIs) and create evaluation reports.
- Perform error analysis and interpretation (feature importance/SHAP, cohort slicing) and iterate based on evidence.
- Productionization & MLOps (Project-Dependent)
- Package models for deployment (batch scoring pipelines or real-time APIs) and collaborate with engineers on integration.
- Implement practical MLOps: versioning, reproducible training, automated evaluation, monitoring for drift/performance, and retraining plans.
- Documentation & Communication
- Communicate tradeoffs and recommendations clearly to technical and non-technical stakeholders.
- Create documentation and lightweight demos that make results actionable.

- You deliver models that perform well and move business metrics (revenue lift, cost reduction, risk reduction, improved forecast accuracy, operational efficiency).
- Your work is reproducible and production-aware: clear data lineage, robust evaluation, and a credible path to deployment/monitoring.
- Stakeholders trust your judgment in selecting methods and communicating uncertainty honestly.
Requirements
You should be strong in core data science and applied machine learning, comfortable working with real-world data, and capable of turning modeling work into production-ready systems.
- 3-8 years of experience in data science, machine learning engineering, or applied ML (mid-to-senior).
- Strong Python skills for data analysis and modeling (pandas/numpy/scikit-learn or equivalent).
- Strong SQL skills (joins, window functions, aggregation, performance awareness).
- Solid foundation in statistics (hypothesis testing, uncertainty, bias/variance, sampling) and a practical experimentation mindset.
- Hands-on experience across multiple model types, including classification and regression, time series forecasting, and clustering/segmentation.
- Experience with deep learning in PyTorch or TensorFlow/Keras.
- Strong problem-solving skills: ability to work with ambiguous goals and messy data.
- Clear communication skills and ability to translate analysis into decisions.

- Experience with Databricks for applied ML (e.g., Spark, Delta Lake, MLflow, Databricks Jobs/Workflows).
- Experience deploying models to production (APIs, batch pipelines) and maintaining them over time (monitoring, retraining).
- Experience with orchestration tools (Airflow, Prefect, Dagster) and modern data stacks (Snowflake/BigQuery/Redshift/Databricks).
- Experience with cloud platforms (AWS/GCP/Azure/IBM) and containerization (Docker).
- Experience with responsible AI and governance best practices (privacy/PII handling, auditability, access controls).
- Consulting or client-facing delivery experience.
Certifications (Strong Plus)
Candidates with at least one relevant certification are especially encouraged to apply:
- Cloud certifications: AWS, Google Cloud, Microsoft Azure, or IBM (data/AI/ML tracks)
- Databricks certifications (Data Scientist, Data Engineer, or related)

- Causal inference experience (e.g., quasi-experimental methods, propensity scores, uplift/heterogeneous treatment effects, experimentation beyond A/B tests).
- Agentic development experience: designing and evaluating agentic workflows (tool use, planning, memory/state, guardrails) and integrating them into products.
- Deep familiarity with agentic coding tools and workflows for accelerated product development (e.g., AI-assisted IDEs, code agents, automated testing/refactoring, repo-aware assistants), including strong judgment on quality, security, and maintainability.