Data Scientist

Insight Global
Atlanta, United States of America
30 days ago

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Junior
Compensation
$177K

Job location

Atlanta, United States of America

Tech stack

Artificial Intelligence
JIRA
Cloud Engineering
Continuous Integration
Data Dictionary
ETL
Python
Metadata
Software Quality Assurance (SQA)
Data Logging
Large Language Models
Spark
GIT
Codebase
Machine Learning Operations
Databricks

Job description

This engagement supports a platform designed to improve the quality, continuity, and analytical usefulness of AMI interval data across APC, GPC, and MPC. The platform will use AI/ML to perform data gap-fill, usage reconstruction, and short-term/long-term forecasting, and to enable downstream analytics. The primary objective of this vendor engagement is to deliver a production-grade, scalable, and accurate ML-based gap-fill and forecasting engine that can be integrated into Southern Company's data ecosystem.

  2. Scope of Work

  2.1 Core Functional Requirements

  A. Data Gap-Fill Engine

  Develop or extend models that can:

  • Fill missing AMI interval data at 15-minute and hourly resolutions.

  • Handle short gaps (< 2 hours) and long gaps (1-72 hours) with model-based reconstruction.

  • Produce reconstruction confidence scores for each interval.

  • Incorporate:
    o Weather (temperature, humidity, solar irradiance)
    o Calendar effects (weekday/weekend, season)
    o Outage periods (explicitly excluding outage windows from prediction)

  B. Predictive Usage Modeling

  • Short-term usage prediction (next 24-48 hours)

  • Longer-horizon predictions (up to 7-14 days)

  • Ability to run inference on millions of meter records across all OpCos

  C. MLOps and Deployment

  • Run natively on Databricks within Southern Company's secure cloud (no external hosting).

  • Provide automated pipelines for:
    o Training
    o Batch inference
    o Monitoring and accuracy-drift detection

  • Deliver source code, notebooks, CI/CD scripts, and documentation.
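To make the gap-fill requirement concrete, here is a minimal, illustrative sketch of model-based reconstruction with per-interval confidence scores, using calendar and weather features as listed above. All data, column names, and the quantile-based confidence heuristic are assumptions for illustration, not the engine Southern Company will procure:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for one meter's 15-minute AMI interval data (illustrative).
idx = pd.date_range("2024-01-01", periods=96 * 30, freq="15min")
temp = 10 + 10 * np.sin(2 * np.pi * idx.hour / 24) + rng.normal(0, 1, len(idx))
kwh = 0.3 + 0.05 * temp + 0.2 * (idx.dayofweek < 5) + rng.normal(0, 0.05, len(idx))
df = pd.DataFrame({"kwh": kwh, "temp": temp}, index=idx)

# Simulate missing intervals.
df.loc[rng.random(len(df)) < 0.05, "kwh"] = np.nan

# Calendar + weather features, per the scope of work.
X = pd.DataFrame(
    {"hour": idx.hour, "minute": idx.minute, "weekday": idx.dayofweek, "temp": temp},
    index=df.index,
)
observed = df["kwh"].notna()

# Median model fills the gap; a 10%/90% quantile pair yields a crude confidence score.
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(
        X[observed], df.loc[observed, "kwh"]
    )
    for q in (0.1, 0.5, 0.9)
}
gaps = ~observed
fill = models[0.5].predict(X[gaps])
width = np.maximum(models[0.9].predict(X[gaps]) - models[0.1].predict(X[gaps]), 0.0)

df.loc[gaps, "kwh"] = fill
df.loc[gaps, "confidence"] = 1.0 / (1.0 + width)  # narrow interval -> score near 1
df.loc[observed, "confidence"] = 1.0              # observed intervals are fully trusted
```

A production engine would additionally exclude outage windows from prediction and scale this across meters (e.g. via Spark), as required elsewhere in this document.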

  3. Data Inputs & Volumes

  3.1 Data Sources

  • AMI interval data

  • Weather feeds

  • Outage metadata

  • Billing calendar markers (not used for billing VEE; for analytics only)

  3.2 Data Characteristics

  • 96 intervals/day per meter (15-minute resolution)

  • Multi-year historical availability (1-3 years)

  • High prevalence of gaps, including consecutive missing intervals
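Since the scope of work distinguishes short gaps (< 2 hours, i.e. fewer than eight 15-minute intervals) from long consecutive-gap runs, a small sketch of classifying gap runs in a series may help; the series values and the eight-interval threshold are illustrative:

```python
import numpy as np
import pandas as pd

# Toy 15-minute series with one 2-interval gap and one 10-interval gap.
s = pd.Series([1.0, np.nan, np.nan, 1.2, *([np.nan] * 10), 0.9])

# Label each run of consecutive equal "is missing" states, then measure run lengths.
is_gap = s.isna()
run_id = (is_gap != is_gap.shift()).cumsum()
runs = is_gap.groupby(run_id).agg(["first", "size"])

gap_lengths = runs.loc[runs["first"], "size"].tolist()    # lengths of missing runs
labels = ["short" if n < 8 else "long" for n in gap_lengths]
```

Here the two gaps classify as `["short", "long"]`; the same run-length trick extends to flagging the "consecutive missing intervals" called out above.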

  4. Performance Requirements

  4.1 Model Accuracy

  Measurable model KPIs for:

  • MAE / RMSE for filled intervals

  • Gap-fill bias (interval-level and daily aggregated)

  • Performance across:
    o 15-minute kWh
    o Hourly kWh
    o Golden meters (consecutive gap conditions)
    o Non-golden meters (random missing intervals)

  4.2 Scalability

  Models must:

  • Process 4.4M meters (15-minute resolution) in batch production cycles

  • Support parallelization / Spark-based architecture

  4.3 Operational Expectations

  • Runtime targets defined for daily and weekly pipelines

  • Monitoring hooks for:
    o Model drift
    o Data anomalies
    o Input/outage alignment issues
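The accuracy KPIs in 4.1 can be sketched as a small metrics helper; the function name and the assumption of 96 intervals per day (from section 3.2) are illustrative, not a mandated interface:

```python
import numpy as np

def gapfill_kpis(actual, filled, intervals_per_day=96):
    """MAE, RMSE, and interval-level / daily-aggregated bias for filled intervals."""
    actual = np.asarray(actual, dtype=float)
    filled = np.asarray(filled, dtype=float)
    err = filled - actual
    daily = err.reshape(-1, intervals_per_day).sum(axis=1)  # kWh error per day
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "bias": float(np.mean(err)),            # interval-level bias
        "daily_bias": float(daily.mean()),      # daily-aggregated bias
    }

# Toy example: two days of 15-minute intervals for one meter.
rng = np.random.default_rng(1)
actual = rng.uniform(0.2, 0.6, 192)
filled = actual + rng.normal(0.0, 0.02, 192)
kpis = gapfill_kpis(actual, filled)
```

In practice these KPIs would be computed separately per cohort (15-minute vs hourly kWh, golden vs non-golden meters) and fed into the drift-monitoring hooks listed above.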

  5. Deliverables

  1. ML Models & Codebase
    o Gap-fill models (primary deliverable)
    o Forecasting models
    o Modular architecture for future extensions

  2. Documentation
    o Model documentation
    o Data dictionaries
    o Deployment runbooks

  3. MLOps Pipelines
    o Orchestrated Databricks jobs
    o Git-based CI/CD workflows

  4. Dashboards / Visual QA Tools
    o Before/after gap-fill visualizations
    o Overlay and comparison tools

  5. Training & Knowledge Transfer
    o Sessions for the AMI Data Science team
    o Code walkthroughs
    o Handover documentation

  6. Non-Functional Requirements

  6.1 Security

  • All work must remain within Southern Company's cloud environment.

  • No external data movement is permitted.

  6.2 Governance

  • Conform to Southern Company metadata, tagging, and logging standards.

  • Models must not be used for billing (analytics only).

  6.3 Vendor Collaboration Expectations

  • Weekly progress meetings with Solomon, Joyce S. and the project team

  • Transparent issue escalation

  • Ability to collaborate using Jira (AMI team instance)

  • A PM is already in place

  • One Data Scientist from AMI DSA will work on this project

  7. Evaluation Criteria for Vendors

  • Technical strength and ML methodology
  • Scalability and cloud-architecture alignment
  • Experience with utility AMI datasets
  • Clarity of the proposed MLOps approach
  • Documentation quality
  • Speed to delivery
  • Total cost and licensing structure

Requirements

• 3-5 years of experience as a data scientist (or similar title)
• 3+ years of experience using AI/ML
• Strong Python programming experience
• Understanding of / experience with AMI data
• Background in a utility
• Experience with LLMs (large language models)
• Ability to deliver source code, notebooks, CI/CD scripts, and documentation

Benefits & conditions

Benefit packages for this role will start on the 1st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.
