Python Developer

Rapid Eagle Inc.

Charlotte, United States of America

yesterday

Role details

Contract type

Permanent contract

Employment type

Full-time (more than 32 hours per week)

Working hours

Regular working hours

Languages

English

Experience level

Senior

Job location

Charlotte, United States of America

Tech stack

Airflow

Amazon Web Services (AWS)

Unit Testing

Azure

Cloud Computing

Software Quality

Continuous Integration

Information Engineering

Software Debugging

Distributed Computing Environment

Distributed Systems

Python

Machine Learning

NoSQL

Object-Oriented Software Development

Open Source Technology

Performance Tuning

TensorFlow

SQL Databases

Management of Software Versions

Workflow Management Systems

PyTorch

Spark

Data Lake

PySpark

Kubernetes

Information Technology

Apache Flink

Kafka

Machine Learning Operations

Stream Processing

Data Pipelines

Docker

Job description

Build and maintain large-scale data processing pipelines using Apache Spark for batch and streaming data.
Design and implement ML training and inference workflows using PyTorch and integrate them into production systems.
Develop and orchestrate ETL and ML pipelines with Apache Airflow, ensuring reliability, scalability, and observability.
Optimize the performance of data pipelines and ML model training on distributed clusters.
Collaborate with Data Scientists and ML Engineers to productize models and deploy them into production environments.
Implement best practices for code quality, CI/CD, unit testing, and monitoring.
Ensure data quality, integrity, and security across all pipelines.
Troubleshoot performance bottlenecks and optimize resource utilization.
Stay up to date with advancements in ML frameworks, distributed computing, and workflow orchestration tools.

Requirements

Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
5+ years of professional Python development experience, with strong object-oriented programming and software engineering fundamentals.
Hands-on experience with PyTorch for model training and inference.
Deep understanding of Apache Spark for distributed data processing (experience with PySpark or Scala is a plus).
Strong experience with Apache Airflow for workflow orchestration in production environments.
Proficiency in SQL and working with relational and NoSQL databases.
Experience with Docker, Kubernetes, and cloud platforms (AWS/GCP/Azure).
Familiarity with data versioning and ML model lifecycle management (MLflow or similar).
Strong problem-solving and debugging skills in distributed systems.

Preferred Skills

Experience with real-time data processing frameworks (Kafka, Flink).
Knowledge of feature stores, data lake architectures, and Delta Lake.
Familiarity with MLOps practices (CI/CD for ML, model registry, automated retraining).
Experience with GPU-accelerated ML training and performance optimization.
Contributions to open-source ML or data engineering projects.