Python Developer
Rapid Eagle Inc.
Charlotte, United States of America
yesterday
Role details
Contract type: Permanent contract
Employment type: Full-time (> 32 hours)
Working hours: Regular working hours
Languages: English
Experience level: Senior
Job location: Charlotte, United States of America
Tech stack
Airflow
Amazon Web Services (AWS)
Unit Testing
Azure
Cloud Computing
Software Quality
Continuous Integration
Information Engineering
Software Debugging
Distributed Computing Environment
Distributed Systems
Python
Machine Learning
NoSQL
Object-Oriented Software Development
Open Source Technology
Performance Tuning
TensorFlow
SQL Databases
Management of Software Versions
Workflow Management Systems
PyTorch
Spark
Data Lake
PySpark
Kubernetes
Information Technology
Apache Flink
Kafka
Machine Learning Operations
Stream Processing
Data Pipelines
Docker
Job description
- Build and maintain large-scale data processing pipelines using Apache Spark for batch and streaming data.
- Design and implement ML training and inference workflows using PyTorch and integrate them into production systems.
- Develop and orchestrate ETL and ML pipelines with Apache Airflow, ensuring reliability, scalability, and observability.
- Optimize performance of data pipelines and ML model training on distributed clusters.
- Collaborate with Data Scientists and ML Engineers to productize models and deploy them into production environments.
- Implement best practices for code quality, CI/CD, unit testing, and monitoring.
- Ensure data quality, integrity, and security across all pipelines.
- Troubleshoot performance bottlenecks and optimize resource utilization.
- Stay up to date with advancements in ML frameworks, distributed computing, and workflow orchestration tools.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of professional Python development experience, with strong object-oriented programming and software engineering fundamentals.
- Hands-on experience with PyTorch for model training and inference.
- Deep understanding of Apache Spark for distributed data processing (PySpark or Scala is a plus).
- Strong experience with Apache Airflow for workflow orchestration in production environments.
- Proficiency in SQL and experience working with relational and NoSQL databases.
- Experience with Docker, Kubernetes, and cloud platforms (AWS/GCP/Azure).
- Familiarity with data versioning and ML model lifecycle management (MLflow or similar).
- Strong problem-solving and debugging skills in distributed systems.
Preferred Skills
- Experience with real-time data processing frameworks (Kafka, Flink).
- Knowledge of feature stores, data lake architectures, and Delta Lake.
- Familiarity with MLOps practices (CI/CD for ML, model registry, automated retraining).
- Experience with GPU-accelerated ML training and performance optimization.
- Contribution to open-source ML or data engineering projects.