Senior Data Engineer with DevOps
Job description
We are seeking a Data Engineer with 5+ years of experience to design and maintain scalable data pipelines supporting analytics, reporting, and operational needs. The role involves collaborating with cross-functional teams to ensure data aligns with business requirements and enterprise standards.
This role requires working on-site at our client's location 5 days a week in Pittsburgh, PA; Cleveland, OH; or Dallas, TX.
For this role on this particular client engagement, employer sponsorship of immigration-related visa and/or green card status as part of the PERM process will not be available.
Responsibilities
- Design and build scalable data pipelines aligned with business needs
- Process large datasets (batch and, at times, near real-time)
- Ensure data quality, consistency, and governance standards across systems
- Support data integration and transformation efforts for analytics and reporting platforms
- Maintain data dictionaries, metadata, and documentation
- Participate in data architecture reviews and model validation processes
- Support analytics, reporting, and risk platforms
Requirements
- 5+ years of experience in data engineering and big data processing
- Strong expertise in Apache Spark (Spark Core, Spark SQL) and PySpark for large-scale batch processing
- Experience working with structured and semi-structured data, including complex transformations and performance tuning
- Proficiency in data ingestion and integration from sources like Oracle, SQL Server, Hive, HDFS, and S3, and in transforming data into curated data models
- Experience writing data to Hive tables, data lakes (Iceberg), and downstream reporting systems
- Strong knowledge of SQL and data modeling concepts
- Hands-on experience with Apache Airflow for workflow orchestration (DAG design, scheduling expectations, monitoring)
- Proficiency in shell scripting for job automation: triggering Spark jobs, performing file checks and validation, archiving and purging data, and managing job dependencies, logging, and error handling
- Strong understanding of batch processing and batch job scheduling frameworks
- Experience migrating from CA7/Control-M to Airflow (daily, hourly, weekly schedules)
- Experience with CI/CD for data pipelines
- Fundamentals in Linux and networking
- Containerization with Docker, OCP (OpenShift Container Platform), and Kubernetes
- Knowledge of CI/CD pipeline tools such as Jenkins, GitHub Actions, Azure DevOps, GitLab CI, Maven, and Gradle
- Ability to automate operational tasks using Python, Bash/Shell, and PowerShell
- Experience implementing monitoring and alerting (e.g., Application Insights) and enabling centralized logging with tools such as ELK
- Experience ensuring data quality, reliability, and compliance in regulated environments
- Good communication and documentation skills
Tech stack
- Airflow
- Containerization
- DevOps
- Elastic Stack & Elasticsearch
- GitHub
- Hadoop Hive
- Jenkins
- JSON Web Token (JWT)
- Kubernetes
- OpenShift
- Oracle
- Python
- Shell Script
- SQLite
Benefits & conditions
CGI is required by law in some jurisdictions to include a reasonable estimate of the compensation range for this role. The determination of this range includes various factors, including but not limited to skill set, level, experience, relevant training, and licensure and certifications. To support the ability to reward for merit-based performance, CGI typically does not hire individuals at or near the top of the range for their role. Compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range for this role in the U.S. is $79,600.00 to $139,300.00.
CGI's benefits are offered to eligible professionals on their first day of employment and include:
- Competitive compensation
- Comprehensive insurance options
- Matching contributions through the 401(k) plan and the share purchase plan
- Paid time off for vacation, holidays, and sick time
- Paid parental leave
- Learning opportunities and tuition assistance
- Wellness and well-being programs