Lead Data Engineer - Pipelines, Spark Streaming and Spark Offline
Job description
As a Lead Data Engineer at JPMorganChase within the Commercial & Investment Bank, you are an integral part of an agile team that works to enhance, build, and deliver data collection, storage, access, and analytics solutions in a secure, stable, and scalable way. As a core technical contributor, you are responsible for maintaining critical data pipelines and architectures across multiple technical areas within various business functions in support of the firm's business objectives.
- Collaborate with all of JPMorgan's lines of business and functions to deliver software solutions
- Experiment with, architect, develop, and productionize efficient data pipelines, data services, and data platforms that contribute to the business
- Design and implement highly scalable, efficient, and reliable data processing pipelines, and perform analysis to produce insights that drive and optimize business results
- Design and develop features and entities for ML models and rules using Spark or any big data environment
- Act on previously identified opportunities to converge physical, IT, and data security architecture to manage access
- Apply reuse-first, AI-assisted practices within delivery and operational routines (e.g., backup/recovery validation and access control review support), ensuring traceability/auditability and alignment to resiliency and security expectations
Requirements
- Formal training or certification in data engineering concepts and 5+ years of applied experience
- Demonstrated experience using enterprise-authorized AI capabilities within the work environment to support data engineering workflows with strong validation habits and awareness of data sensitivity
- Ability to review and validate AI-assisted outputs (e.g., model/design summaries or operational checklists) before use, escalating when uncertain and following data handling requirements
- Strong programming skills in Python and PySpark
- Experience across the data lifecycle, including building data frameworks and working with data lakes
- Experience with batch and real-time data processing and feature engineering using Spark, Flink, or Databricks
- Working knowledge of AWS Glue and EMR for data processing, and of real-time data processing and feature engineering using Flink, Databricks Delta Live Tables, or Spark Streaming
- Experience working with Databricks and Delta Live Tables
- Experience building services using Glue, Lambda, EMR, or Flask, and deploying them on AWS EKS or Kubernetes
- Working experience with both relational and NoSQL databases
- Experience with ETL data pipelines for both batch and real-time processing, data warehousing, and NoSQL databases
Preferred qualifications, capabilities, and skills
- Expertise in Amazon Web Services (AWS), Docker, and Kubernetes for cloud-native and containerized data solutions
- Experience in big data technologies: Hadoop, Spark, Kafka, Flink
- Experience in distributed system design and development
Benefits & conditions
We offer a competitive total rewards package including base salary determined based on the role, experience, skill set and location. Those in eligible roles may receive commission-based pay and/or discretionary incentive compensation, paid in the form of cash and/or forfeitable equity, awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. These benefits include comprehensive health care coverage, on-site health and wellness centers, a retirement savings plan, backup childcare, tuition reimbursement, mental health support, financial coaching and more. Additional details about total compensation and benefits will be provided during the hiring process.