Senior Data Engineer

Marathon Petroleum
San Antonio, United States of America
yesterday

Role details

Contract type
Permanent contract
Employment type
Full-time (> 32 hours)
Working hours
Regular working hours
Languages
English
Experience level
Senior

Job location

San Antonio, United States of America

Tech stack

Amazon Web Services (AWS)
Data analysis
Azure
Big Data
Cloud Database
Software Quality
Computer Programming
Databases
Continuous Delivery
Data Architecture
Data Discovery
Information Engineering
Data Governance
Data Infrastructure
Data Integration
Data Integrity
ETL
Data Mining
Data Security
Data Systems
Data Warehousing
Relational Databases
Desktop Computing
DevOps
Distributed Systems
Hadoop
Monitoring of Systems
Information Lifecycle Management
Python
Meta-Data Management
NoSQL
Open Source Technology
Systems Development Life Cycle
Cloud Services
DataOps
Software Engineering
SQL Databases
Unstructured Data
Workflow Management Systems
Data Processing
Data Storage Technologies
Cloud Platform System
Real Time Systems
Data Ingestion
Spark
Data Lake
Semi-structured Data
Information Technology
Data Lineage
Data Pipelines
Serverless Computing
Databricks

Job description

As a Senior Data Engineer, you will design, develop, and optimize scalable data pipelines and cloud-based data solutions using modern data architecture and Azure data services, including Data Lake, Synapse Analytics, Azure Functions, and Databricks.

  • Conducts the design, innovation and optimization of data extraction, ingestion and transformation processes.
  • Facilitates the development and design of complex data architecture to process and store high-volume data sets.
  • Enables the development of complex data pipelines; advocates for and implements data security and privacy measures.
  • Conducts complex data quality and processing tasks using open source and cloud services.
  • Provides technical expertise during critical incidents.
  • Facilitates the adoption of best practices for data security and privacy and collaborates with other departments to ensure seamless data integration.
  • Facilitates the implementation of continuous improvements in data processing methods and drives consistency and best practices across data engineering projects.
  • Solves complex problems; takes a new perspective on existing solutions; participates in strategic planning sessions for data infrastructure.
  • Oversees quality assurance and testing for data solutions.
  • Mentors less experienced data engineers.

  • Big Data Technologies - Familiarity with big data technologies and frameworks, such as Hadoop, Spark, and distributed computing, for processing and analyzing large volumes of data.
  • Cloud Platform - Knowledge of cloud-based data platforms, such as AWS, Azure, or GCP, and their associated services for data storage, processing, and analytics.
  • Data Governance - Ability to establish and oversee a set of procedures, policies, and standards that ensure the effective and efficient management of an organization's data assets. This includes ensuring data quality, compliance with relevant laws and regulations, and secure data handling practices. It also involves the coordination between different departments to ensure that data is accurate, accessible, and used responsibly and ethically.
  • Data Integration - Proficiency in integrating data from various sources, including structured and unstructured data, using technologies such as ETL (Extract, Transform, Load) processes, data pipelines, and data ingestion frameworks.
  • Data Lifecycle - The data lifecycle refers to the sequential stages that data goes through from its creation or acquisition to its eventual disposal. These stages typically include data creation, storage, processing, analysis, archival, and eventual deletion or destruction, with each phase governed by specific policies and practices.
  • Data Modeling - Skill in designing and implementing data models that align with business requirements, ensuring data integrity, performance, and scalability.
  • Data Operations - Data operations refer to the various actions and processes involved in managing, manipulating, and analyzing data throughout its lifecycle. These operations encompass tasks such as collection, storage, retrieval, transformation, and visualization of data to derive meaningful insights and support decision-making.
  • Data Pipelines - Data pipelines are a set of processes that enable the flow of data from one or multiple sources to a destination, often involving tasks such as extraction, transformation, and loading (ETL). These pipelines are designed to efficiently and reliably move and process data, ensuring its quality and accessibility for various analytical and operational purposes.
  • Data Privacy - Ability to understand and implement practices that ensure the protection and confidential handling of personal and sensitive information. This includes knowledge of relevant laws and regulations (such as GDPR or HIPAA), the ability to design and enforce policies that safeguard data, and the skills to manage data access rights and consent protocols.
  • Data Quality Management - Strong understanding of data quality dimensions, methodologies, and best practices to establish and maintain data quality standards and processes.
  • Data Security - Knowledge of data privacy regulations, cybersecurity best practices, and techniques for protecting sensitive information and ensuring compliance.
  • Data Warehousing (DW) - Knowledge of monitoring and observability tools and practices for tracking data pipeline performance, data quality, and system health.
  • DevOps - A set of practices that combines software development and information-technology operations, aiming to shorten the systems development life cycle and provide continuous delivery with high software quality and a security-first approach.
  • General Programming - Applies a computer language to communicate with computers using a set of instructions and to automate the execution of tasks.
  • Metadata Management - Proficiency in metadata management solutions to enable efficient data discovery, data lineage tracing, and data asset management.
  • NoSQL Databases - NoSQL databases are a type of database management system that provides a flexible and scalable approach to storing and retrieving data, often diverging from the traditional relational database model. Unlike relational databases, NoSQL databases are designed to handle large volumes of unstructured or semi-structured data, offering high performance and horizontal scalability for modern applications.
  • Real Time Processing - Real-time processing refers to the method of handling data or performing computations immediately as they occur, without any noticeable delay. In real-time processing systems, data is processed, and responses are generated within a timeframe that meets the requirements of the application or task, typically within milliseconds or microseconds.

Requirements

This role also includes close collaboration with analytics, product, architecture, governance, and business teams to translate requirements into scalable data solutions. The ideal candidate brings strong technical depth (strong proficiency in Python, SQL, Spark, and orchestration tools such as Azure Data Factory and/or Databricks Workflows), a continuous improvement mindset, and the ability to mentor while contributing to engineering standards, reusable patterns, and solution design reviews.

  • Bachelor's Degree in Information Technology, a related field, or equivalent experience
  • 5+ years of relevant data engineering experience

As an energy industry leader, our career opportunities fuel personal and professional growth.

About the company

At MPC, we're committed to being a great place to work - one that welcomes new ideas, encourages diverse perspectives, develops our people, and fosters a collaborative team environment.

[...] Resources Department, at talentacquisition@marathonpetroleum.com. Please specify the reasonable accommodation you are requesting, along with the job posting number in which you may be interested. A Human Resources representative will review your request and contact you to discuss a reasonable accommodation.

Marathon Petroleum offers a total rewards program which includes, but is not limited to, access to health, vision, and dental insurance, paid time off, 401k matching program, paid parental leave, and educational reimbursement. Detailed benefit information is available at https://mympcbenefits.com. The hired candidate will also be eligible for a discretionary company-sponsored annual bonus program.

Equal Opportunity Employer: Veteran / Disability

We will consider all qualified Applicants for employment, including those with arrest or conviction records, in a manner consistent with the requirements of applicable state and local laws. In reviewing criminal history in connection with a conditional offer of employment, Marathon will consider the key responsibilities of the role.

About Marathon Petroleum Corporation

Marathon Petroleum Corporation (MPC) is a leading, integrated, downstream energy company headquartered in Findlay, Ohio. The company operates the nation's largest refining system. MPC's marketing system includes branded locations across the United States, including Marathon brand retail outlets. MPC also owns the general partner and majority limited partner interest in MPLX LP, a midstream company that owns and operates gathering, processing, and fractionation assets, as well as crude oil and light product transportation and logistics infrastructure.
