Lead Data Engineer
Job description
· The Tech Lead is primarily responsible for the overall platform vision and for ensuring systems remain reliable at scale.
Requirements
· Distributed Computing: Mastery of frameworks like Apache Spark or Ray for massive-scale parallel data processing.
· Streaming & Event-Driven Architecture: Deep understanding of real-time pipeline design using Kafka, Kinesis, or Flink.
· Cloud Infrastructure: Expertise in a major public cloud, specifically AWS, including an understanding of storage/compute decoupling and cost optimization.
· Core Programming & Database Management
· Leads set coding standards and review code, which requires complete fluency in the fundamentals. [1]
· SQL: Advanced proficiency in metrics computation, window functions, and query performance tuning across relational and columnar databases (e.g., Snowflake, Redshift, BigQuery).
· Scripting Languages: High proficiency in Python or Scala for writing reusable pipeline code and interacting with APIs.
· Data Storage: Deep familiarity with both columnar/analytical stores and NoSQL databases (e.g., DynamoDB, Cassandra).
· Pipeline Orchestration & DevOps
· Leads ensure that pipelines run reliably, idempotently, and securely in production. [1, 2]
· Workflow Orchestration: Ability to architect Directed Acyclic Graphs (DAGs) in tools like Apache Airflow or Prefect.
· CI/CD & Infrastructure as Code (IaC): Applying software engineering principles to data by using Docker, Kubernetes, and Terraform.
· Data Governance & Security: Implementing Role-Based Access Control (RBAC), data masking, and compliance frameworks.
· Leadership & Soft Skills
· Tech leads also mentor junior engineers, estimate project timelines, and translate ambiguous business needs into concrete technical specifications.
· Mentorship & Code Review: Fostering a collaborative development environment and enforcing style guidelines.
· System Observability: Building logging, monitoring, and alerting mechanisms so the team knows exactly when and why pipelines fail.
Must have skills:
· Lead Data Engineer
· Python, AWS: S3, Lambda
· Glue, GenAI, LLMs, SQL