Senior Data Engineer (AWS Data Lakes & Forecasting Pipelines) @ LeewayHertz

Job Description

Job Summary:

We are looking for an experienced Senior Data Engineer to lead the development of scalable AWS-native data lake pipelines with a strong focus on time series forecasting and upsert-ready architectures. This role requires end-to-end ownership of the data lifecycle, from ingestion to partitioning, versioning, and BI delivery. The ideal candidate must be highly proficient in AWS data services, PySpark, versioned storage formats like Apache Hudi/Iceberg, and must understand the nuances of data quality and observability in large-scale analytics systems.

Responsibilities:
  • Design and implement data lake zoning (Raw → Clean → Modeled) using Amazon S3, AWS Glue, and Athena.
  • Ingest structured and unstructured datasets including POS, USDA, Circana, and internal sales data.
  • Build versioned and upsert-friendly ETL pipelines using Apache Hudi or Iceberg.
  • Create forecast-ready datasets with lagged, rolling, and trend features for revenue and occupancy modeling.
  • Optimize Athena datasets with partitioning, CTAS queries, and metadata tagging.
  • Implement S3 lifecycle policies, intelligent file partitioning, and audit logging.
  • Build reusable transformation logic using dbt-core or PySpark to support KPIs and time series outputs.
  • Integrate robust data quality checks using custom logs, AWS CloudWatch, or other DQ tooling.
  • Design and manage a forecast feature registry with metrics versioning and traceability.
  • Collaborate with BI and business teams to finalize schema design and deliverables for dashboard consumption.

Requirements

Essential Skills:

Job

  • Deep hands-on experience with AWS Glue, Athena, S3, Step Functions, and Glue Data Catalog.
  • Strong command over PySpark, dbt-core, CTAS query optimization, and partition strategies.
  • Working knowledge of Apache Hudi, Iceberg, or Delta Lake for versioned ingestion.
  • Experience in S3 metadata tagging and scalable data lake design patterns.
  • Expertise in feature engineering and forecasting dataset preparation (lags, trends, windows).
  • Proficiency in Git-based workflows (Bitbucket), CI/CD, and deployment automation.
  • Strong understanding of time series KPIs, such as revenue forecasts, occupancy trends, or demand volatility.
  • Familiarity with data observability best practices, including field-level logging, anomaly alerts, and classification tagging.

Personal

  • Independent, critical thinker with the ability to design for scale and evolving business logic.
  • Strong communication and collaboration with BI, QA, and business stakeholders.
  • High attention to detail in ensuring data accuracy, quality, and documentation.
  • Comfortable interpreting business-level KPIs and transforming them into technical pipelines.

Preferred Skills:

Job

  • Experience with statistical forecasting frameworks such as Prophet, GluonTS, or related libraries.
  • Familiarity with Superset or Streamlit for QA visualization and UAT reporting.
  • Understanding of macroeconomic datasets (USDA, Circana) and third-party data ingestion.

Personal

  • Proactive, ownership-driven mindset with a collaborative approach.
  • Strong communication and collaboration skills.
  • Strong analytical and problem-solving skills with attention to detail.
  • Ability to work under stringent deadlines and demanding client conditions.
  • Ability to work in fast-paced, delivery-focused environments.
  • Strong mentoring and documentation skills for scaling the platform.

Other Relevant Information:

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • Minimum 9 years of experience in data engineering and architecture.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full time

Contact Details:

Company: LeewayHertz
Location(s): Kolkata


Key skills: IT Services, Metadata, Automation, Git, Data Quality, Windows, Apache, Information Technology, AWS, Auditing

₹ Not Disclosed

Leewayhertz

LeewayHertz Technologies Private Limited