Description:
Senior Data Engineer (Spark & Lakehouse)
Location: Remote, India (Preferred: Bangalore/Pune)
Experience: 6+ Years
Domain: Data Engineering / Big Data
About the Role:
We are seeking a Senior Data Engineer to drive the development of our next-generation Data Lakehouse architecture.
You will be responsible for designing, building, and optimizing large-scale, low-latency data pipelines that support real-time analytics and Machine Learning applications.
Key Responsibilities:
- Design and build highly optimized, production-grade ETL/ELT pipelines using Apache Spark (PySpark/Scala) to process petabytes of data.
- Architect and manage the Data Lakehouse using open table formats such as Delta Lake or Apache Hudi for ACID transactions and data quality guarantees.
- Integrate and process real-time data streams using technologies such as Apache Kafka or Kinesis (a representative pipeline is sketched after this list).
- Implement automated data quality checks, monitoring, and lineage tracking across all data products.
- Collaborate with the infrastructure team to automate data platform deployment and scaling on the cloud (AWS EMR/Glue or Databricks) using Terraform.
- Optimize data warehousing and querying performance in platforms like Snowflake or Google BigQuery.
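Illustrative example (not part of the requirements): a minimal PySpark sketch of the kind of streaming pipeline this role involves, reading events from Kafka and appending them to a Delta Lake table. The broker address, topic name, schema, and storage paths are placeholders, and it assumes the spark-sql-kafka and delta-spark packages are on the classpath; treat it as a sketch, not a prescribed implementation.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Assumes delta-spark is installed; these configs enable Delta Lake support.
spark = (
    SparkSession.builder
    .appName("events-to-delta")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Hypothetical event schema, for illustration only.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("event_time", TimestampType()),
])

# Read the raw event stream from Kafka (placeholder broker and topic).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers the payload as bytes; parse the JSON value into columns.
parsed = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append to a Delta table (placeholder paths); the checkpoint makes the
# stream restartable with exactly-once semantics.
query = (
    parsed.writeStream
    .format("delta")
    .option("checkpointLocation", "/lake/_checkpoints/events")
    .outputMode("append")
    .start("/lake/bronze/events")
)
query.awaitTermination()

In production, a skeleton like this would also carry the automated data quality checks, monitoring, and lineage tracking called out in the responsibilities above.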
Technical Skills Required:
- Expert-level proficiency in Apache Spark (PySpark or Scala), including performance tuning.
- Mandatory experience with Data Lakehouse technologies (Delta Lake, Iceberg, or Hudi).
- Strong experience with at least one public cloud data platform (AWS, GCP, or Azure).
- Solid knowledge of data modeling (Dimensional, Data Vault) and advanced SQL.
- Experience with workflow orchestration tools like Apache Airflow or Prefect.

Key Skills: Data Engineering, Data Lake, Data Quality, PySpark, Scala, Data Management, Kafka, Spark