Overview:
As a Data Engineer, you will work with multiple teams to deliver solutions on the AWS Cloud using core cloud data engineering tools such as Databricks on AWS, AWS Glue, Amazon Redshift, Athena, and other Big Data-related technologies. This role focuses on building the next generation of application-level data platforms and improving recent implementations. Hands-on experience with Apache Spark (PySpark, SparkSQL), Delta Lake, Iceberg, and Databricks is essential.
Responsibilities:
Design and develop data lakes; manage data flows that integrate data from various sources into a common data lake platform using an ETL tool
Code and manage Delta Lake implementations on S3 using technologies such as Databricks or Apache Hudi (see the sketch after this list)
Triage, debug, and fix technical issues related to data lakes
Design and develop data warehouses for scale
Design and evaluate data models (star, snowflake, and flattened)
Design data access patterns for OLTP- and OLAP-based transactions
Coordinate with business and technical teams through all phases of the software development life cycle
Participate in making major technical and architectural decisions
Maintain and manage code repositories (e.g., Git)
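A minimal sketch of the kind of Delta Lake work described above (the bucket path, table schema, and merge key are hypothetical; it assumes a Spark session with the delta-spark libraries available, for example on a Databricks cluster):

from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("orders-delta-sketch").getOrCreate()

TARGET_PATH = "s3://example-data-lake/silver/orders"  # hypothetical S3 location

# Incoming batch from an upstream source (for example a Glue or DMS extract).
updates = spark.createDataFrame(
    [(1, "2024-01-15", 120.50), (2, "2024-01-15", 75.00)],
    ["order_id", "order_date", "amount"],
)

if DeltaTable.isDeltaTable(spark, TARGET_PATH):
    # Upsert the batch into the existing Delta table on order_id.
    target = DeltaTable.forPath(spark, TARGET_PATH)
    (target.alias("t")
           .merge(updates.alias("u"), "t.order_id = u.order_id")
           .whenMatchedUpdateAll()
           .whenNotMatchedInsertAll()
           .execute())
else:
    # First load: create the Delta table at the target path.
    updates.write.format("delta").mode("overwrite").save(TARGET_PATH)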
Must Have:
5+ Years of Experience operating on the AWS Cloud and building Data Lake architectures
3+ Years of Experience with AWS Data services like S3, Glue, Lake Formation, EMR, Kinesis, RDS, DMS and Redshift
3+ Years of Experience building Data Warehouses on Snowflake, Redshift, HANA, Teradata, Exasol etc.
3+ Years of working knowledge in Spark
3+ Years of Experience in building Delta Lakes using technologies like Apache Hudi or Databricks
3+ Years of Experience working on any ETL tools and technologies
3+ Years of Experience in any programming language (Python, R, Scala, Java)
Bachelor's degree in computer science, information technology, data science, data analytics, or a related field
Experience working on Agile projects and Agile methodology in general
Good To Have:
Strong understanding of RDBMS principles and advanced data modelling techniques.
AWS cloud certification (e.g., AWS Certified Data Analytics - Specialty) is a strong plus.
Key Skills:
Languages: Python, SQL, PySpark
Big Data Tools: Apache Spark, Databricks, Apache Hudi
Databricks on AWS
AWS Services: S3, Glue, Lake Formation, EMR, Kinesis, RDS, DMS, Redshift
Data warehouses: Snowflake, Redshift, HANA, Teradata, Exasol
Data Modelling: Star Schema, Snowflake Schema, Flattened Models (see the sketch after this list)
DevOps & CI/CD: Git, Agile Methodology, ETL Methodology
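As a rough, hypothetical illustration of the star-schema modelling listed above (all table and column names are invented), a fact table joined to conformed dimensions might look like this in Spark SQL, assuming a Spark session with Delta table support:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-schema-sketch").getOrCreate()

# Dimension tables hold descriptive attributes.
spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_customer (
        customer_key BIGINT,
        customer_name STRING,
        region STRING
    ) USING DELTA
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS dim_date (
        date_key INT,
        calendar_date DATE,
        fiscal_quarter STRING
    ) USING DELTA
""")

# The fact table stores measures plus foreign keys to the dimensions.
spark.sql("""
    CREATE TABLE IF NOT EXISTS fact_sales (
        customer_key BIGINT,
        date_key INT,
        order_id BIGINT,
        amount DECIMAL(18, 2)
    ) USING DELTA
""")

# Typical OLAP-style query: aggregate fact measures by dimension attributes.
spark.sql("""
    SELECT d.fiscal_quarter, c.region, SUM(f.amount) AS revenue
    FROM fact_sales f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_date d ON f.date_key = d.date_key
    GROUP BY d.fiscal_quarter, c.region
""").show()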

Keyskills: Redshift, AWS, Databricks, Unity Catalog, Delta Live Tables, PySpark, Glue, Lambda, Teradata, Python, Autoloader, Apache Spark