Overview: As a Data Engineer, you will work with multiple teams to deliver solutions on the Azure cloud using core cloud data warehouse tools (Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and other big data technologies). You will also build the next generation of application data platforms (not infrastructure) and/or improve recent implementations. Experience with Databricks and Spark on other cloud platforms such as AWS and GCP is also relevant.
Responsibilities:
Define, design, develop, and test software components/applications using Microsoft Azure services (Azure Databricks, Azure Data Factory, Azure Data Lake Storage, Logic Apps, Azure SQL Database, Azure Key Vault)
Strong, hands-on SQL skills.
Experience handling structured and unstructured datasets.
Experience in data modeling and advanced SQL techniques.
Experience implementing Azure Data Factory, Airflow, AWS Glue, or another data orchestration tool using current technologies and techniques.
Good exposure to application development.
Ability to work independently with minimal supervision.
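As one hedged illustration of the "advanced SQL techniques" called out in the responsibilities above, the sketch below uses a window function (`RANK() OVER`). SQLite stands in for Azure SQL Database so the sketch runs anywhere (SQLite 3.25+ is needed for window functions); the table name and data are hypothetical.

```python
# Hypothetical sketch of an advanced SQL technique (a window function);
# SQLite is used in place of Azure SQL Database so the example is portable.
import sqlite3

def top_sale_per_region(rows):
    """Return the highest-value sale in each region using RANK() OVER."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = """
        SELECT region, amount
        FROM (
            SELECT region, amount,
                   RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
            FROM sales
        )
        WHERE rnk = 1
        ORDER BY region
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

print(top_sale_per_region([("east", 10.0), ("east", 25.0), ("west", 5.0)]))
# → [('east', 25.0), ('west', 5.0)]
```

The same pattern (partition, order, rank, filter) carries over directly to T-SQL on Azure SQL Database and to Spark SQL.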
Must Have:
Hands-on experience with distributed computing frameworks such as Databricks and the Spark ecosystem (Spark Core, PySpark, Spark Streaming, Spark SQL)
Willingness to work with product teams to optimize product features and functions.
Experience with batch workloads and real-time streaming at high data volumes.
Performance optimization of Spark workloads.
Environment setup, user management, authentication, and cluster management on Databricks.
Professional curiosity and the ability to ramp up on new technologies and tasks independently.
Good understanding of SQL and a solid grasp of relational and analytical database management theory and practice.
Good To Have:
Hands-on experience with distributed computing frameworks such as Databricks.
Experience with Databricks migrations, from on-premises to cloud or cloud to cloud.
Migration of ETL workloads from Apache Spark implementations to Databricks.
Experience with Databricks ML is a plus.
Migration from Spark 2.0 to Spark 3.5.
Certifications:
Databricks Solution Architect Essentials badge
Databricks Developer Essentials
Apache Spark Programming with Databricks
Data Engineering with Databricks
Lakehouse with Delta Lake Deep Dive
Fundamentals of Unified Data Analytics with Databricks
Key Skills:
Python, SQL, and PySpark
Big Data ecosystem (Hadoop, Hive, Sqoop, HDFS, HBase)
Spark ecosystem (Spark Core, Spark Streaming, Spark SQL) / Databricks
Azure (ADF, ADB, Logic Apps, Azure SQL Database, Azure Key Vault, ADLS, Synapse)
AWS (Lambda, AWS Glue, S3, Redshift)
Data Modeling, ETL Methodology

Keyskills: Azure Databricks, Spark Streaming, Azure Data Factory, Azure Synapse, PySpark, Kafka, Azure Data Lake, Spark, Python, SQL