Overview
Primarily looking for a data engineer with expertise in building data pipelines using Databricks, PySpark, and SQL on cloud platforms such as AWS
Must have: AWS, Databricks
Good to have: PySpark, Snowflake, Talend
Requirements
The candidate must have experience working on projects involving the following. Other ideal qualifications include:
Primarily looking for a data engineer with expertise in building data pipelines using Databricks and Spark SQL on Hadoop distributions such as AWS EMR, Databricks, Cloudera, etc.
Highly proficient in large-scale data operations using Databricks and very comfortable working in Python
Familiarity with AWS compute, storage, and IAM concepts
Experience working with an S3 data lake as the storage tier
Any ETL background (Talend, AWS Glue, etc.) is a plus but not required
Cloud warehouse experience (Snowflake, etc.) is a huge plus
Carefully evaluates alternative risks and solutions before taking action
Optimizes the use of all available resources
Develops solutions to meet business needs that reflect a clear understanding of the objectives, practices, and procedures of the corporation, department, and business unit
Skills
Hands-on experience with Databricks and Spark SQL on the AWS Cloud platform, especially S3, EMR, Databricks, Cloudera, etc.
Experience with shell scripting
Exceptionally strong analytical and problem-solving skills
Relevant experience with ETL methods and with retrieving data from dimensional data models and data warehouses
Strong experience with relational databases and data access methods especially SQL
Excellent collaboration and cross-functional leadership skills
Excellent communication skills both written and verbal
Ability to manage multiple initiatives and priorities in a fast-paced, collaborative environment
Ability to leverage data assets to respond to complex questions that require timely answers
Working knowledge of migrating relational and dimensional databases to the AWS Cloud platform

Keyskills: SQL, data operations, Python, relational databases, AWS, Cloudera, Hive, Scala, PySpark, data warehousing, Spark, AWS Cloud, AWS EMR, shell scripting, Hadoop, ETL, big data, data lake, Snowflake, Talend, warehouse, data engineering, Databricks, AWS Glue, Sqoop