Solid knowledge of the Hadoop ecosystem (Hive, Iceberg, Spark SQL) and proficiency in Python, Unix, and SQL
Experience with Apache Kafka, Apache Flink, and other relevant streaming technologies.
Strong hands-on experience with Hive and SQL for querying and data transformation
Proficiency in Python for data manipulation and automation
Expertise in Apache Spark (batch and streaming)
Deep understanding of Hadoop ecosystem (HDFS, YARN, MapReduce)
Experience working with Kafka for streaming data pipelines (see the first sketch after this list)
Experience with workflow orchestration tools such as Airflow and Oozie (see the second sketch after this list)
Knowledge of cloud-based big data platforms (AWS EMR, GCP Dataproc, Azure HDInsight)
Familiarity with CI/CD pipelines and version control (Git)
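To make the Kafka and Spark streaming expectations concrete, here is a minimal PySpark Structured Streaming sketch, not a description of this role's actual stack: it assumes a local broker at localhost:9092, a hypothetical JSON topic named "events", and an illustrative schema and checkpoint path, and it requires the spark-sql-kafka connector package on the Spark classpath.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed event payload; replace with the real schema.
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", LongType()),
])

# Subscribe to the hypothetical "events" topic on an assumed local broker.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "localhost:9092")
       .option("subscribe", "events")
       .load())

# Kafka delivers the payload as bytes: cast to string, then parse the JSON.
events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
             .select("e.*"))

# Write parsed rows to the console in micro-batches; the checkpoint path is
# a placeholder that enables restart recovery.
query = (events.writeStream
         .outputMode("append")
         .format("console")
         .option("checkpointLocation", "/tmp/checkpoints/events")
         .start())
query.awaitTermination()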
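For the orchestration requirement, a minimal Airflow DAG sketch, assuming Airflow 2.4+ and a spark-submit binary on the worker PATH; the DAG id, schedule, and job script paths are hypothetical.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="daily_revenue_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    extract = BashOperator(
        task_id="extract_orders",
        # {{ ds }} is Airflow's templated logical date for the run.
        bash_command="spark-submit /opt/jobs/extract_orders.py {{ ds }}",
    )
    aggregate = BashOperator(
        task_id="aggregate_revenue",
        bash_command="spark-submit /opt/jobs/aggregate_revenue.py {{ ds }}",
    )
    # Run extraction before aggregation.
    extract >> aggregate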
Responsibilities
Design, develop, and maintain scalable data pipelines and ETL processes.
Work with large datasets using Hadoop ecosystem tools such as Hive and Spark (see the sketch after this list).
Build and optimize real-time and batch data processing solutions using Kafka and Spark Streaming.
Write efficient, high-performance SQL queries to extract, transform, and load data.
Develop reusable data frameworks and utilities in Python.
Collaborate with data scientists, analysts, and product teams to deliver reliable data solutions.
Monitor, troubleshoot, and optimize big data workflows for performance and cost efficiency.
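As an illustration of the Hive and Spark SQL work described above, a minimal batch ETL sketch; the database and table names (raw_db.orders, analytics.daily_revenue) and the column layout are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-etl-sketch")
         .enableHiveSupport()   # lets Spark SQL read and write Hive tables
         .getOrCreate())

# Aggregate completed orders into daily revenue per country.
daily = spark.sql("""
    SELECT order_date,
           country,
           SUM(amount) AS revenue,
           COUNT(*)    AS order_count
    FROM raw_db.orders
    WHERE order_status = 'COMPLETED'
    GROUP BY order_date, country
""")

# Persist the result as a Hive table partitioned by date (full overwrite).
(daily.write
      .mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("analytics.daily_revenue"))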
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full time