Job Description
Proficiency in Scala, plus Python or Java, is an absolute must-have.
AI experience is an absolute must-have.
Knowledge of SQL.
Understanding of distributed computing.
Experience working with real-time streaming data (an absolute must-have; see the sketch after this list).
Awareness of any cloud technology.
The team currently runs on on-premises hardware.
Experience with some form of ETL tooling (Informatica or any other) is good to have.
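To illustrate the real-time streaming requirement, here is a minimal sketch of a Spark Structured Streaming job consuming from Kafka, written in Scala. The broker address, topic name, and windowing logic are hypothetical, not details of this role's actual stack, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-events-sketch")
      .getOrCreate()
    import spark.implicits._

    // Read a stream of events from Kafka; broker and topic are hypothetical.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")

    // Count events per 1-minute window as a trivial aggregation,
    // tolerating up to 5 minutes of late-arriving data.
    val counts = events
      .withWatermark("timestamp", "5 minutes")
      .groupBy(window($"timestamp", "1 minute"))
      .count()

    // Write running counts to the console; a production job would
    // target HDFS, Hive, or another sink instead.
    val query = counts.writeStream
      .outputMode("update")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```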
Responsibilities:
Understanding of the Hadoop ecosystem, including but not limited to HDFS, MapReduce, and
YARN, and components like Hive, Sqoop, Spark, and Flume.
Ensure architecture will support the business requirements.
Work closely with business/product stakeholders to understand requirements and
translate them into technical requirements.
Develop Spark programs in either Python or Scala for scalable data processing
(a minimal Scala sketch appears after this list).
Recommend, and where needed implement, ways to improve data reliability,
efficiency, and quality.
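As a reference point for the Spark responsibility above, here is a minimal batch-processing sketch in Scala. The input and output paths, column names, and aggregation are hypothetical, chosen only to show the shape of a typical scalable data-processing job.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BatchSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-orders-sketch")
      .getOrCreate()

    // Input and output paths are hypothetical HDFS locations.
    val orders = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/raw/orders")

    // Aggregate order totals per customer as a stand-in transformation.
    val totals = orders
      .groupBy(col("customer_id"))
      .agg(sum(col("amount")).as("total_amount"))

    // Persist the result as Parquet for downstream consumers.
    totals.write.mode("overwrite").parquet("hdfs:///data/curated/order_totals")

    spark.stop()
  }
}
```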
Requirements:
5+ years of experience building big data pipelines using Kafka, Spark, and other
Hadoop ecosystem tools.
Experience in distributed data processing technologies like Kafka, Spark, HDFS, etc.
Experience in Scala programming and shell scripting.
Knowledge in optimizing large-scale data systems for scalability, resiliency, and
performance.
Skills in database schema design and SQL (see the sketch after this list).
Experience with business-critical environments and production systems
Experience with the AWS cloud platform is good to have.
Knowledge of installing and setting up a complete Hadoop cluster is good to have.
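As a pointer for the schema-design and SQL requirements, here is a minimal Spark SQL sketch in Scala that creates and queries a partitioned Hive table. The database, table, and columns are hypothetical, and the session assumes Hive support is available.

```scala
import org.apache.spark.sql.SparkSession

object SchemaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Database, table, and columns below are illustrative only.
    spark.sql("CREATE DATABASE IF NOT EXISTS analytics")

    // A date-partitioned table keeps daily queries cheap to scan.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.page_views (
        user_id   BIGINT,
        url       STRING,
        viewed_at TIMESTAMP
      )
      PARTITIONED BY (view_date DATE)
      STORED AS PARQUET
    """)

    // A simple aggregate query against the table.
    spark.sql("""
      SELECT view_date, COUNT(*) AS views
      FROM analytics.page_views
      GROUP BY view_date
      ORDER BY view_date
    """).show()
  }
}
```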
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full time
Contact Details:
Company: Vigoursoft Global
Location(s): Bengaluru
Keyskills:
Coding
Shell Scripting
Schema
Scala
Flume
Data Processing
Informatica
Sqoop
SQL
Python