Job Description
As a
Data Engineer , you are required toDesign, build, and maintain data pipelines that efficiently process and transport data from various sources to storage systems or processing environments while ensuring data integrity, consistency, and accuracy across the entire data pipeline. Integrate data from different systems, often involving data cleaning, transformation (ETL), and validation. Design the structure of databases and data storage systems, including the design of schemas, tables, and relationships between datasets to enable efficient querying. Work closely with data scientists, analysts, and other stakeholders to understand their data needs and ensure that the data is structured in a way that makes it accessible and usable. Stay up-to-date with the latest trends and technologies in the data engineering space, such as new data storage solutions, processing frameworks, and cloud technologies. Evaluate and implement new tools to improve data engineering processes.
Qualification Bachelor's or Master's in Computer Science & Engineering, or equivalent. Professional Degree in Data Science, Engineering is desirable.
Experience level At least3- 5years hands-on experience in Data Engineering
Desired Knowledge & Experience Spark: Spark 3.x, RDD/DataFrames/SQL, Batch/Structured Streaming Knowing Spark internalsCatalyst/Tungsten/Photon Databricks: Workflows, SQL Warehouses/Endpoints, DLT, Pipelines, Unity, Autoloader IDE: IntelliJ/Pycharm, Git, Azure Devops, Github Copilot Test: pytest, Great Expectations CI/CD Yaml Azure Pipelines, Continuous Delivery, Acceptance Testing Big Data Design: Lakehouse/Medallion Architecture, Parquet/Delta, Partitioning, Distribution, Data Skew, Compaction Languages: Python/Functional Programming (FP) SQL TSQL/Spark SQL/HiveQL Storage Data Lake and Big Data Storage Designadditionally it is helpful to know basics of:
Data Pipelines ADF/Synapse Pipelines/Oozie/Airflow Languages: Scala, Java NoSQL :Cosmos, Mongo, Cassandra Cubes SSAS (ROLAP, HOLAP, MOLAP), AAS, Tabular Model SQL Server TSQL, Stored Procedures Hadoop HDInsight/MapReduce/HDFS/YARN/Oozie/Hive/HBase/Ambari/Ranger/Atlas/Kafka Data Catalog Azure Purview, Apache Atlas, Informatica Required Soft skills & Other Capabilities Great attention to detail and good analytical abilities. Good planning and organizational skills Collaborative approach to sharing ideas and finding solutions Ability to work independently and also in a global team environment.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employement Type: Full time
Contact Details:
Company: Siemens
Location(s): Bengaluru
Keyskills:
hive
spark
hadoop
python
data engineering
continuous integration
scala
ci/cd
t-sql
sql
stored procedures
java
git
big data
mongodb
hbase
data lake
ssas
pytest
oozie
sql server
nosql
continuous delivery
mapreduce
cassandra
pycharm
kafka
yarn