Job Description
Role & Responsibilities:
- Data Architecture: Design and implement data architecture solutions on AWS, including data lakes, data warehouses, and streaming data pipelines.
- ETL Development: Develop and maintain ETL (Extract, Transform, Load) processes to ingest, process, and transform data from various sources into a usable format for analytics and reporting.
- Data Modelling: Create and maintain data models that support analytical and reporting requirements. Optimize data structures for performance and scalability.
- AWS Services: Leverage a wide range of AWS services such as S3, Redshift, Glue, EMR, Athena, Kinesis, Lambda, and more to build data solutions.
- Data Quality: Implement data quality checks and monitoring to ensure data accuracy, completeness, and reliability.
- Performance Optimization: Continuously optimize data pipelines and infrastructure for improved performance, cost efficiency, and scalability.
- Security and Compliance: Implement data security and compliance best practices, including data encryption, access controls, and auditing.
- Documentation: Maintain comprehensive documentation of data engineering processes, configurations, and workflows.
- Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand their data requirements and provide data engineering support.
- Troubleshooting: Diagnose and resolve data engineering issues in a timely manner.

Skills Required:
- Proven experience as a Data Engineer with a strong focus on the AWS cloud platform.
- In-depth knowledge of AWS services and tools for data engineering and analytics.
- Proficiency in programming languages such as Python, Java, or Scala.
- Strong SQL skills and experience with relational databases and data warehousing concepts.
- Experience with big data technologies such as Hadoop, Spark, and Kafka.
- Understanding of data governance, data quality, and data privacy.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration skills.
- AWS certifications (e.g., AWS Certified Data Analytics, AWS Certified Data Warehousing) are a plus.

Experience Required:
- 5+ years of experience in data engineering / data integration. Formative years may be in development projects involving data warehouses, data marts, ETL, data modelling, data-rich middleware, and distributed computing solutions.
- 3+ years in cloud services, with at least 1 implementation/development project.
- The last 3 years must be in big data engineering: cloud integrations, serverless, S3/BLOB storage, Hadoop, HBase, Hive, Spark, Spark Streaming, Kafka, in-memory database systems, and databases (NoSQL and SQL).
- Should have programmed extensively in Python, PySpark, Scala, Java, or .NET.
- Should have managed an industry-standard program in the cloud, building data pipelines, data migrations, or analytics pipelines.
- Should have worked as a software solution provider at the level of solution architect or technology evangelist in a medium to large ISV.
- Direct experience of at least 3 years in the design and delivery of big data solutions in the healthcare vertical.

Good to Have:
- Experience with serverless computing using AWS Lambda.
- Familiarity with containerization and orchestration using Docker and Kubernetes.
- Knowledge of DevOps practices and CI/CD pipelines.
- Exposure to data orchestration tools such as Apache Airflow.
Employment Category:
Employment Type: Full time
Industry: IT Services & Consulting
Role Category: General / Other Software
Functional Area: Not Applicable
Role/Responsibilities: Data Engineer
Keyskills:
AWS
Python
Java
Scala
SQL
Hadoop
Spark
Kafka
Data Governance
Data Quality
Data Privacy
Troubleshooting
Communication
Collaboration
Data Warehousing
ETL
HBase
Hive
Java Programming
Docker
Kubernetes
DevOps
Data Engineer
Problem-solving
AWS Certifications
Data Modelling
Big Data Engineering
Cloud Integrations
Serverless
S3/BLOB
Spark Streaming
Python Programming
Pyspark
Scala Programming
Data Pipelines
Data Migrations
Analytics Pipelines
Apache Airflow
Solution Architect
Technology Evangelist
Big Data Solutions
Healthcare Vertical
Serverless Computing
CI/CD Pipelines
Data Orchestration