Ingest data from internal and external sources into Amazon Redshift and S3 buckets, using appropriate methods including DMS, Zero-ETL, Kinesis, Glue, Lambda, Lake Formation, cross-account replication, and/or SFTP.
Build infrastructure as code using AWS CDK (a minimal CDK sketch follows this list).
Create and use GitLab CI/CD pipelines to promote code through test and production environments.
Build Glue ETL pipelines to structure and curate data (a Glue job sketch also follows this list).
Conduct code reviews on all code deployments.
Maintain the AWS environment to manage costs, eliminate vulnerabilities, and keep systems running smoothly.
Coordinate and/or complete resolution of production issues.
Drive path-to-production processes, including documentation and approval processes.
Partner with business teams on data governance.
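As an illustration of the infrastructure-as-code responsibility above, here is a minimal AWS CDK v2 sketch in Python that provisions a curated-zone S3 bucket and a Glue job. The stack name, bucket settings, and script path are hypothetical placeholders, not details from this posting.

    import aws_cdk as cdk
    from aws_cdk import aws_glue as glue, aws_iam as iam, aws_s3 as s3

    class IngestStack(cdk.Stack):
        """Provisions a curated-zone bucket and a Glue ETL job (illustrative)."""
        def __init__(self, scope, construct_id, **kwargs):
            super().__init__(scope, construct_id, **kwargs)
            # Encrypted, non-public bucket for curated data
            bucket = s3.Bucket(self, "CuratedBucket",
                               encryption=s3.BucketEncryption.S3_MANAGED,
                               block_public_access=s3.BlockPublicAccess.BLOCK_ALL)
            # Execution role the Glue job assumes
            role = iam.Role(self, "GlueJobRole",
                            assumed_by=iam.ServicePrincipal("glue.amazonaws.com"))
            bucket.grant_read_write(role)
            # L1 Glue job construct; the script location is a placeholder
            glue.CfnJob(self, "CurationJob",
                        role=role.role_arn,
                        command=glue.CfnJob.CommandProperty(
                            name="glueetl",
                            script_location=f"s3://{bucket.bucket_name}/scripts/curate.py",
                            python_version="3"),
                        glue_version="4.0")

    app = cdk.App()
    IngestStack(app, "IngestStack")
    app.synth()

A stack like this would be deployed with the usual cdk deploy and promoted through environments via the GitLab CI/CD pipelines mentioned above.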
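And a sketch of the kind of Glue ETL job script the posting describes: read raw JSON from S3, apply a curation step, and write Parquet to a curated zone. The bucket paths and dropped field are hypothetical.

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read raw JSON from the landing zone (path is a placeholder)
    raw = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": ["s3://raw-bucket/events/"]},
        format="json")

    # Example curation step: drop a junk field before publishing
    curated = raw.drop_fields(["_corrupt_record"])

    # Write curated Parquet to the curated zone (path is a placeholder)
    glue_context.write_dynamic_frame.from_options(
        frame=curated,
        connection_type="s3",
        connection_options={"path": "s3://curated-bucket/events/"},
        format="parquet")

    job.commit()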
Qualifications / Preferred Qualifications
Bachelor of Science degree in Computer Science or equivalent
At least 5 years of post-degree professional experience
3+ years of experience with AWS CDK
3+ years of development experience building and maintaining AWS ETL pipelines.
Hands-on experience with AWS services such as S3, Lambda, Step Functions, CDK, Glue, and IAM role configuration
Experience with cloud data migration tools such as DMS and cross-account replication
Strong knowledge and experience with Python
Familiarity with best practices for data ingestion and data design
Track record of advancing new technologies to improve data quality and reliability.
Drive to continuously improve the quality, efficiency, and scalability of data pipelines
Self-motivated and able to work independently, with direct experience across all aspects of the software development lifecycle, from design to deployment.
Excellent written and spoken communication skills are required as we work in a collaborative cross-functional environment and interact with the full spectrum of business divisions.
Expert skills in tuning queries and applications, including the use of indexes and materialized views to improve query performance.
Identify the business rules needed for extracting data, along with functional or technical risks related to data sources (e.g., data latency, refresh frequency).
Develop initial queries for profiling data, validating analysis, testing assumptions, driving data quality assessment specifications, and defining a path to deployment (a profiling sketch follows below).
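A minimal sketch of the kind of profiling query described above, run against Redshift through the boto3 Redshift Data API. The cluster, database, user, and table names are hypothetical.

    import time
    import boto3

    client = boto3.client("redshift-data")

    # Profile row counts, distinct keys, and null keys in a staging table
    resp = client.execute_statement(
        ClusterIdentifier="analytics-cluster",   # placeholder
        Database="dev",                          # placeholder
        DbUser="profiler",                       # placeholder
        Sql="""SELECT COUNT(*)                                          AS row_count,
                      COUNT(DISTINCT order_id)                          AS distinct_keys,
                      SUM(CASE WHEN order_id IS NULL THEN 1 ELSE 0 END) AS null_keys
               FROM staging.orders""")

    # The Data API is asynchronous: poll until the statement completes
    while True:
        desc = client.describe_statement(Id=resp["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(1)

    if desc["Status"] == "FINISHED":
        print(client.get_statement_result(Id=resp["Id"])["Records"][0])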
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full time
Contact Details:
Company: Xebia IT Architects
Location(s): Noida, Gurugram