- Design, develop, and maintain Python-based data processing pipelines for on-prem and file-based systems.
- Work with large structured and unstructured datasets for data ingestion, transformation, and validation.
- Implement and optimize ETL/ELT processes, ensuring data quality, consistency, and performance.
- Collaborate with cross-functional teams to design end-to-end data solutions, including integration with cloud and on-premise environments.
- Leverage AWS services (such as S3, Lambda, Glue, Step Functions, or EC2) for data processing and automation.
- Participate in solution design discussions and contribute to architectural decisions.
- Monitor, troubleshoot, and improve existing data workflows for efficiency and scalability.
- Ensure compliance with data governance, security, and performance standards.
Required Skills & Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5+ years of hands-on experience with Python for data processing and automation.
- Proven experience with on-premise data workflows and file-based data processing (CSV, JSON, XML, etc.).
- Strong understanding of ETL/ELT design principles and data integration tools.
- Basic to intermediate knowledge of AWS cloud data processing services (e.g., S3, Glue, Lambda, Redshift, or EMR).
- Experience in solution design and data architecture planning.
- Familiarity with SQL and relational databases (MySQL, PostgreSQL, etc.).
- Excellent problem-solving, debugging, and analytical skills.
Good to Have
- Experience with workflow orchestration tools (e.g., Airflow, Prefect, Step Functions).
- Exposure to DevOps practices, CI/CD pipelines, or infrastructure automation.
- Understanding of data security and compliance best practices.
Cloud - AWS - AWS S3, S3 Glacier, AWS EBS
Behavioral - Communication
Database - PostgreSQL - PostgreSQL
Programming Language - Python - OOP Concepts
Programming Language - Python - Pandas
ETL - ETL - DataStage