Hands-on experience with GCP services, specifically BigQuery, Cloud Storage, and Composer for data pipeline orchestration
Proficiency in the Databricks platform with PySpark for building and optimizing large-scale ETL/ELT processes (a representative job sketch follows this list)
Expertise in writing and tuning complex SQL queries for data transformation, aggregation, and reporting on large datasets
Experience integrating data from multiple sources such as APIs, cloud storage, and databases into a central data warehouse
Familiarity with workflow orchestration tools like Apache Airflow or Cloud Composer for scheduling, monitoring, and managing data jobs
Knowledge of version control systems (Git), CI/CD practices, and Agile development methodologies
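The requirements above centre on PySpark-based ETL/ELT on GCP. The snippet below is a minimal sketch of that kind of job, not a prescribed implementation: the bucket, project, dataset, and table names are placeholders, and it assumes the Spark BigQuery connector is available on the cluster.

```python
# Minimal PySpark ETL sketch: read raw files from Cloud Storage, apply a
# transformation, and load the result into BigQuery. All resource names
# below are placeholders, not real project resources.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_daily_load").getOrCreate()

# Read raw order events landed in a GCS bucket (path is illustrative).
orders = spark.read.parquet("gs://example-raw-zone/orders/dt=2024-01-01/")

# Basic cleansing and aggregation before loading to the warehouse.
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("customer_id", F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("daily_revenue"))
)

# Write via the Spark BigQuery connector; the temporary GCS bucket is
# needed by the connector for indirect writes.
(
    daily_revenue.write.format("bigquery")
    .option("table", "example_project.analytics.daily_revenue")
    .option("temporaryGcsBucket", "example-staging-bucket")
    .mode("overwrite")
    .save()
)
```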
Overall Responsibilities
Design, develop, and maintain scalable data pipelines using GCP, PySpark, and associated tools
Write efficient, well-documented SQL queries to support data transformation, data quality, and reporting needs
Integrate data from diverse sources, including APIs, cloud storage, and databases, to create a reliable central data repository
Develop automated workflows and schedules for data processing tasks using Composer or Airflow (see the DAG sketch after this list)
Collaborate with data analysts, data scientists, and business stakeholders to understand data requirements and deliver solutions
Monitor, troubleshoot, and optimize data pipelines for performance, scalability, and reliability
Uphold data security and privacy standards and maintain compliant, up-to-date documentation
Stay informed about emerging data engineering technologies and apply them effectively to improve workflows
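For the orchestration responsibilities listed above, a minimal Composer/Airflow DAG sketch is shown below. The operator choices, schedule, task IDs, and commands are illustrative assumptions; in practice the PySpark step might run through a Dataproc or Databricks operator instead of a plain bash call.

```python
# Minimal Airflow/Composer DAG sketch: schedule a daily PySpark job and a
# follow-up BigQuery row-count check. IDs, paths, and SQL are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="orders_daily_pipeline",
    schedule_interval="0 3 * * *",   # run daily at 03:00 UTC
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["etl", "orders"],
) as dag:
    # Submit the PySpark job (placeholder script path on GCS).
    run_etl = BashOperator(
        task_id="run_orders_etl",
        bash_command="spark-submit gs://example-code-bucket/orders_daily_load.py",
    )

    # Simple data-quality check against the loaded table (placeholder SQL).
    check_load = BashOperator(
        task_id="check_daily_revenue",
        bash_command=(
            "bq query --use_legacy_sql=false "
            "'SELECT COUNT(*) FROM example_project.analytics.daily_revenue'"
        ),
    )

    run_etl >> check_load
```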
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time