Data Engineering role that involves working with Cloudera Data Platform (CDP) for the first few months and then transitioning to Google Cloud Platform (GCP), with a focus on on-premises-to-cloud migration skills:
Cloudera Data Platform (CDP) Skills (2+ years of experience):
1. Design, develop, and maintain data pipelines using Hive, Spark, Impala, and Kudu on Cloudera Data Platform.
2. Proficiency in Scala and SQL for data processing, transformation, and analysis tasks.
3. Experience using PuTTY for remote server access and administration.
4. Ability to work with large datasets in Excel for data manipulation and visualization.
5. Knowledge of workflow scheduling tools like Oozie for job automation.
6. Experience working with other data preparation and analytics tools such as Alteryx.
7. Optimize data processing and performance using Cloudera Data Platform components.
8. Troubleshoot data-related issues and provide timely resolutions.
9. Collaborate with cross-functional teams to understand and translate business requirements into technical solutions.
10. Ensure data quality, integrity, and security in CDP environments.
Google Cloud Platform (GCP) Skills - On-prem to Cloud Migration (fresher to 1+ year of experience):
11. Plan and execute the migration of on-premises data infrastructure to Google Cloud Platform.
12. Design and deploy data processing clusters using GCP's Compute Engine and Dataproc.
13. Implement CI/CD pipelines using Tekton for automated deployment and testing.
14. Utilize GCS buckets for scalable and secure data storage on Google Cloud Platform.
15. Develop data processing workflows using Airflow for orchestration and scheduling.
16. Proficiency in programming languages such as Python and Scala, and in Apache Spark, for data engineering tasks on GCP.
17. Optimize data storage and retrieval performance using BigQuery on GCP.
18. Implement IAM policies and access controls for secure data handling on Google Cloud Platform.
19. Collaborate with cloud architects to design scalable and cost-effective data solutions on GCP.
20. Stay updated on the latest GCP data engineering tools and best practices for continuous improvement in data migration projects.