PySpark Developer @ ValueLabs

Job Description

ValueLabs is a leading provider of technology solutions and services to businesses and organizations around the world. We're passionate about delivering exceptional service and support to our clients, and we're committed to building long-term relationships based on trust, respect, and mutual benefit.



Location: UAE
Experience: 5+ Years
Employment Type: Full-Time
Education: B.Tech

We are seeking a skilled and experienced Data Engineer to join our team and contribute to building robust, scalable, and high-performance data pipelines using PySpark on the Cloudera Data Platform (CDP). The ideal candidate will have a strong background in big data technologies, data transformation, and pipeline orchestration.

Key Responsibilities

  • Data Pipeline Development: Design, develop, and maintain scalable ETL pipelines using PySpark on CDP, ensuring data integrity and accuracy.
  • Data Ingestion: Implement and manage ingestion processes from diverse sources (databases, APIs, file systems) into data lakes or warehouses.
  • Data Transformation: Cleanse and transform large datasets to meet analytical and business requirements.
  • Performance Optimization: Tune PySpark code and Cloudera components for optimal performance and resource utilization.
  • Data Quality: Develop and implement data validation and monitoring routines to ensure reliability.
  • Automation & Orchestration: Automate workflows using Apache Oozie, Airflow, or similar tools within the Cloudera ecosystem.
  • Monitoring & Maintenance: Monitor pipeline health, troubleshoot issues, and perform regular maintenance.
  • Collaboration: Work closely with cross-functional teams to understand data needs and support data-driven initiatives.
  • Documentation: Maintain comprehensive documentation of processes, code, and configurations.

Required Skills & Qualifications

  • PySpark: Advanced proficiency with RDDs, DataFrames, and performance tuning.
  • Cloudera Data Platform: Hands-on experience with CDP components like Cloudera Manager, Hive, Impala, HDFS, and HBase.
  • Data Warehousing: Strong understanding of ETL best practices and SQL-based tools.
  • Big Data Technologies: Familiarity with Hadoop, Kafka, and distributed computing.
  • Orchestration Tools: Experience with Apache Oozie, Airflow, or similar frameworks.
  • Linux Scripting: Proficient in shell scripting and automation.

Job Classification

Industry: Banking
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full-Time

Contact Details:

Company: ValueLabs
Location(s): United Arab Emirates


Key Skills: PySpark, Hive, Big Data, Spark Optimization

₹ Not Disclosed

Similar positions

Azure Databricks + Pyspark

  • Cognizant
  • 9 - 12 years
  • Chennai
  • 1 month ago
₹ Not Disclosed

Pyspark Developer

  • NTT DATA
  • 4 - 8 years
  • Hyderabad
  • 2 months ago
₹ Not Disclosed

Lead Analyst - Lead Big Data Developer - Python, PySpark & SQL

  • CGI
  • 8 - 13 years
  • Hyderabad
  • 4 days ago
₹ Not Disclosed

ValueLabs

About ValueLabs: We are a leading global technology company specializing in Digital Enablement and Product Development. Through our unique OneCompany model of engagement, we help companies unleash the potential of digital technology to achieve real business outcomes, make processes frictionless a...