Job Summary
We are looking for a Senior PySpark Developer with 3 to 6 years of experience in building and optimizing data pipelines using PySpark on Databricks, within AWS cloud environments. This role focuses on the modernization of legacy domains, involving integration with systems like Kafka and collaboration across cross-functional teams.
Key Responsibilities
Develop and optimize scalable PySpark applications on Databricks.
Work with AWS services (S3, EMR, Lambda, Glue) for cloud-native data processing.
Integrate streaming and batch data sources, especially using Kafka.
Tune Spark jobs for performance, memory, and compute efficiency.
Collaborate with DevOps, product, and analytics teams on delivery and deployment.
Ensure data governance, lineage, and quality compliance across all pipelines.
Required Skills
3 to 6 years of hands-on development in PySpark.
Experience with Databricks and performance tuning using Spark UI.
Strong understanding of AWS services, Kafka, and distributed data processing.
Proficient in partitioning, caching, join optimization, and resource configuration.
Familiarity with data formats like Parquet, Avro, and ORC.
Exposure to orchestration tools (Airflow, Databricks Workflows).
Scala experience is a strong plus.

Key skills: PySpark, AWS, Databricks, Kafka
NTT DATA is a $30+ billion business and technology services leader, serving 75% of the Fortune Global 100. We are committed to accelerating client success and positively impacting society through responsible innovation. We are one of the world's leading AI and digital infrastructure providers, with...