Design and implement robust data processing pipelines using Apache Spark, Flink, and Kafka for terabyte-scale industrial datasets
Build efficient APIs and services that serve thousands of concurrent users with sub-second response times
Optimize data storage and retrieval patterns for time-series, sensor, and operational data
Implement advanced caching strategies using Redis and in-memory data structures
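The caching responsibility above typically follows the cache-aside pattern: read from the cache first, fall back to the backing store on a miss, and write the result back with a TTL. A minimal sketch (a plain dict stands in for Redis here; in production the get/set calls would go through a Redis client such as redis-py, and all names are illustrative):

```python
import time

class CacheAside:
    """Cache-aside sketch with TTL expiry; a dict stands in for Redis."""

    def __init__(self, loader, ttl_seconds=60):
        self._loader = loader        # fetches from the backing store on a miss
        self._ttl = ttl_seconds
        self._store = {}             # key -> (value, expiry timestamp)

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit, still fresh
        value = self._loader(key)    # miss or expired: go to the source of truth
        self._store[key] = (value, now + self._ttl)
        return value
```

With Redis the same shape applies, but the TTL is delegated to the server via `SET key value EX ttl`, so expiry works across processes.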
Distributed Processing Excellence
Engineer Spark applications with deep understanding of Catalyst optimizer, partitioning strategies, and performance tuning
Develop real-time streaming solutions processing millions of events per second with Kafka and Flink
Design efficient data lake architectures using S3/GCS with optimized partitioning and file formats (Parquet, ORC)
Implement query optimization techniques for OLAP data stores like ClickHouse, Pinot, or Druid
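The data-lake partitioning point above usually means Hive-style directory layouts, which let Spark and OLAP engines prune files by partition key instead of scanning the whole table. A small path-layout sketch (bucket, table, and column names are hypothetical):

```python
from datetime import datetime

def partition_path(base: str, table: str, ts: datetime, shard: int) -> str:
    """Build a Hive-style partition path (dt=/hour=) of the kind
    Spark or Flink writers produce when targeting S3/GCS data lakes."""
    return (f"{base}/{table}/dt={ts:%Y-%m-%d}/hour={ts:%H}"
            f"/part-{shard:05d}.parquet")
```

A query filtered on `dt` then only touches the matching directories, which is the main lever behind "optimized partitioning" at terabyte scale.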
Scalability & Performance
Scale systems to 10K+ QPS while maintaining high availability and data consistency
Optimize JVM performance through garbage collection tuning and memory management
Implement comprehensive monitoring using Prometheus, Grafana, and distributed tracing
Design fault-tolerant architectures with proper circuit breakers and retry mechanisms
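The circuit-breaker mechanism named above can be sketched as follows: after a run of consecutive failures the breaker "opens" and fast-fails callers, then allows a trial call once a cooldown elapses. A minimal illustration (thresholds and names are illustrative, not a production implementation):

```python
import time

class CircuitBreaker:
    """Opens after `max_failures` consecutive errors and rejects calls
    until `reset_after` seconds elapse, then allows one trial call."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None            # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: fast-failing")
            self.opened_at = None        # half-open: permit one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                # success resets and closes the circuit
        return result
```

In practice this wraps downstream calls (databases, internal services) and is paired with bounded, jittered retries so a struggling dependency is shed load rather than hammered.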
Technical Innovation
Contribute to open-source projects in the big data ecosystem (Spark, Kafka, Airflow)
Research and prototype new technologies for industrial data challenges
Collaborate with product teams to translate complex requirements into scalable technical solutions
Participate in architectural reviews and technical design discussions
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Platform Engineer
Employment Type: Full time