Senior Data Engineer @ Idexcel


Job Description

Databricks (Spark)

  • Develop scalable ETL/ELT pipelines using PySpark (RDD/DataFrame APIs), Delta Lake, Auto Loader (cloudFiles), and Structured Streaming (a minimal ingestion sketch follows this list).
  • Optimize jobs: partitioning, bucketing, Z-Ordering, OPTIMIZE + VACUUM, broadcast joins, AQE, checkpointing.
  • Manage Unity Catalog: catalogs/schemas/tables, data lineage, permissions, secrets, tokens, and cluster policies.
  • CI/CD for Databricks assets: notebooks, Jobs, Repos, and MLflow.
  • Build Medallion Architecture (Bronze/Silver/Gold) with Delta Live Tables (DLT) and expectations for data quality.
  • Event-driven ingestion: Kafka/Kinesis into Databricks Structured Streaming.
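
A minimal sketch of the Auto Loader (cloudFiles) ingestion pattern referenced above, assuming a hypothetical S3 landing path, schema/checkpoint locations, and Bronze table name (all placeholders, not specified in the posting):

    # Auto Loader stream into a Bronze Delta table; paths and table names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

    raw_stream = (
        spark.readStream
        .format("cloudFiles")                        # Auto Loader source
        .option("cloudFiles.format", "json")         # incoming file format
        .option("cloudFiles.schemaLocation", "s3://example-bucket/_schemas/orders")
        .load("s3://example-bucket/landing/orders/")
    )

    (
        raw_stream.writeStream
        .format("delta")
        .option("checkpointLocation", "s3://example-bucket/_checkpoints/orders_bronze")
        .trigger(availableNow=True)                  # incremental, batch-style run
        .toTable("bronze.orders")                    # Bronze layer of the Medallion architecture
    )

In practice the Bronze output would then feed Silver/Gold transformations, typically with DLT expectations enforcing data quality as described above.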

Snowflake (DW & ELT)

  • Model and implement star/snowflake schemas, data marts, and secure views.
  • Performance tuning: clustering keys, micro-partitions, result caching, warehouse sizing, and the query profile.
  • Implement Task/Stream patterns for CDC; external tables for data lakes (S3); Snowpipe for near-real-time ingestion.
  • Python/Snowpark for transformations and UDFs; SQL best practices (CTEs, window functions); see the Snowpark sketch after this list.
  • Security: Row Level Security (RLS), Column Masking, OAuth/SCIM, network policies, data sharing (reader accounts).
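
A small Snowpark (Python) transformation sketch for the pattern above, assuming hypothetical connection parameters, table names, and columns (all placeholders):

    # Snowpark transformation sketch; credentials and object names are illustrative only.
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import col, sum as sum_

    session = Session.builder.configs({
        "account": "<account>",        # placeholder connection details
        "user": "<user>",
        "password": "<password>",
        "warehouse": "TRANSFORM_WH",
        "database": "ANALYTICS",
        "schema": "SILVER",
    }).create()

    orders = session.table("RAW.ORDERS")

    daily_revenue = (
        orders.filter(col("STATUS") == "COMPLETED")
              .group_by(col("ORDER_DATE"))
              .agg(sum_(col("AMOUNT")).alias("REVENUE"))
    )

    # Materialize the result as a downstream reporting table.
    daily_revenue.write.mode("overwrite").save_as_table("SILVER.DAILY_REVENUE")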

AWS Data Engineering

  • Storage & compute: S3 (lifecycle, encryption, partitioning), EMR (if needed), Lambda, Glue (ETL/Schema registry), Athena, Kinesis (Data Streams/Firehose), RDS/Aurora, Step Functions.
  • Orchestration: MWAA/Airflow or Step Functions (error handling, retries, backfills, SLA alerts); a minimal DAG sketch follows this list.
  • Infra-as-code: Terraform/CloudFormation for reproducible environments (Databricks workspace, IAM, S3, networking).
  • Security/compliance: IAM least privilege, KMS, VPC endpoints/private links, Secrets Manager, CloudTrail/CloudWatch, GuardDuty.
  • Observability: CloudWatch metrics/logs, structured logging, Datadog/Prometheus (optional), cost monitoring (tags/budgets).
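
A minimal Airflow DAG sketch illustrating the retry, backfill, and SLA conventions mentioned above (Airflow 2.4+ syntax assumed; the DAG id, schedule, and load function are hypothetical):

    # Airflow DAG sketch with retries, backfill support, and a task-level SLA.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def load_to_s3(**context):
        # Placeholder for the actual extract/load logic (e.g., a boto3 upload).
        print("loading partition", context["ds"])


    with DAG(
        dag_id="daily_orders_load",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=True,                      # allows historical backfills
        default_args={
            "retries": 3,
            "retry_delay": timedelta(minutes=5),
            "sla": timedelta(hours=1),     # flags task runs that exceed one hour
        },
    ) as dag:
        PythonOperator(task_id="load_to_s3", python_callable=load_to_s3)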

Data Quality, Governance & Security

  • Implement unit/integration tests for pipelines (e.g., pytest + Great Expectations + DLT expectations); a small pytest example follows this list.
  • Data contracts and schema evolution; monitor SLA/SLO; DQ dashboards (missingness, drift, freshness, completeness).
  • PII handling: tokenization/pseudonymization, field-level encryption, adherence to KYB/KYC data-flow requirements, and audit trails.
  • Cataloging & lineage through Unity Catalog and/or OpenLineage/Purview (if applicable).
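
A small pytest-style unit test for a pipeline transformation, sketched with a hypothetical deduplication helper and a local SparkSession (the function and schema are illustrative, not from the posting):

    # pytest sketch for a PySpark transformation; dedupe_orders is a hypothetical helper.
    import pytest
    from pyspark.sql import SparkSession


    def dedupe_orders(df):
        """Keep one row per order_id (the transformation under test)."""
        return df.dropDuplicates(["order_id"])


    @pytest.fixture(scope="session")
    def spark():
        return SparkSession.builder.master("local[1]").appName("dq-tests").getOrCreate()


    def test_dedupe_orders_removes_duplicates(spark):
        df = spark.createDataFrame(
            [(1, "A"), (1, "A"), (2, "B")],
            ["order_id", "status"],
        )
        assert dedupe_orders(df).count() == 2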

DevOps & CI/CD

  • Git workflows (branching, PR reviews), Databricks CLI/Terraform modules for jobs/clusters/UC, Snowflake DevOps (object versioning via schemachange or SQL-based migration).
  • Automated testing in pipelines; feature flags and canary releases for data jobs; rollback strategies.

Client-Facing PoCs & Delivery

  • Rapid PoC builds: clearly defined success metrics, cost/performance benchmarks, and a transition plan to production.
  • Present architectural decisions, trade-offs (e.g., Spark vs. Snowflake ELT), and cost projections (Databricks DBUs, Snowflake credits, storage and egress).
  • Produce runbooks, operational playbooks, and knowledge transfer documents for client teams.

Required Technical Skillset

  • Databricks: PySpark, Delta Lake, Auto Loader, DLT, Jobs, Unity Catalog, MLflow basics.
  • Snowflake: SQL, Snowpipe, Tasks/Streams, Snowpark (Python), warehouse sizing, performance tuning, security policies.
  • Python: strong command of DE packages (pandas, PyArrow, pytest), robust error handling, typing, and packaging.
  • Orchestration: Airflow DAGs (Sensors, Operators, XCom), Step Functions state machines.
  • Streaming & CDC: Kafka/Kinesis, Debezium (nice-to-have), CDC patterns to Delta/Snowflake.
  • AWS: S3, Glue, Lambda, Kinesis, IAM/KMS, VPC, CloudWatch; Terraform/CloudFormation.
  • Data Modeling: 3NF/dimensional modeling, slowly changing dimensions (SCD Type 2), surrogate keys, and surrogate vs. natural key trade-offs (see the SCD2 sketch after this list).
  • Security & Compliance: encryption at rest/in transit, tokenization, key rotation, audit logging, governance controls.
  • Performance & Cost: Spark job tuning, Snowflake warehouse right-sizing, partitioning/clustering, object storage best practices.
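
A condensed sketch of the SCD Type 2 pattern using the Delta Lake MERGE API, assuming the updates table already contains only new or changed customer rows (table and column names are placeholders):

    # SCD Type 2 sketch with Delta Lake MERGE; object names and columns are illustrative.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()

    updates = spark.table("silver.customer_updates")        # new or changed rows (CDC output)
    dim = DeltaTable.forName(spark, "gold.dim_customer")    # existing SCD2 dimension

    # Step 1: close out the currently-active row for each changed customer.
    (
        dim.alias("t")
        .merge(updates.alias("s"),
               "t.customer_id = s.customer_id AND t.is_current = true")
        .whenMatchedUpdate(set={"is_current": "false", "end_date": "current_date()"})
        .execute()
    )

    # Step 2: append the new 'current' versions with fresh validity dates
    # (surrogate key generation is omitted for brevity).
    (
        updates.withColumn("is_current", F.lit(True))
               .withColumn("start_date", F.current_date())
               .withColumn("end_date", F.lit(None).cast("date"))
               .write.format("delta").mode("append").saveAsTable("gold.dim_customer")
    )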

Nice-to-Have

  • dbt (Snowflake) with tests & exposures; Great Expectations.
  • Databricks SQL Warehouses and BI connectivity; Photon engine awareness.
  • Lakehouse Federation (UC external locations); Delta Sharing; Apache Iceberg.
  • Kafka Connect/Debezium, NiFi or MuleSoft (for data integrations).
  • Experience in financial services.
  • Exposure to ISO/IEC 27001 controls in data platforms.

Education & Certifications

  • Bachelor's/Master's degree in CS/IT/EE or a related field.
  • Certifications (a plus): Databricks Data Engineer Associate/Professional, Snowflake SnowPro Core/Advanced, AWS Solutions Architect/Big Data/DP.

Job Classification

Industry: Recruitment / Staffing
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time

Contact Details:

Company: Idexcel
Location(s): Hyderabad



Keyskills: Data Engineering, PySpark, Auto Loader, Data Quality, DevOps, Snowflake, Delta Lake, CI/CD, Data Modeling, Databricks, ETL, AWS, Data Governance, Python


Salary: Not Disclosed

