Senior Data Engineer
Basic Qualifications :
Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field.
6+ years of experience in data engineering, with a proven track record of working on large-scale data initiatives.
Deep expertise in Python and PySpark.
Strong hands-on experience with Databricks (Spark, Delta Lake, Workflows).
Strong experience with AWS (S3, IAM, Textract, Bedrock, or equivalent).
Experience designing and implementing scalable document ingestion pipelines using Databricks Auto Loader and AWS S3.
Understanding of vector embeddings and semantic search.
Strong understanding of data governance, privacy, and compliance in regulated industries (healthcare, life sciences).
Good to Have :
Advanced knowledge of data modeling, lakehouse/lake/warehouse design, and performance optimization.
Familiarity with generative AI platforms and use cases.
Contributions to open-source projects or thought leadership in data engineering/architecture.
Experience with Agile methodologies, CI/CD, and DevOps practices.
Exposure to FastAPI or other API-based ML services.
Experience evaluating LLM output quality.
Key Responsibilities :
Design, develop, and optimize complex data pipelines and transformation processes using Snowflake, dbt, and AWS services.
Develop and maintain scalable data models and schemas in Snowflake, ensuring they meet performance and business requirements.
Monitor and fine-tune the performance of data pipelines, queries, and data models to ensure optimal efficiency and cost-effectiveness.
Use Snowflake's features, such as Time Travel, Zero-Copy Cloning, and Data Sharing, to enhance data management and performance.
Leverage AWS services, such as AWS Lambda, S3, and Glue, to build and manage serverless data processing workflows and data storage solutions.
Implement data security measures and ensure compliance with data privacy regulations and organizational policies.
Troubleshoot and resolve complex data issues, including data sync errors, performance bottlenecks, and integration challenges.
Provide support for data-related incidents and ensure effective resolution of production issues.
Collaborate with data analysts and other stakeholders to understand data needs and deliver effective solutions.
Document data processes, models, and workflows, ensuring clear communication and knowledge sharing across teams.
Independently assess situations, apply sound judgment and discretion, and make decisions on matters of significant impact without direct supervision.

Key Skills: PySpark, Databricks, Python, Data Engineering, SQL