Job Description
ROLE SUMMARY
The AI Acceleration (AIA) function within the Chief Marketing Office (CMO) is the single, business-led engine that owns the design, delivery, and scale-up of priority AI capabilities across Commercial. AIA works in tight collaboration with various Pfizer functions to deploy and maintain production-grade AI solutions that simplify how we work and drive measurable value across all processes.
As a Data Engineer in the newly formed AIA team, you will design, build, integrate, curate, and operationalize data and models into a semantic layer to power AI-enabled products. You will also ensure the interpretability and lineage of reusable data assets and uphold the bar on governance, performance measurement, and responsible AI.
Data Pipeline Development
- Build the semantic layer that enables contextualized and explainable AI-driven workflows, including ontology development, entity models and knowledge graphs
- Build robust data pipelines to ingest, transform, and prepare structured and unstructured datasets from diverse internal and external sources, such as CRM platforms (e.g., Veeva), field force deployment or alignment tools, HCP engagement data, digital metrics, and campaign data sources.
- Ensure data is clean, normalized, and optimized for downstream AI/ML and analytics use.
Infrastructure & Architecture
- Develop and manage data infrastructure using cloud platforms (e.g., AWS, Azure, GCP).
- Implement data lake, data warehouse, and real-time streaming architectures.
- Support containerization and orchestration using data management tools
- Enable real-time and batch data access for AI agents, LLM-based applications and analytical products
Data Quality & Governance
- Implement data validation, profiling, and monitoring processes to ensure accuracy and reliability.
- Collaborate with compliance teams to ensure data handling aligns with HIPAA, FDA, and other U.S. healthcare regulations.
- Maintain metadata, lineage, and audit trails for all data assets.
Collaboration & Cross-Functional Support
- Collaborate with data scientists, ML engineers, and product managers to optimize data for use in RAG, autonomous AI agents, and retrieval pipelines.
- Support rapid prototyping and iterative development of AI solutions.
Performance Optimization
- Tune data workflows for performance, scalability, and cost-efficiency.
- Implement caching, indexing, and partitioning strategies to support high-volume data processing.
- Monitor system health and troubleshoot bottlenecks in data pipelines.
BASIC QUALIFICATIONS
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Up to 4 years of experience in data engineering, preferably in healthcare or life sciences.
- Proficiency in SQL, Python, and data pipeline frameworks (e.g., Apache Airflow, Spark, Kafka).
- Experience with cloud data platforms (e.g., AWS Redshift, Azure Synapse, Google BigQuery).
- Familiarity with pipeline orchestration (e.g., Airflow, DBT, Prefect, Dagster).
- Excellent problem-solving, communication, and collaboration skills.
- Extensive experience working in an agile setting, or the ability to bring agile best-practice mentorship to the team.
- Familiarity with data privacy standards, pharma industry practices, and GDPR compliance is preferred.
- Prioritizes excellence in Data Engineering by following F.A.I.R. principles and adhering to the engineering and documentation standards set forth by the organization.
Information & Business Tech
Job Classification
Industry: Pharmaceutical & Life Sciences
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full time
Contact Details:
Company: Pfizer
Location(s): Mumbai
Keyskills:
Data management
Agile
Healthcare
Apache
Analytics
Monitoring
SQL
CRM
Python
Auditing