Data Engineer (Senior Architect) @ Tanisha Systems


Data Engineer (Senior Architect)

Tanisha Systems
14 - 22 years
Bengaluru
2 days ago

Job Description

Location

Remote (India)

Experience

15+ years in data engineering, 5+ years in data architecture for ML/AI

Engagement

Full-Time, Permanent

ABOUT THE ROLE

The TDD's recommended architecture implements a Hybrid Graph ML + Graph RAG knowledge representation layer, the most architecturally complex layer in the system. The role calls for a senior data architect to design the unified context layer that combines GNN-produced features with vector-retrieval-augmented context; architect the golden dataset schema with versioning and pool separation; design the data migration strategy for integrating with the ~10B-record knowledge graph; and validate Graph RAG retrieval quality. The 2.00 man-month allocation reflects a focused, high-impact engagement during the foundational phases.

Project Context: You will architect the data layer for the Tradecraft Evaluation Platform, designing the Hybrid Graph ML + Graph RAG knowledge representation system that captures both structural patterns (entity embeddings, anomaly scores) and procedural tradecraft (reasoning chains, confidence rubrics) from a 10-billion-record knowledge graph. You will define the golden dataset schema, data pool separation strategy, and data migration approach that all downstream evaluation work depends on.

KEY RESPONSIBILITIES

A. STANDARD RESPONSIBILITIES:

Design data architectures that balance performance, scalability, and maintainability for complex analytical workloads
Define data modeling standards, schema versioning strategies, and data quality frameworks
Evaluate and select data storage technologies based on workload characteristics and access patterns
Review and approve data pipeline designs produced by other engineers

B. PROJECT-SPECIFIC RESPONSIBILITIES:

Architect the Hybrid Graph ML + Graph RAG data layer, defining how GNN-produced entity embeddings (from PyTorch Geometric) and vector-retrieval results (from Weaviate) converge in the unified context layer
Design the golden dataset schema in PostgreSQL with versioning, pool separation (eval/training/holdout using physically separate schemas), and full lineage tracking from source artifact through extraction to validation
Design the data migration strategy for integrating with the knowledge graph (~10B records, ~700 sources) via read-only API access, defining subgraph extraction patterns for Chinese corporate network investigation scenarios
Architect the vector database (Weaviate) schema for tradecraft corpus indexing, defining chunking strategies, embedding models, metadata schemas, and retrieval/re-ranking pipelines
Define the Graph ML data pipeline for the lightweight GNN training in Phase 4, specifying feature engineering for node/edge attributes from corporate registration and trade data
Validate Graph RAG retrieval quality by designing retrieval precision/recall benchmarks against expert-curated test queries
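For reference, the retrieval benchmark described above usually comes down to precision@k and recall@k, averaged over a set of expert-curated test queries. The sketch below is a minimal illustration and not the project's actual harness. It assumes that each test query maps to a set of relevant chunk IDs labelled by experts and that the retriever returns a ranked list of chunk IDs. The function names and the default k=5 are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> Tuple[float, float]:
    """Precision@k and recall@k for one query (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # De-duplicate while preserving rank order so repeated IDs are not double-counted.
    top_k = list(dict.fromkeys(retrieved))[:k]
    hits = sum(1 for chunk_id in top_k if chunk_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def run_benchmark(results: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 5) -> Dict[str, float]:
    """Macro-average precision@k / recall@k over expert-curated queries.

    Assumption: `gold` maps query_id -> expert-labelled relevant chunk IDs and
    `results` maps query_id -> ranked chunk IDs returned by the retriever.
    """
    precisions, recalls = [], []
    for query_id, relevant in gold.items():
        # A query the retriever skipped counts as zero hits.
        p, r = precision_recall_at_k(results.get(query_id, []), relevant, k)
        precisions.append(p)
        recalls.append(r)
    n = len(gold) or 1
    return {"precision@k": sum(precisions) / n, "recall@k": sum(recalls) / n}


if __name__ == "__main__":
    gold = {"q1": {"a", "b"}, "q2": {"c"}}
    results = {"q1": ["a", "x", "b", "y", "z"], "q2": ["x", "y", "z", "w", "c"]}
    print(run_benchmark(results, gold, k=5))
```

Macro-averaging gives each test query equal weight. That keeps a handful of broad queries with many relevant chunks from dominating the score that gets checked against a target such as the >70% precision mentioned below.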

REQUIRED SKILLS & EXPERIENCE

[STANDARD] 15+ years of experience in data engineering/architecture with at least 5 years designing data platforms for ML/AI workloads
[STANDARD] Expert-level proficiency in PostgreSQL, including advanced features (partitioning, triggers, materialized views, row-level security)
[PROJECT-SPECIFIC] Hands-on experience designing and deploying vector database systems (Weaviate, Pinecone, Qdrant, or Milvus) for RAG pipelines at production scale
[PROJECT-SPECIFIC] Experience with graph data models and graph databases, including query optimization for large-scale knowledge graphs
[PROJECT-SPECIFIC] Experience designing data architectures with physical data separation for compliance (separate schemas, separate storage buckets, infrastructure-level access controls)
[STANDARD] Expert-level Python proficiency for data pipeline development
[STANDARD] Experience with cloud-managed data services (AWS RDS, S3, or equivalents on Azure/GCP)

Experience Requirements

YEARS OF EXPERIENCE: 15+ years in data engineering, 5+ years in data architecture for ML/AI
SENIORITY LEVEL: Staff / Principal
TYPICAL BACKGROUND: Senior data architect at an AI platform company; principal data engineer at a government analytics firm; data platform lead at a knowledge graph company; chief data architect at a risk/compliance technology company
COMPLEXITY INDICATORS: Has designed data architectures integrating 3+ heterogeneous data stores (relational, graph, vector); has worked with datasets at a billion-record scale; has designed data separation architectures for compliance
LEADERSHIP / OWNERSHIP EXPECTATIONS: Owns all data architecture decisions; reviews and approves data pipeline designs from IC3 Data Engineer; presents data architecture to client's Principal Architect (James) and SVP Engineering (Phillip)
SUCCESS INDICATORS:
Has designed and deployed a production vector database for RAG with measurable retrieval precision >70%
Has architected a data platform integrating graph database features with vector retrieval for LLM augmentation
Has designed golden dataset or evaluation dataset management systems with versioning and lineage
Has implemented physical data separation for compliance in a regulated environment

RED FLAGS:
Vector database experience limited to tutorials or proof-of-concepts; no production deployment
No experience with graph data models; treats all data as relational tables
Cannot articulate chunking strategy trade-offs for RAG systems (semantic vs. fixed-size vs. structural)
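For reference, the chunking trade-off mentioned above can be illustrated with a minimal sketch comparing fixed-size and structural chunking. Semantic chunking is omitted because it needs an embedding model. This is not the project's chunker. It assumes whitespace-separated words as the size unit and Markdown-style `#` headings as the structural boundary in tradecraft documents, and both function names are hypothetical.

```python
import re
from typing import List


def fixed_size_chunks(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Split into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # last window already reaches the end
            break
    return chunks


def structural_chunks(text: str) -> List[str]:
    """Split at Markdown-style headings so each section (heading + body) stays whole."""
    sections: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if re.match(r"^#{1,6} ", line) and current:
            sections.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        sections.append("\n".join(current).strip())
    return [s for s in sections if s]


if __name__ == "__main__":
    doc = "# Step 1\nCheck the corporate registry.\n# Step 2\nTrace related shipments."
    print(structural_chunks(doc))
    print(fixed_size_chunks("a b c d e f g", size=4, overlap=1))
```

Fixed-size windows are predictable in token cost but can cut a reasoning step in half. Structural chunks keep each step of a procedure intact, at the price of uneven chunk sizes.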

Project-Specific Skills and Domain Knowledge

Must-Have:

Experience designing vector database schemas for RAG systems, including chunking strategy selection (semantic vs. structural), embedding model selection, and metadata-filtered retrieval
Experience with PyTorch Geometric or DGL for graph neural network feature engineering and data pipeline design
Experience designing data versioning systems for ML evaluation datasets with full lineage tracking
Experience with TimescaleDB or equivalent time-series extensions for metrics and cost tracking data

PREFERRED QUALIFICATIONS

Experience architecting data layers for government or FedRAMP-compatible systems
Experience with append-only data stores for immutable audit logging (Amazon QLDB or equivalent)
AWS Data Analytics Specialty or equivalent certification
Experience with Weaviate specifically (self-hosted on Kubernetes)
Prior work with trade data (bills of lading) or corporate registration data schemas

Strongly Preferred:

Experience with knowledge graph systems at billion-record scale
Experience with entity resolution data models (strong/weak identifier classification)
Familiarity with HELM benchmark data formats and evaluation dataset structures

Tradecraft Experience: A Significant Plus

Candidates with backgrounds in intelligence analysis, signals intelligence, law enforcement data fusion, or related tradecraft disciplines are strongly encouraged to apply. Understanding of link analysis, entity disambiguation under adversarial conditions, handling of classified or compartmentalised data, and mission-driven product constraints will set you apart.

Job Classification

Industry: Internet
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full-time

Contact Details:

Company: Tanisha Systems
Location(s): Bengaluru


Key Skills: Vector Database, Chunking & Retrieval Pipelines, Graph RAG, Artificial Intelligence, Graph Databases, Semantic/Hybrid, Machine Learning


₹ Not Disclosed


Similar positions

Senior Data Scientist

Cognizant

8 - 13 years

Hyderabad

3 days ago

₹ Not Disclosed

Walk-in || Face-to-face interview For Data Scientist role - Chennai on...

EY

5 - 10 years

Chennai

3 days ago

₹ Not Disclosed

Lead - Data Engineer

Iris Software

8 - 10 years

Noida, Gurugram

3 days ago

₹ Not Disclosed

Senior Machine Learning Engineer

Calsoft

5 - 8 years

Bengaluru

3 days ago

₹ 12-22 Lacs P.A.

Tanisha Systems

Tanisha Systems, founded in 2002 in Massachusetts-USA, is a leading provider of Custom Application Development and end-to-end IT Services to clients globally. We use a client-centric engagement model that combines local on-site and off-site resources with the cost, global expertise and quality adv...
