Data Engineer (Senior Architect) @ Tanisha Systems


Job Description

Location

Remote (India)

Experience

15+ years in data engineering, 5+ years in data architecture for ML/AI

Engagement

Full-Time, Permanent


ABOUT THE ROLE

The TDD's recommended architecture implements a Hybrid Graph ML + Graph RAG knowledge representation layer, the most architecturally complex layer in the system. It requires a senior data architect who will design the unified context layer combining GNN-produced features with vector-retrieval-augmented context, architect the golden dataset schema with versioning and pool separation, design the data migration strategy for integrating with the ~10B-record knowledge graph, and validate Graph RAG retrieval quality. The 2.00 man-month allocation reflects a focused, high-impact engagement during the foundational phases.


Project Context: You will architect the data layer for the Tradecraft Evaluation Platform, designing the Hybrid Graph ML + Graph RAG knowledge representation system that captures both structural patterns (entity embeddings, anomaly scores) and procedural tradecraft (reasoning chains, confidence rubrics) from a 10-billion-record knowledge graph. You will define the golden dataset schema, data pool separation strategy, and data migration approach that all downstream evaluation work depends on.


KEY RESPONSIBILITIES

A. STANDARD RESPONSIBILITIES:

  • Design data architectures that balance performance, scalability, and maintainability for complex analytical workloads
  • Define data modeling standards, schema versioning strategies, and data quality frameworks
  • Evaluate and select data storage technologies based on workload characteristics and access patterns
  • Review and approve data pipeline designs produced by other engineers

B. PROJECT-SPECIFIC RESPONSIBILITIES:

  • Architect the Hybrid Graph ML + Graph RAG data layer, defining how GNN-produced entity embeddings (from PyTorch Geometric) and vector-retrieval results (from Weaviate) converge in the unified context layer
  • Design the golden dataset schema in PostgreSQL with versioning, pool separation (eval/training/holdout using physically separate schemas), and full lineage tracking from source artifact through extraction to validation
  • Design the data migration strategy for integrating with the knowledge graph (~10B records, ~700 sources) via read-only API access, defining subgraph extraction patterns for Chinese corporate network investigation scenarios
  • Architect the vector database (Weaviate) schema for tradecraft corpus indexing, defining chunking strategies, embedding models, metadata schemas, and retrieval/re-ranking pipelines
  • Define the Graph ML data pipeline for the lightweight GNN training in Phase 4, specifying feature engineering for node/edge attributes from corporate registration and trade data
  • Validate Graph RAG retrieval quality by designing retrieval precision/recall benchmarks against expert-curated test queries
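
As an illustration of the last responsibility, a retrieval benchmark could be sketched roughly as follows. This is a minimal sketch, not the project's actual harness: the function names, the query and document IDs, and the choice of precision@k / recall@k as the metrics are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at_k(
    retrieved: List[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision@k and recall@k for one query.

    `retrieved` is the ranked list of document IDs returned by the retriever
    (assumed to contain no duplicates); `relevant` is the expert-curated set
    of IDs judged relevant for the query.
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def benchmark(
    results: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int
) -> Tuple[float, float]:
    """Mean precision@k and recall@k over all queries in the gold set."""
    if not gold:
        return 0.0, 0.0
    scores = [
        precision_recall_at_k(results.get(query_id, []), relevant, k)
        for query_id, relevant in gold.items()
    ]
    mean_precision = sum(p for p, _ in scores) / len(scores)
    mean_recall = sum(r for _, r in scores) / len(scores)
    return mean_precision, mean_recall


if __name__ == "__main__":
    # Hypothetical query and document IDs, for illustration only.
    gold = {"q1": {"a", "c", "x"}, "q2": {"n"}}
    results = {"q1": ["a", "b", "c", "d"], "q2": ["m", "n"]}
    print(benchmark(results, gold, k=3))
```

Averaging per query (rather than pooling all hits) keeps a few queries with large relevant sets from dominating the score; whether that is the right choice depends on how the expert-curated test queries are built.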

REQUIRED SKILLS & EXPERIENCE

  • [STANDARD] 15+ years of experience in data engineering/architecture with at least 5 years designing data platforms for ML/AI workloads
  • [STANDARD] Expert-level proficiency in PostgreSQL, including advanced features (partitioning, triggers, materialized views, row-level security)
  • [PROJECT-SPECIFIC] Hands-on experience designing and deploying vector database systems (Weaviate, Pinecone, Qdrant, or Milvus) for RAG pipelines at production scale
  • [PROJECT-SPECIFIC] Experience with graph data models and graph databases, including query optimization for large-scale knowledge graphs
  • [PROJECT-SPECIFIC] Experience designing data architectures with physical data separation for compliance (separate schemas, separate storage buckets, infrastructure-level access controls)
  • [STANDARD] Expert-level Python proficiency for data pipeline development
  • [STANDARD] Experience with cloud-managed data services (AWS RDS, S3, or equivalents on Azure/GCP)

Experience Requirements

  • YEARS OF EXPERIENCE: 15+ years in data engineering, 5+ years in data architecture for ML/AI
  • SENIORITY LEVEL: Staff / Principal
  • TYPICAL BACKGROUND: Senior data architect at an AI platform company; principal data engineer at a government analytics firm; data platform lead at a knowledge graph company; chief data architect at a risk/compliance technology company
  • COMPLEXITY INDICATORS: Has designed data architectures integrating 3+ heterogeneous data stores (relational, graph, vector); has worked with datasets at a billion-record scale; has designed data separation architectures for compliance
  • LEADERSHIP / OWNERSHIP EXPECTATIONS: Owns all data architecture decisions; reviews and approves data pipeline designs from IC3 Data Engineer; presents data architecture to client's Principal Architect (James) and SVP Engineering (Phillip)
  • SUCCESS INDICATORS:
      • Has designed and deployed a production vector database for RAG with measurable retrieval precision >70%
      • Has architected a data platform integrating graph database features with vector retrieval for LLM augmentation
      • Has designed golden dataset or evaluation dataset management systems with versioning and lineage
      • Has implemented physical data separation for compliance in a regulated environment

  • RED FLAGS:
      • Vector database experience limited to tutorials or proof-of-concepts; no production deployment
      • No experience with graph data models; treats all data as relational tables
      • Cannot articulate chunking strategy trade-offs for RAG systems (semantic vs. fixed-size vs. structural)

Project-Specific Skills and Domain Knowledge

Must-Have:

  • Experience designing vector database schemas for RAG systems, including chunking strategy selection (semantic vs. structural), embedding model selection, and metadata-filtered retrieval
  • Experience with PyTorch Geometric or DGL for graph neural network feature engineering and data pipeline design
  • Experience designing data versioning systems for ML evaluation datasets with full lineage tracking
  • Experience with TimescaleDB or equivalent time-series extensions for metrics and cost tracking data

PREFERRED QUALIFICATIONS

  • Experience architecting data layers for government or FedRAMP-compatible systems
  • Experience with append-only data stores for immutable audit logging (Amazon QLDB or equivalent)
  • AWS Data Analytics Specialty or equivalent certification
  • Experience with Weaviate specifically (self-hosted on Kubernetes)
  • Prior work with trade data (bills of lading) or corporate registration data schemas

Project-Specific Skills and Domain Knowledge

Strongly Preferred:

  • Experience with knowledge graph systems at billion-record scale
  • Experience with entity resolution data models (strong/weak identifier classification)
  • Familiarity with HELM benchmark data formats and evaluation dataset structures

Tradecraft Experience: A Significant Plus

Candidates with backgrounds in intelligence analysis, signals intelligence, law enforcement data fusion, or related tradecraft disciplines are strongly encouraged to apply. Understanding of link analysis, entity disambiguation under adversarial conditions, handling of classified or compartmentalised data, and mission-driven product constraints will set you apart.


Job Classification

Industry: Internet
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Data Engineer
Employment Type: Full time

Contact Details:

Company: Tanisha Systems
Location(s): Bengaluru


Keyskills: Vector Database, Chunking & Retrieval Pipelines, Graph RAG, Artificial Intelligence, Graph Databases, Semantic/Hybrid, Machine Learning

₹ Not Disclosed

Tanisha Systems

Tanisha Systems, founded in 2002 in Massachusetts-USA, is a leading provider of Custom Application Development and end-to-end IT Services to clients globally. We use a client-centric engagement model that combines local on-site and off-site resources with the cost, global expertise and quality adv...