Vector Database & Embedding Engineer RAG Pipeline Development @ Tenth Planet

Tenth Planet
3 - 8 years
Chennai
1 month ago

Job Description

Job Summary

We are seeking an experienced Vector Database & Embedding Engineer to design, build, and optimize vector search pipelines, embedding workflows, and chunking strategies for enterprise Retrieval-Augmented Generation (RAG) systems.

This role requires deep hands-on experience with vector DBs (pgvector, Pinecone, Chroma, Milvus, Weaviate), embedding models (OpenAI, HuggingFace, Instructor, FlagEmbedding, BGE, etc.), and robust chunking/indexing pipelines for structured/unstructured data.

You will collaborate with LLM engineers, graph engineers, backend teams, and product owners to deliver high-accuracy, high-recall retrieval systems for AI applications.

Key Responsibilities

1. Vector Database Design & Management

Set up, configure, and manage vector DBs such as:

pgvector, FAISS, Pinecone, Weaviate, Chroma, Milvus

Design schemas for:

Multi-embedding storage
Metadata storage
Document-level and chunk-level indexing

Implement filtering, similarity search, MMR, reranking, and index optimization.
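To illustrate the MMR (Maximal Marginal Relevance) step named above, here is a minimal sketch in plain Python. The function names and the lambda_mult parameter are illustrative assumptions, not the API of any particular vector DB; production systems would typically use the library's built-in implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents picked by Maximal Marginal Relevance.

    lambda_mult (hypothetical name) trades relevance against diversity:
    1.0 is pure relevance ranking, lower values penalize near-duplicates.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low lambda_mult, a near-duplicate of an already selected chunk is skipped in favor of a less similar one, which is the diversity effect MMR is used for.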

2. Embedding Pipeline Development

Select, fine-tune, or run embedding models such as:

Sentence-BERT, BGE, GTE, Instructor, FlagEmbedding
OpenAI Embeddings / Azure OpenAI
HuggingFace Transformers

Build:

Batch embedding pipelines
Real-time embedding APIs
Multi-encoder architecture for hybrid search

Evaluate embedding quality, dimensionality, and vector drift.

3. Chunking, Indexing & Document Processing

Design advanced chunking strategies:

Fixed window chunking
Sliding window
Semantic chunking
Layout-aware chunking (tables, lists, multi-column)
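The first two strategies in the list above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the input is already a list of tokens (or words), and the function names are hypothetical. Semantic and layout-aware chunking need model or parser support and are not shown.

```python
def fixed_window_chunks(tokens, size):
    """Split tokens into consecutive, non-overlapping windows of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def sliding_window_chunks(tokens, size, overlap):
    """Split tokens into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        # Stop once a window has reached the end of the input.
        if start + size >= len(tokens):
            break
    return chunks
```

The overlap in the sliding variant keeps sentences that straddle a window boundary retrievable from at least one chunk, at the cost of storing more vectors.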

Extract content from:

PDFs, HTML pages, Office files, emails, scanned docs

Build a complete indexing pipeline:

Preprocessing → Chunking → Embedding → Vector DB upsert → Metadata linking
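The pipeline stages listed above could be wired together roughly as follows. Everything here is a stand-in chosen for illustration: toy_embed is a hypothetical bucket-count function rather than a real embedding model, and a plain dict plays the role of the vector DB.

```python
import re
from dataclasses import dataclass

@dataclass
class Record:
    chunk_id: str
    vector: list
    metadata: dict

def preprocess(text):
    """Collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

def chunk(text, size=5):
    """Fixed-window chunking over whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def toy_embed(text, dim=8):
    """Hypothetical placeholder for an embedding model: word bucket counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    return vec

def index_document(doc_id, raw_text, store, source):
    """Preprocess, chunk, embed, and upsert one document into `store`."""
    clean = preprocess(raw_text)
    for position, piece in enumerate(chunk(clean)):
        chunk_id = f"{doc_id}:{position}"
        # Assigning by key makes this an upsert: re-indexing overwrites.
        store[chunk_id] = Record(
            chunk_id=chunk_id,
            vector=toy_embed(piece),
            metadata={"doc_id": doc_id, "source": source,
                      "position": position, "text": piece},
        )
    return store
```

Keying each record by document ID plus chunk position is one simple way to link chunk-level vectors back to their parent document.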

4. RAG Optimization & Retrieval Tuning

Optimize retrieval for:

Accuracy
Latency
Recall / diversity

Implement hybrid search:

Vector + Keyword
Vector + Graph (GraphRAG)
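One common way to combine a vector ranking with a keyword ranking is Reciprocal Rank Fusion (RRF). The posting does not name a fusion method, so RRF is an assumption here, as are the function name and the conventional constant k=60.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because RRF uses only rank positions, it avoids having to normalize cosine scores against keyword scores such as BM25, which live on different scales.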

Build ranking stacks using rerankers (Cross-Encoders).

5. Backend & API Development

Build APIs for:

Document ingestion
Embedding generation
Retrieval & context merging

Serve embedding + vector workflows using Python/FastAPI or Node.js.
Integrate vector search with LLM prompt templates.

6. Monitoring, Evaluation & Scaling

Evaluate retrieval metrics (precision@k, recall@k, MRR).
Implement observability for indexing, failures, and accuracy degradation.
Scale vector DBs horizontally & vertically based on dataset size.
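For reference, the retrieval metrics mentioned in this section (precision@k, recall@k, and MRR) can be computed as in the sketch below. The function names are illustrative, and the code assumes each retrieved list contains unique document IDs.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)

def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant hit over (retrieved, relevant) pairs.

    A query with no relevant hit contributes 0.
    """
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)
```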

7. Collaboration & Documentation

Work with LLM engineers to design end-to-end RAG pipelines.
Maintain documentation for:

Embedding configs
Chunking logic
Vector schemas
Retrieval settings

Train internal teams on best practices.

Required Technical Skills

Vector Databases

Strong hands-on experience with:

pgvector (must-have for enterprise)
Pinecone, Chroma, Weaviate, Milvus, or FAISS

Deep knowledge of:

Index types (HNSW, IVFFlat, PQ, IVF-PQ)
Similarity metrics (cosine, dot, euclidean)
Index tuning (ef_search, ef_construction, cluster size)
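The three similarity metrics listed above are shown below in plain Python, as a small sketch with illustrative function names. It also shows why vector normalization matters: on L2-normalized vectors, the dot product equals cosine similarity.

```python
import math

def dot(a, b):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def l2_normalize(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, computed via normalization."""
    return dot(l2_normalize(a), l2_normalize(b))
```

For unit vectors, squared Euclidean distance is 2 - 2 * cosine, so the three metrics produce the same ranking once embeddings are normalized.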

Embeddings

Experience generating and evaluating embeddings using:

OpenAI / Azure OpenAI
InstructorXL, BGE, GTE, FlagEmbedding
Sentence-BERT / HF embeddings

Knowledge of:

Embedding dimensionality
Tokenization & vector normalization
Multi-embedding pipelines

Chunking & Preprocessing

Strong experience with document processing libraries:

PDFPlumber, PyMuPDF, Textract, Tika

Designing chunking strategies for:

PDFs
Web pages
Product catalogs
Emails & logs

Metadata creation and linking strategies.

Backend / Engineering

Python (preferred), Node.js
FastAPI / Flask
SQL & NoSQL
ETL pipelines (Airflow / custom)
Docker, Linux environments

Experience Required

Total Experience: 2-6 years
Relevant Vector Search / Embedding Experience: 1-3 years
Experience in building real RAG systems (highly preferred).

Preferred Skills

Knowledge of:

LangChain or LlamaIndex
Rerankers (Cross-Encoders)
Hybrid retrieval
Graph + Vector hybrid search

Experience in:

OCR processing
Data extraction
Enterprise search systems

Familiarity with:

RediSearch
Elasticsearch vector search

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time

Contact Details:

Company: Tenth Planet
Location(s): Chennai


Keyskills: embedding, Retrieval Augmented Generation, Vector

₹ Not Disclosed

Tenth Planet

Uplift your Business with Open source. TenthPlanet provides customized, cost-effective enterprise IT solutions centered around open source software.
