Lead Machine Learning Engineer @ Tredence

Job Description

We are seeking a Lead Machine Learning Engineer with deep expertise in MLOps and LLMOps to architect, build, and scale production-grade AI/ML systems. This role combines hands-on technical ownership with technical leadership and mentorship, driving best practices across ML lifecycle management and large-scale LLM deployments.

You will lead the design of robust ML platforms, optimize LLM performance and cost, and work closely with data science, platform, and product teams to deliver reliable AI solutions at scale.

Key Responsibilities

Technical Leadership & Architecture

  • Architect and implement end-to-end MLOps and LLMOps pipelines covering training, validation, deployment, monitoring, and automated retraining.
  • Define engineering standards, best practices, and reference architectures for ML and LLM systems in production.
  • Review designs and code, providing technical guidance and mentorship to ML engineers and data scientists.

Hands-on (IC Ownership)

  • Design and deploy scalable, fault-tolerant ML and LLM services for real-time and batch use cases.
  • Fine-tune, optimize, and serve large language models (LLMs) with a focus on performance, latency, and cost efficiency.
  • Implement CI/CD pipelines for ML models, ensuring reproducibility, versioning, and automated rollout.
  • Monitor model performance, data quality, and drift; build automated feedback and retraining loops.
  • Optimize inference using quantization, pruning, distillation, LoRA/PEFT, and efficient serving strategies.

Platform & Cloud Engineering

  • Build and operate ML platforms using AWS, GCP, or Azure, leveraging managed ML services where appropriate.
  • Containerize and orchestrate ML workloads using Docker and Kubernetes.
  • Implement feature stores, model registries, and experiment tracking for scalable collaboration.

Collaboration & Governance

  • Partner with product, data, and engineering teams to translate business problems into ML solutions.
  • Ensure ML systems meet security, privacy, compliance, and ethical AI standards.
  • Contribute to roadmap planning and technical decision-making for AI initiatives.

Required Qualifications

Technical Skills

  • 8-11 years of experience in software engineering and machine learning, with significant production ML ownership.
  • Expert-level Python programming skills.
  • Strong hands-on experience with ML frameworks such as PyTorch, TensorFlow, Hugging Face, and JAX.
  • Proven experience with MLOps/LLMOps tools: MLflow, Kubeflow, Vertex AI, SageMaker, Airflow.
  • Deep understanding of LLM architectures, prompt engineering, and fine-tuning workflows.
  • Solid experience with Docker, Kubernetes, and cloud-native ML deployments.
  • Experience with model monitoring and observability (Prometheus, Grafana, Evidently AI).
  • Strong background in distributed computing (Spark, Ray, Dask).
  • Experience with data pipelines and feature stores (Kafka, Apache Beam, Feast, Tecton).

Leadership & Soft Skills

  • Demonstrated experience leading or mentoring ML engineers while remaining hands-on.
  • Strong problem-solving, debugging, and system-level thinking skills.
  • Ability to clearly communicate complex ML concepts to technical and non-technical stakeholders.
  • Proactive mindset with a passion for learning and adopting emerging ML and LLM technologies.

Preferred Qualifications

  • Hands-on experience with LLM fine-tuning, RLHF, LoRA, and PEFT techniques.
  • Experience building RAG pipelines using vector databases such as FAISS, Pinecone, or Weaviate.
  • Familiarity with LangChain, LlamaIndex, or similar LLM orchestration frameworks.
  • Production experience deploying and operating LLMs such as LLaMA, Mistral, Falcon, Claude, or GPT-based models.
  • Experience designing cost-efficient, high-availability LLM inference architectures.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Data Science & Analytics
Role Category: Data Science & Machine Learning
Role: Manager - Machine Learning
Employment Type: Full time

Contact Details:

Company: Tredence
Location(s): Kolkata

Key skills: Vertex, GCP, Large Language Model, Deploying Models, Machine Learning, Python

Salary: ₹ Not Disclosed

Tredence

Founded in 2000, Chetu is a global leader in providing tailored software development solutions and support services. Chetu's dedicated team of technology professionals offers an extensive array of software solutions, including custom application development, enterprise software integration, m...