DevOps Engineer - Observability Stack @ Yotta Infrastructure

Job Description

Role & responsibilities

  • Maintain large-scale HPC/AI clusters with monitoring, logging, and alerting.
  • Manage Linux job/workload schedulers and orchestration tools.
  • Develop and maintain continuous integration and delivery pipelines.
  • Develop tooling to automate deployment and management of large-scale infrastructure environments, to automate operational monitoring and alerting, and to enable self-service consumption of resources.
  • Deploy monitoring solutions for the servers, network and storage.
  • Troubleshoot issues bottom-up across the bare metal, operating system, software stack, and application levels.
  • As a technical resource, develop, redefine, and document standard methodologies to share with internal teams.
  • Support Research & Development activities and engage in POCs/POVs for future improvements.
  • Knowledge of HPC and AI solution technologies, including CPUs, GPUs, high-speed interconnects, and supporting software.
  • Extensive knowledge and hands-on experience with Kubernetes, including container orchestration for AI/ML workloads, resource scheduling, scaling, and integration with HPC environments.
  • Experience in managing and installing HPC clusters, including deployment, optimization, and troubleshooting.
  • Excellent knowledge of Linux systems (Red Hat/CentOS and Ubuntu), including internals, ACLs, OS-level security protections, and common protocols such as TCP, DHCP, and DNS.
  • Experience with multiple storage solutions, including Lustre, GPFS, ZFS, and XFS. Familiarity with newer and emerging storage technologies is a plus.
  • Proficiency in Python programming and bash scripting.
  • Comfortable with automation and configuration management tools, including Jenkins, Ansible, Puppet/Chef, etc.

Must Have Skills:

  • Knowledge of CI/CD pipelines for software deployment and automation.
  • Knowledge of Kubernetes and container-related microservice technologies.
  • Experience with GPU-focused hardware/software (DGX, CUDA).
  • Background with RDMA (InfiniBand or RoCE) fabrics.
  • K8s and Cloud certifications would be a bonus.

Qualification:

  • BS/MS/PhD or equivalent experience in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields.
  • At least 2 years of professional experience in networking fundamentals, TCP/IP stack, and data center architecture.

Interested candidates can share their updated resumes at sa******o@yo**a.com


Regards,

YOTTA

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: IT & Information Security
Role Category: IT & Information Security - Other
Role: IT & Information Security - Other
Employment Type: Full time

Contact Details:

Company: Yotta Infrastructure
Location(s): Mumbai


Key skills: GPU, CUDA, DevOps, Observability, Jenkins, Docker, Ansible, GPU Programming, Kubernetes

₹ Not Disclosed
