We are looking for an experienced cloud development engineer to work on our HPC CSM manageability solution.
The role involves designing, implementing, and maintaining our HPC CSM manageability platform hosted on Kubernetes infrastructure. The position requires in-depth expertise in cloud-native technologies, particularly Kubernetes, along with a strong background in automation and DevOps practices.
A good understanding of security for cloud-native applications is expected.
Test Planning & Execution
Design, implement, and execute comprehensive test plans for the CSM platform, including functional, regression, integration, and performance testing.
Validate HPC system management capabilities such as node provisioning, monitoring, workload orchestration, and system upgrades.
Automation Development
Develop automated test suites using Python, Bash, and CI/CD frameworks to ensure rapid and repeatable test execution.
Integrate automated testing into the development pipeline to support continuous delivery.
Defect Tracking & Reporting
Identify, document, and track defects; work with engineering teams to resolve issues.
Provide clear, reproducible test cases and logs to aid in troubleshooting.
Performance & Scalability Validation
Perform stress testing and scale testing on large HPC clusters.
Monitor and analyze system metrics to assess stability under load.
What you need to bring:
Education and Experience Required:
Bachelor's degree preferred, or an Associate degree in a technical field, with 8-12 years of working experience in related fields.
Technical Skills
Strong understanding of Linux (RHEL, SLES, Ubuntu) system administration.
Experience with Kubernetes, containers (Docker/Podman), and networking fundamentals.
Proficiency in scripting languages (Python, Bash) for automation.
Familiarity with HPC architectures, job schedulers (Slurm, PBS Pro), and workload management concepts.
Testing Expertise
Experience with test automation frameworks and CI/CD tooling (e.g., pytest, Robot Framework, Jenkins).
Hands-on experience in system-level testing, API testing, and performance validation.
Tools & Platforms
Familiarity with Git, Jira, Confluence, and defect tracking workflows.
Experience with monitoring and log analysis tools (Grafana, Prometheus, ELK stack) is a plus.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: DevOps Consultant / Architect
Employment Type: Full time