Job Description
Job Summary
We are looking for a Resiliency & Chaos Testing Engineer with 5+ years of hands-on experience in performance engineering, resiliency testing, and chaos testing for enterprise-grade, cloud-native environments. The ideal candidate will be proficient with performance testing tools (LoadRunner, JMeter), chaos engineering tools (Chaos Monkey, Chaos Studio, Chaos Mesh), and observability platforms (Dynatrace, AppDynamics, Prometheus, Grafana).
The role focuses on identifying performance bottlenecks, improving application resiliency, conducting failure simulations, and supporting enterprise audit and recovery readiness.
Key Responsibilities
Performance Engineering & Observability
- Design, execute, and analyze performance tests using tools such as LoadRunner and JMeter.
- Identify bottlenecks and provide tuning recommendations across apps, databases, and infrastructure.
- Utilize APM tools (Dynatrace, AppDynamics, New Relic) and monitoring platforms (open-source Prometheus and Grafana, Azure Monitor).
Chaos Engineering & Resiliency Validation
- Execute chaos tests using tools like Chaos Studio, Chaos Monkey, and Gremlin to simulate pod failures, network latency, node crashes, and dependency outages.
- Perform CL0 validation, failover testing, MTTR/MTBF analysis, and support disaster recovery strategies.
- Ensure systems are auto-healing and can withstand production-grade fault scenarios.
Audit & Risk Compliance
- Address and remediate enterprise audit issues like IS - 17556.
- Support operational resiliency efforts (e.g., Travis K space), ensuring enterprise uptime, compliance, and observability readiness.
Cloud & Container Platforms
- Test application performance and resiliency in Docker, Kubernetes, and OpenShift environments.
- Work with cloud-native solutions, Helm chart deployments, rolling updates, and secure TLS/mTLS configurations for microservices.
CI/CD & Agile Collaboration
- Integrate chaos and performance tests into CI/CD pipelines.
- Collaborate with Agile/DevOps teams to define NFRs, performance KPIs, and system readiness within sprints.
- Participate in backlog grooming, system hardening, and environment stability assessments.
Required Skills & Qualifications
- 5+ years of experience in performance testing, resiliency validation, or chaos engineering.
- Expertise in LoadRunner, JMeter, Prometheus, Grafana, Chaos Studio, and Chaos Monkey.
- Experience with Kubernetes, OpenShift, Docker, and monitoring tools like Azure Monitor and New Relic.
- Familiarity with messaging systems like Kafka, RabbitMQ, and IBM MQ, and databases like MongoDB and MSSQL.
- Hands-on experience with mTLS/SSL configurations and Helm for container deployment.
- Strong collaboration, analytical, and documentation skills.
Preferred Skills
- Experience with languages like Java, Python, or Golang for scripting fault simulations or automation.
- Understanding of risk frameworks, performance profiling tools (Eclipse MAT, VisualVM), and cloud security practices.
- Prior work in payment domains or regulated environments with SLAs and compliance constraints.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Quality Assurance and Testing
Role: Performance Testing Engineer
Employment Type: Full time
Contact Details:
Company: Clarium Tech
Location(s): Chennai
Key Skills:
LoadRunner
Resilience Testing
Chaos Testing
Performance Testing
Monkey Testing