We're looking for an experienced Platform Operations Engineer to lead the administration and optimization of our Kafka-based data streaming infrastructure on AWS. This role focuses on ensuring reliability, scalability, and performance across our Amazon MSK clusters and supporting cloud services. You'll work closely with engineering and DevOps teams to maintain a secure, high-throughput event streaming environment that underpins key business applications and analytics platforms.
Must-Have Qualities
10+ years of hands-on experience managing and administering Apache Kafka and/or Amazon Managed Streaming for Apache Kafka (MSK).
10+ years of professional experience with AWS cloud services, including EC2, EKS, IAM, CloudWatch, CloudTrail, S3, and VPC networking.
Proven expertise in Kafka cluster setup, scaling, partitioning, replication, and monitoring.
Strong proficiency with infrastructure-as-code tools, especially Terraform.
Solid understanding of Kafka security and data governance in cloud environments.
Experience with DevOps practices, CI/CD pipelines, and automation scripting in Python, Bash, or similar languages.
Skilled in troubleshooting distributed systems, caching architectures, and optimizing message throughput and latency.
Effective communicator with experience collaborating across Cloud, DevOps, Security, and Data engineering teams.
Nice-to-Have Qualities
Familiarity with Kubernetes/EKS and containerized microservices deployments.
Knowledge of Kafka Connect, Schema Registry, and Kafka Streams.
Exposure to monitoring and alerting frameworks (e.g., OpenTelemetry, Prometheus, Grafana).
Experience with cross-region or multi-account AWS setups.
AWS certifications such as AWS Certified Solutions Architect - Professional or DevOps Engineer - Professional.
Preferred Experience
10+ years managing enterprise-scale Kafka environments and AWS cloud infrastructure.
Prior work in large-scale data platforms or high-availability, low-latency systems.
Experience in capital markets companies.
Background in performance tuning, cost optimization, and incident response for streaming platforms.
Preferred Qualifications
AWS certification (e.g., Solutions Architect Associate/Professional, DevOps Engineer).
Familiarity with data pipelines and ETL tools.
Job Classification
Industry: Banking
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full-time