Job Description
As an SDET driving the creation of next-generation LLM-based evaluation systems for Alexa+, you'll design and build the frameworks that define how conversational intelligence is measured, determining whether millions of daily interactions feel accurate, natural, and human-centered. Your work goes far beyond binary pass/fail tests: you'll engineer automated systems that assess accuracy, reasoning depth, tone, and responsiveness across multimodal, context-rich conversations. In this role, traditional testing boundaries dissolve. You'll evaluate not just functional correctness, but whether the AI's responses are contextually relevant, emotionally aligned, and conversationally fluent. Your systems will measure everything from factual accuracy and task completion to subtler attributes like dialog flow, personality coherence, and graceful curtailment. Partnering closely with scientists and engineers, you'll automate the detection of conversational regressions, identifying hallucinations, degraded reasoning, or misaligned tones before they reach customers. You'll leverage prompt-driven evaluation pipelines, LLM-as-a-Judge (LLMaaJ) frameworks, and reference-based validation to ensure assessments remain consistent, explainable, and scalable across model versions and releases. You'll also collaborate with prompt engineers, model developers, and product teams to establish robust, category-specific testing methodologies, from quick one-shot actions and task fulfillment to multi-turn dialogues and creative, open-ended interactions. Through your work, evaluation evolves from a quality gate into a self-improving assessment framework, one that learns, adapts, and ensures every voice interaction feels naturally conversational.
Bachelor's degree or above in computer science, computer engineering, or a related field, or Associate's degree or above
4+ years of experience as an SDET, Software Engineer, or QA Automation Engineer
Proficiency in at least one programming language (e.g., Java, Python, or JavaScript)
Hands-on experience building and maintaining automation frameworks for UI, API, and backend systems
Strong understanding of API testing, data-driven validation, and test metrics
Excellent problem-solving skills and ability to dive deep into complex systems to identify root causes
Strong communication and collaboration skills, with a passion for driving quality across teams
Knowledge of overall system architecture, scalability, reliability, and performance in a database environment
Experience with security in service-oriented architectures and web services
Master's degree in Computer Science or a related field
Experience with AWS or cloud technologies
Experience with LLM-based evaluation, prompt-driven testing, or LLM-as-a-Judge frameworks
Hands-on experience using AI and LLM-based frameworks to automate quality testing and build intelligent evaluation systems
Ability to design semantic and behavioral validation beyond functional correctness
Proven record of collaborating with scientists and engineers to define measurable AI quality metrics
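For context, the LLM-as-a-Judge (LLMaaJ) evaluation pipelines referenced above can be sketched roughly as follows. This is a minimal, illustrative example only: the rubric attributes are taken from the description, but the `JUDGE_PROMPT` wording, the `call_judge` callable, and the pass threshold are assumptions, not Amazon's actual tooling.

```python
import json

# Hypothetical rubric: attributes named in the role description,
# each scored 1-5 by a judge model.
RUBRIC = ["factual_accuracy", "contextual_relevance", "tone_alignment", "dialog_flow"]

# Assumed judge prompt template; a real pipeline would tune this carefully.
JUDGE_PROMPT = (
    "You are an impartial evaluator of a voice assistant.\n"
    "Rate the response under test on each attribute from 1 to 5.\n"
    "Return only JSON shaped like: {attributes}\n"
    "Conversation: {conversation}\n"
    "Response under test: {response}"
)

def build_judge_prompt(conversation: str, response: str) -> str:
    """Fill the judge template with the conversation and candidate response."""
    return JUDGE_PROMPT.format(
        attributes=json.dumps({a: "1-5" for a in RUBRIC}),
        conversation=conversation,
        response=response,
    )

def score_response(conversation, response, call_judge, threshold=3.0):
    """Ask the judge model for per-attribute scores and flag regressions.

    `call_judge` is an assumed callable that sends a prompt to an LLM and
    returns its raw text; any inference client could back it.
    """
    raw = call_judge(build_judge_prompt(conversation, response))
    scores = json.loads(raw)
    mean = sum(scores[a] for a in RUBRIC) / len(RUBRIC)
    return {"scores": scores, "mean": mean, "passed": mean >= threshold}

if __name__ == "__main__":
    # Stub judge so the sketch runs offline; replace with a real model call.
    def stub_judge(prompt: str) -> str:
        return json.dumps({a: 4 for a in RUBRIC})

    result = score_response(
        "User: set a timer for 5 minutes",
        "Sure, a 5-minute timer is set.",
        stub_judge,
    )
    print(result["passed"])  # True with the stub's uniform scores
```

In practice a pipeline like this runs over curated conversation sets per interaction category (one-shot actions, multi-turn dialogues, open-ended requests) and compares mean scores across model versions to surface regressions.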
Job Classification
Industry: Internet
Functional Area / Department: Engineering - Software & QA
Role Category: Quality Assurance and Testing
Role: Software Developer in Test (SDET)
Employment Type: Full time
Contact Details:
Company: Amazon
Location(s): Pune
Keyskills:
Computer science
System architecture
Backend
Web services
Quality testing
API Testing
Javascript
Scheduling
QA automation
Python