Talent.com
Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift • CL
Posted more than 30 days ago
Job description

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of English.

At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.

What we do

The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:

  • Create structured test cases that simulate complex human workflows.
  • Define gold-standard behavior and scoring logic to evaluate agent actions.
  • Analyze agent logs, failure modes, and decision paths.
  • Work with code repositories and test frameworks to validate your scenarios.
  • Iterate on prompts, instructions, and test cases to improve clarity and difficulty.
  • Ensure that scenarios are production-ready, easy to run, and reusable.

How to get started

Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements

  • Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science/Data Analytics, Artificial Intelligence/Machine Learning, Computational Linguistics/Natural Language Processing (NLP), Information Systems, or other related fields.
  • Background in QA, software testing, data analysis, or NLP annotation.
  • Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
  • Strong written communication skills in English.
  • Comfortable with structured formats like JSON/YAML for scenario description.
  • Able to define expected agent behaviors (gold paths) and scoring logic.
  • Basic experience with Python and JS.
  • Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.

Nice to Have

  • Experience in writing manual or automated test cases.
  • Familiarity with LLM capabilities and typical failure modes.
  • Understanding of scoring metrics (precision, recall, coverage, reward functions).

Benefits

Contribute on your own schedule, from anywhere in the world. This opportunity allows you to:

  • Get paid for your expertise, with rates that can go up to $40/hour depending on your skills, experience, and project needs.
  • Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments.
  • Participate in an advanced AI project and gain valuable experience to enhance your portfolio.
  • Influence how future AI models understand and communicate in your field of expertise.

    Similar jobs
    AI Engineer - LATAM
    Space Inch • CL
    Space Inch is on a new mission, and we’re looking for AI experts and enthusiasts to join us! ... LLM-powered services end-to-end. You will build and operate fast, reliable AI systems that power core pla...
    Last updated: 2 days ago
    AI Pilot Vibe Coding Assistant
    Mindrift • CL
    This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of En...
    Last updated: 15 days ago
    Intake Analyst
    World Business Lenders, LLC • CL
    About World Business Lenders ... World Business Lenders (WBL) provides general purpose short-term real estate collateralized commercial loans to a broad customer base comprised of small and medium si...
    Last updated: more than 30 days ago
    Private Detective and Investigator in Maule
    Cronoshare.cl • Maule (Región del Maule), cl
    Cronoshare is an online platform for professionals who want to find new clients. We are looking for a private detective and investigator in Maule and the surrounding area. Joining the network of professionals of...
    Last updated: 20 days ago • Promoted listing
    Lawyer in Villa Alegre
    Cronoshare.cl • Villa Alegre (Región del Maule), cl
    Cronoshare is an online platform for professionals who want to find new clients. We are looking for a lawyer in Villa Alegre and the surrounding area. Joining Cronoshare's network of professionals does not...
    Last updated: more than 30 days ago • Promoted listing
    Video Editor Astronomy - Remote, UK Based Media Company
    Anypear • CL
    Anypear is a headhunting and recruitment agency connecting New Zealand and Australian businesses with top overseas talent. We are currently on the lookout for a talented ... UK-based media company in cr...
    Last updated: more than 30 days ago
    SEM Specialist (In Chile)
    Busbud • CL
    About us: At Recorrido, we are proud to be the largest OTA for ground travel in Chile! That's not all: we are expanding in LATAM and have revolutionized the intercity bus industry, making ...
    Last updated: more than 30 days ago
    Senior Software Engineer (Multiple Stacks) - Remote Contract
    Salve.Inno Consulting • Chile, Chile
    We are hiring experienced Senior Software Engineers (3-9 years) for ongoing, short-term remote projects with a global AI-driven platform. This initiative supports a leading technology client's LLM E...
    Last updated: more than 30 days ago • Promoted listing
    AI Pilot Operations Associate (Freelance)
    Mindrift • CL
    This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of En...
    Last updated: 1 day ago
    Remote - Valuation analyst
    World Business Lenders, LLC • CL
    About World Business Lenders ... World Business Lenders (WBL) provides general purpose short-term real estate collateralized commercial loans to a broad customer base comprised of small and medium si...
    Last updated: more than 30 days ago
    Freelance Civil Engineering Expert with Python Experience - AI Trainer
    Mindrift • CL
    This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of En...
    Last updated: more than 30 days ago
    Senior Software Engineer (Multiple Stacks) - Remote Contract
    Salve.Inno Consulting • Chile, Chile
    We are hiring experienced Senior Software Engineers (3-9 years) for ongoing, short-term remote projects with a global AI-driven platform. This initiative supports a leading technology client's LLM E...
    Last updated: more than 30 days ago • Promoted listing
    2D/3D Animator or Motion Graphics Designer in Talca
    Cronoshare.cl • Talca (Región del Maule), cl
    Cronoshare is an online platform for professionals who want to find new clients. We are looking for a 2D/3D animator or motion graphics designer in Talca and the surrounding area. Joining the network of pr...
    Last updated: more than 30 days ago • Promoted listing
    Economic Evaluation & Benchmarking Analyst, Intermediate
    Ausenco • CL
    Ausenco is a fast-growing company with big ideas. We redefine what’s possible in some of the world’s most complex projects and toughest environments. Delivering innovative, value-add consulting, proj...
    Last updated: more than 30 days ago
    SEO Content Specialist | Remote | LATAM Only | 83110
    Remote Talent LATAM • CL
    ...Latin American talent with leading U... We guide businesses and candidates through every step of the hiring process, ensuring the perfect match in skills, culture, and goals. While we’re not direct emp...
    Last updated: 29 days ago
    Data Engineer - Latin America - Remote
    Azumo • CL
    Azumo is currently looking for a highly motivated Big Data Engineer to develop and enhance data and analytics infrastructure. Fully remote, based in Latin America. This position will give you the oppo...
    Last updated: more than 30 days ago
    Data & Reporting Specialist
    Workana • CL
    Workana is the largest remote work platform for talent in Latin America. Our new segment, Workana Premium, focuses on matching the most exceptional professionals with leading and innovative compani...
    Last updated: 2 days ago
    Freelance Software Developer (Kotlin) - Quality Assurance (AI Trainer)
    Mindrift • CL
    This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates. Please submit your resume in English and indicate your level of En...
    Last updated: more than 30 days ago