〔AIVIA〕 Evaluation Domains

stemaway · April 24, 2026, 5:07pm

WHAT’S LIVE, WHAT’S COMING, AND HOW DOMAINS ARE CHOSEN

01 Current Domains

AIVIA’s evaluation domains map to the major areas of technical work where reasoning-based evaluation matters most. Two domains are currently live, with five more following shortly.

Live now:

LLM & Agentic Systems. Building, deploying, and operating large language model applications and autonomous agent systems — RAG pipelines, guardrails, agent orchestration, tool use, and more.

AI-Assisted Development. Engineering velocity in the AI era — the judgment skills for doing the whole job with AI in the loop, across coding, design, operations, and tool evaluation.

Coming soon:

Applied & Domain AI. Building AI in regulated, high-stakes domains where success means audit trails, calibration, and failing safely.

Engineering Effectiveness. The multiplier for senior engineers — the non-code skills that decide whether a senior engineer drives outcomes or just writes tickets.

Media & Multimodal. Beyond text — building systems that generate, understand, and combine images, audio, and video at production scale.

MLOps & ML Platform. The production plumbing: the infrastructure and process that turn a model file into a reliable, observable production system.

Search & Recommendation. Matching intent with inventory — the systems that connect users to the right content or product at scale, across retrieval, ranking, and operations.

Each domain contains multiple subdomains, and each subdomain contains multiple components. Evaluations are available at the component level, with scenario posts anchoring each evaluation. The full list of domains, subdomains, and components is visible as categories on STEM-Away.
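The hierarchy described above can be pictured as a small nested data structure. This is only an illustrative sketch: the class names, field names, and the subdomain, component, and scenario names are all hypothetical placeholders, not AIVIA's actual schema or catalog. Only the domain name comes from the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    # Scenario posts anchor the evaluation for this component.
    scenario_posts: List[str] = field(default_factory=list)


@dataclass
class Subdomain:
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    live: bool
    subdomains: List[Subdomain] = field(default_factory=list)

    def evaluable_components(self) -> List[Component]:
        """Evaluations sit at the component level, so flatten down to components."""
        return [c for s in self.subdomains for c in s.components]


# Placeholder subdomain/component/scenario names (assumptions, not real catalog entries).
domain = Domain(
    name="LLM & Agentic Systems",
    live=True,
    subdomains=[
        Subdomain(
            name="example-subdomain",
            components=[
                Component("example-component-a", ["example-scenario-1"]),
                Component("example-component-b", ["example-scenario-2"]),
            ],
        )
    ],
)

component_names = [c.name for c in domain.evaluable_components()]
```

Flattening to components mirrors the point that an evaluation is taken against a component, not against a whole domain or subdomain.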

02 How Domains Are Chosen

AIVIA does not add domains arbitrarily. Each domain is selected based on two criteria:

Market Demand

The domain covers roles that companies are actively hiring for. If companies are struggling to fill positions, or finding that existing candidate pools don’t match their needs, that’s a signal the domain belongs on AIVIA.

Evaluation Gap

The domain covers roles where existing evaluation methods fall short. Resumes don’t capture the reasoning these roles require. Traditional coding assessments don’t cover the judgment calls — tradeoff reasoning, failure mode awareness, system-level thinking — that separate strong candidates from adequate ones. If there’s no rigorous, standardized way to evaluate the role today, AIVIA builds one.

A domain with high demand but adequate existing evaluation methods doesn’t need AIVIA. A domain with a clear evaluation gap but no hiring demand doesn’t justify the investment. The intersection — high demand, poor existing evaluation — is where AIVIA adds the most value.

LLM & Agentic Systems and AI-Assisted Development were the first two domains because they sit squarely at that intersection: massive hiring demand, fast-moving skill requirements, and evaluation methods that haven’t kept up.
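The two-criteria rule can be written down as a short decision function. This is a simplified sketch: treating each criterion as a yes/no flag is an assumption made for illustration, and the function names are hypothetical. The post does not say how demand or the evaluation gap is actually measured.

```python
def belongs_on_aivia(high_demand: bool, evaluation_gap: bool) -> bool:
    """A domain qualifies only when both criteria hold at once."""
    return high_demand and evaluation_gap


def explain(high_demand: bool, evaluation_gap: bool) -> str:
    """Return the reasoning for each of the four possible cases."""
    if high_demand and evaluation_gap:
        return "candidate: high demand and poor existing evaluation"
    if high_demand:
        return "skip: existing evaluation methods are adequate"
    if evaluation_gap:
        return "skip: no hiring demand to justify the investment"
    return "skip: neither criterion is met"
```

For example, `explain(True, False)` falls into the "adequate existing evaluation" case, matching the first of the two exclusions described above.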

How new domains are prioritized: Employer interest plays a direct role. When hiring teams on AIVIA signal demand for a specific domain, either by requesting it or by describing roles that don’t fit existing domains, that signal feeds into prioritization. The roadmap follows the areas where both the demand and the evaluation gap are real.

03 Requesting a New Domain or Component

Users can request new domains or components through the STEM-Away forum.

How to submit a request: Post a request in the appropriate forum category on STEM-Away. Include what domain or component is needed and why — what roles it would support, what evaluation gap it would fill, and any context about demand.

What happens after a request: Requests are reviewed as part of the domain roadmap process. Domains and components are prioritized based on market demand and evaluation gap — the same criteria used for all domain selection. Requests with clear demand signals (multiple users requesting the same domain, employer interest, active hiring in the area) are prioritized higher.

What makes a strong request:

Name the specific domain or component, not just a broad field.

Describe the roles it would serve: who is hiring for this, and what do they struggle to evaluate.

Explain what’s missing: why current evaluation methods don’t work for this area.

If possible, reference specific scenario topics or case studies that could anchor evaluations in this space.

Not every request leads to a new domain or component. AIVIA’s domain coverage is intentionally curated rather than exhaustive — each domain requires rigorous scenario development, rubric design, and validation before it goes live. Quality matters more than coverage speed.