Select an eval case to see the prompt, expected behavior, and per-model outcomes.
36 eval cases × 4 models × 6 risk dimensions. Includes a cross-model matrix, regression detection against the last baseline, and per-case pass/fail with the actual prompt and expected behavior. Structured around NIST AI RMF (GOVERN.4.1 and MEASURE.2.6) and EU AI Act Art. 15 (accuracy and robustness).
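
As an illustration of the cross-model matrix and the regression check against the last baseline, here is a minimal Python sketch. It assumes results are stored as a flat mapping from (case ID, model) to a pass/fail flag. That data shape and the names `Results`, `find_regressions`, and `pass_matrix` are hypothetical and are not taken from the tool itself.

```python
from typing import Dict, List, Tuple

# Hypothetical result shape: (case_id, model) -> True if the case passed.
Results = Dict[Tuple[str, str], bool]


def find_regressions(baseline: Results, current: Results) -> List[Tuple[str, str]]:
    """Return (case_id, model) pairs that passed in the baseline but fail now.

    Pairs missing from the current run are not counted as regressions here;
    that is a simplifying assumption of this sketch.
    """
    return sorted(
        key
        for key, passed_before in baseline.items()
        if passed_before and current.get(key) is False
    )


def pass_matrix(results: Results) -> Dict[str, Dict[str, bool]]:
    """Pivot flat results into a case -> model -> pass/fail matrix."""
    matrix: Dict[str, Dict[str, bool]] = {}
    for (case_id, model), passed in results.items():
        matrix.setdefault(case_id, {})[model] = passed
    return matrix
```

In this sketch a regression is strictly a pass-to-fail transition for the same case and model, so a case that was already failing in the baseline is not flagged again.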