---
title: Stage 4 - Reliability & Testing
status: PLANNED
priority: HIGH
---

## Core Objectives
Validate the rework loop's stability and ensure the evaluation process stays objective.

## Task Breakdown
- [ ] **Integration Tests**:
    - [ ] Test: A task that fails on attempt 1, reworks, and passes on attempt 2.
    - [ ] Test: A task that reaches `max_reworks` and stops, even if the score is still low.
- [ ] **Bias Validation**:
    - [ ] Audit logs to confirm that Stage 2A receives no context from previous rounds.
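
The checklist above could be exercised with tests along these lines. This is only a sketch: `run_rework_loop`, `LoopResult`, `PASS_THRESHOLD`, and the forbidden payload keys are hypothetical stand-ins for the real loop and its log format, and the sketch assumes `max_reworks` counts reworks after the initial attempt.

```python
from dataclasses import dataclass, field
from typing import Callable, List

PASS_THRESHOLD = 0.8  # hypothetical passing score, not taken from the real config


@dataclass
class LoopResult:
    passed: bool
    attempts: int
    # One entry per evaluation: what the Stage 2A evaluator was given.
    evaluator_inputs: List[dict] = field(default_factory=list)


def run_rework_loop(score_for_attempt: Callable[[int], float], max_reworks: int) -> LoopResult:
    """Hypothetical stand-in for the real loop: evaluate, rework, stop on pass or cap."""
    result = LoopResult(passed=False, attempts=0)
    # One initial attempt plus at most max_reworks reworks.
    for attempt in range(1, max_reworks + 2):
        result.attempts = attempt
        # The evaluator payload carries only the current output, no history.
        result.evaluator_inputs.append({"output": "candidate output"})
        if score_for_attempt(attempt) >= PASS_THRESHOLD:
            result.passed = True
            break
    return result


def test_passes_on_second_attempt():
    result = run_rework_loop(lambda attempt: 0.4 if attempt == 1 else 0.9, max_reworks=3)
    assert result.passed and result.attempts == 2


def test_stops_at_max_reworks():
    result = run_rework_loop(lambda attempt: 0.1, max_reworks=2)
    assert not result.passed and result.attempts == 3


def test_evaluator_sees_no_history():
    result = run_rework_loop(lambda attempt: 0.1, max_reworks=2)
    forbidden = {"previous_score", "previous_feedback", "attempt"}
    # No evaluator payload may contain a key that leaks earlier rounds.
    assert all(forbidden.isdisjoint(payload) for payload in result.evaluator_inputs)
```

In a real suite the last test would read the recorded audit logs rather than an in-memory list, but the assertion would keep the same shape.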

## Claude Code Inspiration: Recovery Circuit Breakers
*Reference: `src/query.ts`*
- Ensure the `max_reworks` logic is a hard circuit breaker (similar to `MAX_OUTPUT_TOKENS_RECOVERY_LIMIT`) to avoid infinite loops and runaway costs.
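
One way to make the limit "hard" is to route every rework through a single guard that raises once the budget is spent. The sketch below is illustrative only: `ReworkBreaker`, `ReworkLimitExceeded`, and `HARD_REWORK_CEILING` are hypothetical names, and the extra absolute ceiling is an assumption added here, not something this plan specifies.

```python
HARD_REWORK_CEILING = 5  # hypothetical absolute cap that configuration cannot exceed


class ReworkLimitExceeded(RuntimeError):
    """Raised when another rework is requested after the budget is spent."""


class ReworkBreaker:
    def __init__(self, max_reworks: int) -> None:
        if max_reworks < 0:
            raise ValueError("max_reworks must be non-negative")
        # Clamp the configured value so a bad config cannot disable the breaker.
        self.limit = min(max_reworks, HARD_REWORK_CEILING)
        self.used = 0

    def record_rework(self) -> None:
        """Count one rework, or raise if the limit has already been reached."""
        if self.used >= self.limit:
            raise ReworkLimitExceeded(f"rework limit of {self.limit} reached")
        self.used += 1
```

Because the counter only increases and the check runs before the increment, the loop cannot start another rework once the limit is hit, whatever the current score is.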
