
title: Stage 4 - Reliability & Testing

status: PLANNED

priority: HIGH

Core Objectives

Validate the rework loop's stability and ensure objectivity in the evaluation process.

Task Breakdown

  • Integration Tests:
    • Test: A task that fails on attempt 1, is reworked, and passes on attempt 2.
    • Test: A task that reaches max_reworks and stops even if the score is still low.
  • Bias Validation:
    • Audit logs to confirm that Stage 2A receives no context from previous rounds.
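
The two integration scenarios and the audit check above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the harness's actual code: `run_task`, `RunResult`, `PASS_THRESHOLD`, the scripted evaluator, and the audit-log record shape (`stage`, `context_keys`) are all hypothetical names introduced for the example. It also assumes `max_reworks` counts reworks after the initial attempt.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

PASS_THRESHOLD = 0.8  # assumed pass mark; the real threshold is not specified here


@dataclass
class RunResult:
    passed: bool
    attempts: int
    scores: List[float]


def run_task(evaluate: Callable[[int], float], max_reworks: int) -> RunResult:
    """Hypothetical rework loop: one initial attempt plus at most max_reworks reworks."""
    scores: List[float] = []
    for attempt in range(1, max_reworks + 2):
        score = evaluate(attempt)
        scores.append(score)
        if score >= PASS_THRESHOLD:
            return RunResult(True, attempt, scores)
    # Budget exhausted: stop even though the score is still below the threshold.
    return RunResult(False, len(scores), scores)


def scripted(scores: List[float]) -> Callable[[int], float]:
    """Stub evaluator returning a fixed score for each 1-indexed attempt."""
    return lambda attempt: scores[attempt - 1]


def test_fail_then_pass() -> None:
    result = run_task(scripted([0.4, 0.9]), max_reworks=3)
    assert result.passed
    assert result.attempts == 2


def test_stops_at_max_reworks() -> None:
    # Every attempt scores low, so the loop must end on the rework budget alone.
    result = run_task(lambda attempt: 0.1, max_reworks=2)
    assert not result.passed
    assert result.attempts == 3  # 1 initial attempt + 2 reworks


def find_context_leaks(audit_log: List[Dict]) -> List[Dict]:
    """Return Stage 2A records whose input carried any prior-round context."""
    return [r for r in audit_log if r["stage"] == "2A" and r["context_keys"]]
```

In a real suite the stub evaluator would be replaced by the harness's evaluation stage, and `find_context_leaks` would read whatever fields the harness actually logs. The bias check passes when it returns an empty list.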

Claude Code Inspiration: Recovery Circuit Breakers

Reference: src/query.ts

  • Ensure the max_reworks logic is a hard circuit breaker (similar to MAX_OUTPUT_TOKENS_RECOVERY_LIMIT) to avoid infinite loops and runaway costs.
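
One way to make the limit "hard" is to put the counter in a small guard object that raises once the budget is spent, so no caller can loop past it by accident. The sketch below is an assumption-laden illustration: `ReworkBreaker` and `ReworkLimitExceeded` are hypothetical names, and it does not reproduce how `MAX_OUTPUT_TOKENS_RECOVERY_LIMIT` is used in the referenced file.

```python
class ReworkLimitExceeded(RuntimeError):
    """Raised when a task asks for more reworks than its budget allows."""


class ReworkBreaker:
    """Hypothetical hard limit on reworks for a single task."""

    def __init__(self, max_reworks: int) -> None:
        if max_reworks < 0:
            raise ValueError("max_reworks must be non-negative")
        self.max_reworks = max_reworks
        self.reworks = 0

    def record_rework(self) -> None:
        """Count one rework, or raise if the budget is already spent."""
        if self.reworks >= self.max_reworks:
            raise ReworkLimitExceeded(
                f"rework limit of {self.max_reworks} reached"
            )
        self.reworks += 1
```

A loop would call `record_rework()` before starting each rework and treat `ReworkLimitExceeded` as a terminal state for the task. Raising an exception, rather than returning a flag, means a caller that forgets to check still cannot continue.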