Evidence · Capability 02 · Evidence-Based Delivery

Not a Demo — An Evidence Closure

We don't prove a system works with one demonstration. For a phase to be declared "complete," all five dimensions — execution, governance, restart, discipline, evidence — must close.

The most common failure mode of AI projects isn't "couldn't build it" — it's "built it but it doesn't last." The demo looks impressive, then degrades quietly in real environments, then fails without traceable evidence. yunforce defines "complete" differently: complete means the system can run continuously, recover from anomalies, restart without distortion, and be audited without missing evidence.

  • 3 markets covered
  • 24 frozen configurations
  • 7/7 fully-operational streak
  • 0 silent drifts

This page proves

A phase being declared "complete" must be decided by evidence, not by demonstration.

Why this page maps to Capability 02

"Evidence-Based Delivery" as a core yunforce capability needs a real, multi-stage project with observable evidence trails to validate it. This page covers the first two phases of Quant v1.1 — a complete delivery journey from R&D to operations — and how we used evidence rather than intuition at every step to declare done.

What it actually takes to call a phase done

Done is not "it looks like it works." Done is "the evidence says it works — continuously, recoverably, reconcilably."

That sounds simple, but most AI projects don't deliver it. Demonstrating once is easy; running continuously for a week is hard. Passing once is easy; recovering after a restart is hard. Producing one number is easy; attaching complete, auditable evidence to that number is hard.

Quant v1.1 applied a consistent "done" standard across all three phases. What follows shows the two layers of that standard: boundary discipline (what can't be touched) and evidence closure (what must be satisfied).

Phases are not a continuous implementation stream

The easiest way to misread this project is to treat it as a continuous development stream. It is not. The first two phases have a hard product boundary: phase one produces a set of frozen core configurations, and phase two consumes them to build a safe operating loop. No later phase may silently revisit phase one's outputs.

That separation matters — it is the only reason a later governance layer can stack safely.

Phase one produced

  • System skeleton: CLI, config, logging, lifecycle
  • Core capabilities: cross-market data access, backtests, real-environment validation
  • The frozen core configurations (later phases must not silently retune them)

Phase two added

  • Execution manifests, risk gating, circuit breakers
  • Checkpoints, daily verification, evidence bundles
  • Standardized criteria for declaring "closed"

Phase three (governance layer) inherited

  • A runtime where execution authority remains exclusive
  • An execution substrate that can be wrapped by governance but not bypassed
  • An operating loop whose evidence and verification semantics are already proven

Engineering discipline often shows in what you don't do

Every later operational and governance claim depends on being able to honestly say "engineering changes were not hiding algorithm drift." Phase one gives us that line.

When a team says "we did governance" but its research outputs can be silently retuned during the engineering phase at any moment, that governance is theater. We made a real boundary — and a real boundary means actively giving up some "looks better" shortcuts.

01

Consumed by later phases

Critical configurations and parameter templates are treated as frozen production inputs. Later phases use them; they don't quietly reshape them.

02

Built, validated, but not promoted

Several strategy directions that looked promising in early stages were rejected under strict production-grade validation. We chose not to promote them, rather than loosening the validation bar to make them appear to pass.

03

Future — not a back door

More possibilities are kept as separate future work — not smuggled into the current phase's engineering hardening.

This restraint looks conservative. It's also the root that every later capability claim depends on.
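One way to make the frozen-input rule checkable is a content fingerprint recorded at freeze time. The sketch below is illustrative only and is not the Quant v1.1 implementation: the function names, the config fields, and the choice of SHA-256 over canonical JSON are all assumptions made for the example.

```python
import hashlib
import json


def config_fingerprint(config: dict) -> str:
    """Return a SHA-256 digest of a config in canonical JSON form."""
    # Sorting keys makes the digest independent of key order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_silent_drift(frozen: dict[str, str], current: dict[str, dict]) -> list[str]:
    """List config names whose current content no longer matches the frozen digest.

    `frozen` maps a config name to the digest recorded at freeze time.
    A config that is missing from `current` also counts as drift.
    """
    drifted = []
    for name, digest in sorted(frozen.items()):
        config = current.get(name)
        if config is None or config_fingerprint(config) != digest:
            drifted.append(name)
    return drifted


# Hypothetical example data, not taken from Quant v1.1.
baseline = {"lookback": 20, "threshold": 1.5}
frozen = {"cfg-01": config_fingerprint(baseline)}

# An equivalent config with a different key order is not flagged.
unchanged = find_silent_drift(frozen, {"cfg-01": {"threshold": 1.5, "lookback": 20}})
# A quietly retuned parameter is flagged.
retuned = find_silent_drift(frozen, {"cfg-01": {"lookback": 21, "threshold": 1.5}})
```

Under these assumptions, a "0 silent drifts" claim corresponds to `find_silent_drift` returning an empty list on every daily verification run.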

Not "it seems to work" — closed on evidence

Phase two did not close because "the system looked like it was working." It closed when all five evidence families were satisfied — each independently verified, reproducible, and reviewable. This is yunforce's actual definition of "complete."

01

Execution integrity

The daily operating loop must close, with no missed executions or false anomalies.

02

Governance integrity

Default-deny, no silent state sync, correct recovery semantics — all evidence-backed.

03

Restart safety

Restart scenarios with non-empty state must recover safely, without duplicate operations or state corruption.

04

Resource & risk discipline

Checks on critical resources, permissions, quotas, and risk caps must be automated and visible in daily verification, not waved past.

05

Evidence heartbeat

Summary, reconciliation, state snapshot, operation report, daily verification — all must exist and agree with one another.
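"Exist and agree" can be expressed as a mechanical check. The following is a minimal sketch under stated assumptions: the five artifact names come from the description above, but the dictionary layout and the shared fields (`trading_date`, `operation_count`) are hypothetical and chosen only to illustrate a cross-artifact comparison.

```python
REQUIRED_ARTIFACTS = (
    "summary",
    "reconciliation",
    "state_snapshot",
    "operation_report",
    "daily_verification",
)

# Hypothetical fields that every artifact is assumed to report.
SHARED_FIELDS = ("trading_date", "operation_count")


def heartbeat_problems(bundle: dict[str, dict]) -> list[str]:
    """Return a list of problems; an empty list means the heartbeat holds."""
    problems = [
        f"missing artifact: {name}"
        for name in REQUIRED_ARTIFACTS
        if name not in bundle
    ]
    for field in SHARED_FIELDS:
        # Collect the field's value from every artifact that is present.
        values = {
            name: bundle[name].get(field)
            for name in REQUIRED_ARTIFACTS
            if name in bundle
        }
        if any(value is None for value in values.values()):
            problems.append(f"field not reported everywhere: {field}")
        elif len(set(values.values())) > 1:
            problems.append(f"artifacts disagree on {field}: {values}")
    return problems


# Hypothetical example data.
good = {
    name: {"trading_date": "2000-01-03", "operation_count": 4}
    for name in REQUIRED_ARTIFACTS
}
bad = {**good, "reconciliation": {"trading_date": "2000-01-03", "operation_count": 5}}

good_result = heartbeat_problems(good)
bad_result = heartbeat_problems(bad)
```

The sketch returns a list of problems instead of a boolean so that a failed day leaves a reviewable reason behind, not just a failed flag.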

Close condition: only when all five evidence families are satisfied and a seven-day fully-operational streak is banked can the phase be declared closed. Optional confidence tests remain optional.
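The close condition can be read as a conjunction of two checks. The sketch below is one possible encoding, not the actual gate: the family identifiers, the function names, and the interpretation of "streak" as consecutive fully-operational days ending at the most recent day are assumptions. A family with no recorded status is treated as not satisfied.

```python
EVIDENCE_FAMILIES = (
    "execution_integrity",
    "governance_integrity",
    "restart_safety",
    "resource_risk_discipline",
    "evidence_heartbeat",
)
REQUIRED_STREAK_DAYS = 7


def trailing_streak(daily_ok: list[bool]) -> int:
    """Count consecutive fully-operational days ending at the most recent day."""
    streak = 0
    for ok in reversed(daily_ok):
        if not ok:
            break
        streak += 1
    return streak


def phase_can_close(family_status: dict[str, bool], daily_ok: list[bool]) -> bool:
    """Allow closure only if every family is satisfied and the streak is banked."""
    # A family absent from family_status counts as unsatisfied.
    families_closed = all(family_status.get(name, False) for name in EVIDENCE_FAMILIES)
    return families_closed and trailing_streak(daily_ok) >= REQUIRED_STREAK_DAYS


# Hypothetical example data.
all_satisfied = {name: True for name in EVIDENCE_FAMILIES}
closes = phase_can_close(all_satisfied, [False] + [True] * 7)
short_streak = phase_can_close(all_satisfied, [True] * 6)
broken_today = phase_can_close(all_satisfied, [True] * 7 + [False])
```

Optional confidence tests are deliberately absent from the inputs, so their results cannot block or unlock closure in this sketch.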

From framework to proven operations

Phase 1.1 · October 2025

Core skeleton

The system stopped being just abstractions — it gained a runnable shell and configuration surface.

Phase 1.2 / 1.2.a · October 2025

Multi-market expansion

Data ingestion across three markets, provider matrix, fee models, and real-data validation all landed.

Phase 1.3 · October-November 2025

Strategy validation and freeze

Strategy families were built, validated, filtered, and narrowed to the frozen core configurations.

Phase 2.a · December 2025

Execution stack and validation

Execution manifest, risk, monitoring, and audit all reached their validation exit gates.

Phase 2.b close · March 23, 2026

Final acceptance complete

Execution, governance, and evidence gates all closed. Restart quota satisfied. 7/7 fully-operational streak banked.

Where we run — and why

Quant v1.1 currently runs in a paper-trading validation environment, not on live capital.

That is an active engineering discipline choice, not a limitation.

Live capital deployment requires a different set of compliance and capital-protection boundaries. Until we have those boundaries, the paper-trading environment lets us validate the entire system under real market data, real interface constraints, and real time pressure — without taking on capital risk that isn't warranted yet.

Every number, every operation, every audit event is produced under real conditions. This is "complete validation in preparation for live," not "demoing something that could work."

Saying this restraint out loud is itself part of evidence-based delivery.