
Evidence · Capability 03 · Human-Led Multi-Agent Engineering

How We Use AI to Build AI Systems Without Losing Control

Many agents can suggest, draft, and implement — but final direction and high-risk authorization always stay human. That's not a slogan — it's discipline enforced in our own engineering practice.

By 2026, "using an LLM to write code" is no longer the hard part. The hard part is what happens when that code, configuration, or runtime change has to enter a real production environment. Who is responsible? Who approves? How is the change traced? When must everything pause for a human? This engineering harness exists to answer those questions.

4 harness layers
5+1 agent roles + human approver
3 forced pause points
1 final authority

This page proves

AI can accelerate engineering — but final direction, high-risk authorization, and production deployment always stay human.

Why this page maps to Capability 03

"Human-Led Multi-Agent Engineering" as a core yunforce capability needs to be validated under real pressure — not in one or two demo tasks, but in a project spanning multiple months, involving high-risk changes, and requiring sustained engineering discipline. Quant v1.1 is where we apply this method.

This page shows the complete mechanism: how the harness assigns roles, sets boundaries, forces pauses, and preserves evidence, so that multiple AI agents can collaborate without losing control.

The three questions clients ask most about "AI building AI"

We answer directly.

Will it lose control?

No. Final direction, high-risk authorization, and production deployment all require human approval. Agents can suggest and draft, but cannot decide to proceed on their own. At three critical nodes (spec, plan, implementation), the system forces a pause and waits for human approval.

With multiple agents working together, won't they overwrite each other?

No. Every agent has explicit role boundaries: the Product agent can only review, not implement; the Dev agent can only change code, not deploy; the Ops agent can only execute within approved scope; the Research agent is read-only and never writes to formal systems. Cross-boundary actions require explicit delegation.

When something goes wrong, who is responsible? How do you trace it back?

The responsible party is always the human approver — because they approved the change. Traceability comes from evidence: every agent suggestion, plan, implementation, and deployment leaves a complete auditable trail through GitHub issues, commits, review docs, and close comments.

The real problem is not whether AI can produce

We use multiple agents, but the target is not maximum autonomy.

The target is a system where many contributors can help without eroding accountability, execution discipline, or auditability once the work becomes operationally real.

Many agents can contribute. Only a human can approve final direction.

The organization should not be less disciplined than the software

Quant v1.1 already separates suggestion, approval, execution, and evidence in the runtime. The team operating method intentionally follows the same shape — so the organization does not become less disciplined than the software.

Product runtime

  • Advisor / research objects → can suggest, but do not execute
  • Human approval → determines whether authority is granted
  • Execution system → remains the only execution authority
  • Evidence & verification → every meaningful action closed with evidence

Team operating system

  • Agents suggest and build → contribute within defined role boundaries
  • Human approves → direct authorization stays human
  • Dev / Ops execute → implementation and changes happen only through the assigned execution role
  • Evidence closeout → GitHub comments, review docs, test results close the loop

Key design choice: approval gates, execution boundaries, and evidence trails are not just product features. They are also team operating rules.

The operating model is split into four separate control surfaces

1

Role harness

Product frames and reviews, Dev implements, Ops executes runtime follow-through, Co-pilot routes, Research stays read-only, and the human lead keeps final authority.

2

Authority harness

Direct authorization is required for high-impact mutation: production restarts, external interface writes, configuration cutovers, governance overrides, credentials, and production-status claims.

3

Execution harness

Work moves through a repeatable path: issue, spec confirmation, implementation plan, Product review, implementation, post-impl review, authorization gate, verification, evidence closeout.

4

Recovery harness

Recovery is part of the system design. The same discipline applies under stress: preserve evidence, state the failure, define the rollback path, keep authority boundaries visible.
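The authority harness above can be pictured as a small gate in front of any mutating action. The following is a minimal sketch under stated assumptions, not the project's actual tooling: the category labels paraphrase the list in the authority harness, while `HIGH_IMPACT`, `Authorization`, and `may_execute` are hypothetical names. It also assumes that anything outside the listed categories needs no direct authorization.

```python
from dataclasses import dataclass
from typing import Optional

# Categories paraphrased from the authority harness description.
# The identifiers are hypothetical labels chosen for this sketch.
HIGH_IMPACT = frozenset({
    "production_restart",
    "external_interface_write",
    "configuration_cutover",
    "governance_override",
    "credentials",
    "production_status_claim",
})


@dataclass(frozen=True)
class Authorization:
    """An approval record for one action category (hypothetical shape)."""
    approver: str   # name of the human lead who approved
    category: str   # the single category this approval covers
    direct: bool    # False when the approval was only relayed by an agent


def may_execute(category: str, auth: Optional[Authorization]) -> bool:
    """Return True only if the action is allowed to proceed."""
    if category not in HIGH_IMPACT:
        # Assumption: other work is governed by role boundaries alone.
        return True
    if auth is None:
        return False
    # A relayed approval is not treated as permission for mutation,
    # and an approval for one category does not cover another.
    return auth.direct and auth.category == category
```

In this sketch a direct approval for `production_restart` lets that one action through, while the same approval relayed through an agent, or reused for `credentials`, is rejected.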

Deep Case Detail · Full Mechanism Breakdown

The core mechanism is shown above (four harness layers, three pause points, final authority, three client questions). What follows is the full role breakdown and execution path: deeper detail, not required reading.

Different agents exist for different tasks — not as interchangeable voices

Human final decision-maker

Final decisions on direction, delegation, authorization boundaries
Owns production go / no-go decisions

Does not delegate away final approval authority

Product agent

Shapes issues, specs, acceptance logic, rollout boundaries
Reviews implementation against risk and scope

Should not become the default implementation owner

Dev agent

Drafts implementation plans and test plans
Modifies code, tests, technical docs

Must not weaken tests for green checks; does not deploy

Ops agent

Owns runtime continuity, server support, monitoring, evidence collection
Executes within runbooks and authorization boundaries

Does not treat relayed approval as permission for mutation

Co-pilot agent

Coordinates and relays; routing layer, not content layer
Preserves project method and keeps handoffs explicit

Does not become a hidden second approver

Research agent

Read-only research and analysis
Produces findings for Product review

Does not implement code or touch formal system paths

Repo boundary: local agents edit and commit only in their own primary repos by default. Cross-repo edits require explicit human delegation.
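The role boundaries and the repo boundary above can be expressed as one permission check. This is an illustrative sketch only: the action labels, the `Agent` type, `is_allowed`, and the repo names used in the usage note are all hypothetical, and the set of "write" actions is simplified to Dev's edits.

```python
from dataclasses import dataclass

# Allowed action kinds per role, paraphrased from the role descriptions.
# Action names are hypothetical labels for this sketch.
ROLE_ACTIONS = {
    "product": frozenset({"frame_issue", "write_spec", "review"}),
    "dev": frozenset({"draft_plan", "edit_code", "edit_tests", "edit_docs"}),
    "ops": frozenset({"run_runbook", "monitor", "collect_evidence"}),
    "copilot": frozenset({"route", "relay"}),
    "research": frozenset({"read", "report_findings"}),
}

# Actions that modify a repository and are therefore subject to the
# repo boundary (a simplification for this sketch).
WRITE_ACTIONS = frozenset({"edit_code", "edit_tests", "edit_docs"})


@dataclass(frozen=True)
class Agent:
    role: str
    primary_repo: str


def is_allowed(agent: Agent, action: str, repo: str,
               delegated_repos: frozenset = frozenset()) -> bool:
    """Check role boundary first, then repo boundary for write actions."""
    if action not in ROLE_ACTIONS.get(agent.role, frozenset()):
        return False  # outside the role, e.g. Dev trying to deploy
    if action in WRITE_ACTIONS:
        # Cross-repo edits need explicit human delegation.
        return repo == agent.primary_repo or repo in delegated_repos
    return True
```

For example, a Dev agent whose primary repo is the made-up `quant-core` may edit code there, may not edit `quant-ops` unless a human has delegated that repo, and can never perform a `deploy` action because it is not in the Dev role's set at all.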

The working path is intentionally heavier than "just ask an agent"

The workflow system and review process enforce a shared shape for work. This keeps planning, implementation, review, and runtime authorization separate — instead of collapsing them into a single chat.

01

Issue and scope

Work starts as an explicit issue with problem, action, acceptance, and test contract.

02

Product confirms spec

Before Dev plans anything, Product checks that the slice is real, bounded, and implementable.

03

Dev drafts plan and tests

Implementation plan and verification plan are drafted together, including authorization expectations and rollback conditions.

04

Product reviews the plan

Product approves, rejects, or loops with findings until the plan is acceptable.

05

Dev implements

Code, tests, and technical docs change inside approved scope, pushed with agent attribution.

06

Product post-impl review

Implementation is reviewed against the approved plan, real files, and the actual test / evidence surface.

07

Authorization and rollout

If runtime mutation is needed, Product and Dev align on verification, then the human lead authorizes the action.

08

Evidence closeout

Review docs, issue comments, logs, test output, and close comments preserve what changed and why it is done.
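The eight steps above form a strictly ordered path in which no step can be skipped and every step leaves something behind. A minimal sketch of that shape follows. The `Step` and `WorkItem` names are hypothetical, and the rule that each step must record a non-empty evidence note is this sketch's reading of "every meaningful action closed with evidence". The sketch also treats step 07 as always present, although the text makes it conditional on runtime mutation being needed.

```python
from enum import IntEnum


class Step(IntEnum):
    ISSUE_AND_SCOPE = 1
    PRODUCT_CONFIRMS_SPEC = 2
    DEV_DRAFTS_PLAN = 3
    PRODUCT_REVIEWS_PLAN = 4
    DEV_IMPLEMENTS = 5
    POST_IMPL_REVIEW = 6
    AUTHORIZATION_AND_ROLLOUT = 7
    EVIDENCE_CLOSEOUT = 8


class WorkItem:
    """Tracks one issue as it moves through the path, one step at a time."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.step = Step.ISSUE_AND_SCOPE
        self.closed = False
        self.evidence = []  # list of (Step, note) pairs, in order

    def complete_step(self, note: str) -> None:
        """Record evidence for the current step and move to the next one."""
        if self.closed:
            raise ValueError("work item is already closed")
        if not note.strip():
            raise ValueError("each step must leave an evidence note")
        self.evidence.append((self.step, note))
        if self.step is Step.EVIDENCE_CLOSEOUT:
            self.closed = True
        else:
            self.step = Step(self.step + 1)
```

Because the only way forward is `complete_step`, there is no call that jumps from planning straight to rollout, which is the property the heavier path is meant to guarantee.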

Three forced pause points

1. Spec gaps: Slice isn't ready; Product must clarify or narrow before Dev plans

2. Plan approved: Plan is acceptable, implementation can begin, but the approval is explicit

3. Implementation approved: Code is acceptable; if deployment is needed, the next step is authorization, not silent execution
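The three pause points can be sketched as gates that open only on an explicit, recorded approval. The code below is a hedged illustration: `Approval`, `Gate`, and their fields are hypothetical names, and it assumes, following the answer to "Will it lose control?" earlier on this page, that each pause is released by a human. Deployment authorization is kept as a separate input to reflect that implementation approval does not imply permission to deploy.

```python
from dataclasses import dataclass, field

# Pause points in the order they occur.
PAUSE_POINTS = ("spec", "plan", "implementation")


@dataclass(frozen=True)
class Approval:
    pause_point: str
    approver: str
    approver_is_human: bool


@dataclass
class Gate:
    approvals: dict = field(default_factory=dict)

    def approve(self, approval: Approval) -> None:
        """Record an explicit approval; agents cannot release a pause."""
        if approval.pause_point not in PAUSE_POINTS:
            raise ValueError(f"unknown pause point: {approval.pause_point}")
        if not approval.approver_is_human:
            raise PermissionError("only a human can release a pause point")
        self.approvals[approval.pause_point] = approval

    def can_proceed_past(self, pause_point: str) -> bool:
        """True if this pause point and all earlier ones are approved."""
        idx = PAUSE_POINTS.index(pause_point)
        return all(p in self.approvals for p in PAUSE_POINTS[: idx + 1])

    def may_deploy(self, deploy_authorized: bool) -> bool:
        """Implementation approval alone never implies deployment."""
        return self.can_proceed_past("implementation") and deploy_authorized
```

With all three approvals recorded, `may_deploy(False)` still returns `False`: the next step after an approved implementation is authorization, not silent execution.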

The workflow is defined by boundaries, not by one model vendor

LLM vendors change. Model APIs change. Accounts get rate-limited. A real engineering harness should not be locked to any specific provider. Here are three design choices we made specifically for stress scenarios.

01

Role names sit above model names

Repo-facing labels are role names first, provider names second. That makes the operating layer more stable than the current model assignment.

02

Workflow logic is externalized

The relay protocol, pause points, and workflow system live in docs and scripts — not in one model's memory. That makes the path reproducible.

03

Model failure becomes a swap problem

If a role can be remapped while keeping the same scope, evidence, and authority contract, the system degrades more gracefully under vendor or account stress.
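The three designs above amount to binding a model to a role rather than the other way round. A minimal sketch, assuming a simple in-memory mapping: `RoleBinding`, `swap_model`, and the model names `model-a` / `model-b` are placeholders invented for illustration, not real provider identifiers or the project's actual configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RoleBinding:
    """The contract belongs to the role; the model is one replaceable field."""
    role: str
    scope: tuple      # what the role may touch
    authority: str    # what the role may not do without a human
    model: str        # current model assignment (placeholder name)


def swap_model(bindings: dict, role: str, new_model: str) -> dict:
    """Return a new mapping where only the model for `role` has changed."""
    updated = dict(bindings)
    updated[role] = replace(bindings[role], model=new_model)
    return updated


bindings = {
    "dev": RoleBinding(
        role="dev",
        scope=("code", "tests", "technical docs"),
        authority="does not deploy",
        model="model-a",
    ),
}

# Vendor or account trouble: remap the role, keep the contract.
after_swap = swap_model(bindings, "dev", "model-b")
```

The scope and authority fields are untouched by the swap, which is what makes a model failure a remapping problem rather than a redesign of the workflow.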

Two methodologies, two layers, independent

yunforce publishes a PROOF methodology — how we deliver AI projects for clients. The engineering harness on this page is not the same thing. It's the internal engineering collaboration mechanism used inside this specific in-house project.

  • PROOF methodology → External, cross-project, about "how we move an AI initiative forward with a client"
  • Engineering harness → Internal, single-project, about "how a multi-agent team moves high-risk engineering changes forward without losing control"

They solve different problems and don't replace each other. We show both because clients deserve to see how a company that claims to "understand AI" actually keeps AI in check in its own engineering practice.