Michal Franc · 2026
Working with agents intensively...
Hobby project — a roguelike / RPG, worked on evenings and weekends.
Personal assistant powered by Claude Code — available from phone, watch, voice.
Even in a hobby project, I struggle to let go.
I did let go — once.
I stopped writing the code myself. No more crafting the perfect function — the agent does it now. A huge step for a software engineer.
But there's a limit to how far a control freak can stretch.
So I overcompensated — watching every session, approving every change, juggling 5–6 agents in my head at once.
It's unsustainable.
You can't keep up. You burn out — Brain Fry.
The only way to let go: channel the control-freak instinct into a harness.
I traded predictability for speed.
Hardcoded Code (where I started)
Fully deterministic. Repeatable. But slow to change — every feature is hours of work.
Conversation with Agent
Maximum flexibility. Insanely fast โ I no longer write code at all. But every run is different.
I gained speed and flexibility — and lost stability.
Code has always been shaped — not poured.
Hand-written code wasn't pure determinism either. We held it together with tests, verification, and analysis — the blacksmith's tools.
With agents, the craft is the same. The starting point moved a bit further left — but the same hammer still shapes it into something stable.
It's not a new problem — it's the old one, with a wider gap to close.
Rate limits · auth tokens · network round-trips · brittle.
No API. No auth. No network.
If the agent can read a file, that's the API.
Plain .md files with YAML frontmatter — that's the whole "database".
One file = one issue.
Git tracked. Agent readable. Offline by default.
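As a sketch, one such issue file might look like this (the frontmatter keys are illustrative guesses, not issue-cli's actual schema):

```yaml
---
# issues/inventory-screen.md (hypothetical example file)
id: inventory-screen
title: Add inventory screen
status: backlog
assignee: null
labels: [ui, gameplay]
---
# Everything after the closing '---' is the issue body, plain markdown:
#   ## Description
#   Player needs a grid-based inventory view.
#   ## Acceptance
#   - [ ] Items render in a grid
```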
Agents could just modify the .md files directly — no API, no tooling, just plain text.
Plain text + free agents = no guardrails.
No control
Agents could modify any .md file freely — change status, overwrite content, mark things done without actually doing them.
Still babysitting
I still had to watch every agent session. Without guardrails, mistakes were silent and hard to catch.
Well... skills. Sure thing.
Claude Code Skills
Reusable prompt templates that agents can invoke — with defined inputs, outputs, and guardrails built in. Instead of free-form file edits, agents call a skill. The skill enforces the rules.
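In Claude Code, a skill is a SKILL.md file whose YAML frontmatter (`name`, `description`) tells the agent when to invoke it; the instructions sketched below are invented for illustration, not the author's actual skill:

```yaml
---
# .claude/skills/update-issue/SKILL.md (body content is invented for illustration)
name: update-issue
description: Use when changing the status or content of an issue .md file
---
# The markdown body carries the reusable instructions, e.g.:
#   1. Read the issue file and parse its frontmatter.
#   2. Change `status` only via an allowed transition.
#   3. Never tick a checklist item without evidence it was done.
```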
But just telling agents how to modify .md files politely is not enough.
Same conversation, written down once — now it's a reusable prompt.
Conversation with Agent
Maximum flexibility. Each run is different. Stable until it isn't.
Skill
Reusable prompt. Behavior starts to converge — still LLM output.
A half-step toward determinism — but not all the way.
The bigger the context, the less predictable the agent gets.
Instruction overload
LLMs reliably follow ~150–200 instructions. Claude Code's system prompt already uses ~50. Every one you add competes for attention.
Lost in the Middle
LLMs have a U-shaped attention curve — they recall the beginning and end well, but miss the middle.
More instructions ≠ more control.
Stop fighting non-determinism — contain it instead.
The evolution that happens the moment you let an agent write your code.
Conversation with Agent
Maximum flexibility. Each run is different. Stable until it isn't.
Skill
Reusable prompt. Behavior starts to converge — still LLM output.
Skill + Code
Deterministic. Repeatable. The escape hatch from drift.
The more deterministic, the less flexible.
.md files.
Free-form access. Plain text in, plain text out — no API in the way.
Powerful — but nothing stops it from rewriting whatever it wants.
A tool sits between the agent and the files — every read and write goes through it.
Validation, side effects, prompts — all enforced in code, not hoped for in prose.
issue-cli in action
Three commands to get the feel of issue-cli.
The actual workflow.
Edit, transition, inspect — then look at the file directly.
→ open the .md file in nvim
→ same file rendered in the issue-viewer app
Projects define their own workflow — validations, prompts, side effects — in workflow.yaml
Each status tells the agent what to do — injected automatically at dispatch time.
Must have checkboxes? Body not empty? Blocks the move if rules aren't met.
Auto-append sections, clear assignee, inject checklists on transition.
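A hypothetical workflow.yaml fragment showing the shape this could take (key names are invented for illustration, not the tool's real schema):

```yaml
# workflow.yaml (illustrative sketch; actual keys may differ)
statuses:
  backlog:
    prompt: "Refine the issue: add a description and acceptance checkboxes."
  in-progress:
    prompt: "Implement checklist items one by one; tick each as you finish."
transitions:
  - from: backlog
    to: in-progress
    validations: [has_checkboxes, body_not_empty]
    side_effects:
      - append_section: "## Test Plan"
      - clear_field: assignee
```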
Managing statuses, transitions, validations, and side effects in YAML by hand gets painful.
Injected into every agent session. Defines who the agent is, project context, and global rules.
Loaded from CLAUDE.md
Injected on each status. Tells the agent exactly what to do at this stage of the workflow.
Defined in workflow.yaml
Optional extra context injected at dispatch — issue body, metadata, related issues.
Per-issue override
Blocks a transition if conditions aren't met — has_checkboxes, body_not_empty, assignee set.
Hard stops, not suggestions
Automatically adds structured sections to the issue on transition — checklists, test plans, docs stubs.
Side effect on transition
Certain statuses require a human to approve before the agent can proceed. Agent waits.
backlog, human-testing
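An approval gate could be expressed in the same config, along these lines (hypothetical syntax):

```yaml
# Hypothetical: a status that hard-stops the agent until a human approves in the UI
statuses:
  human-testing:
    requires_approval: true
    prompt: "Stop here. A human will test this build and approve or reject."
```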
One workflow — reviews — from blocks to designer.
→ show the yaml file
→ show the designer
A real issue transition in the project.
→ show the issue viewer app
→ walk through the timeline
issue-cli process
Agent runs issue-cli process first — sets rules, workflow, and step-by-step guidance for the whole session.
Defines statuses, transitions, validations, and prompts for the whole project. Every issue follows this by default.
Per-system overrides that add extra instructions for specific domains — injected on top of the base workflow.
Base handles the generic flow · subsystem overlays inject domain expertise at the right moment
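A subsystem overlay might look like this sketch, merged on top of the base workflow (file name and keys are invented):

```yaml
# workflows/equipment.yaml (hypothetical overlay for the equipment subsystem)
extends: workflow.yaml
statuses:
  in-progress:
    extra_prompt: >
      Start from the existing arc/stress/chassis seams and the
      structure family files before adding new design surface.
```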
Every agent dispatch assembles a fresh, precise context from its parts.
No re-explaining. No guessing. No babysitting.
Not just rules — a signal to the agent that something went wrong.
Block a transition when the agent missed something — no checkboxes, empty body, missing test plan.
The CLI returns a clear error. The agent reads it, self-corrects, and retries. No human needed.
Hard stops at key gates — backlog, human-testing. The agent cannot proceed until a human explicitly approves in the UI.
Catches non-deterministic drift before it compounds.
LLMs are non-deterministic. They will occasionally skip steps, hallucinate progress, or mark things done prematurely.
Validations and approvals are checkpoints that surface non-deterministic behavior before it causes damage — without requiring you to watch every session.
The system catches the agent. Not you.
Validations and approval gates close the loop — the system catches drift, not you.
Conversation with Agent
Maximum flexibility. Each run is different.
Skill
Reusable prompt. Behavior converges.
Skill + Code
Deterministic. Repeatable.
Skill + Code + Harness
Validations, approvals, retros. The system catches the agent — not you.
If agents already use issue-cli as their primary tool — why not extend it?
Agent hits an unexpected failure during implementation? It files a bug issue itself.
issue-cli report-bug "description"
Nothing lost — the bug is filed with full context while it's fresh.
The agent reflects on what went wrong, what was unclear, and what slowed it down — then writes it up.
issue-cli retrospective <slug> --body "..."
Structured feedback loop from the agent back to you.
The agent isn't just a worker — it becomes a participant in the workflow.
A bot reviews retros and bugs — and feeds findings back into the workflow.
The system gets smarter over time — without you manually tuning it.
Tooling friction: issue-cli start is listed as step 1 in generic instructions, but it cannot work from idea status — the workflow should branch earlier for idea issues and direct the agent to gather clarification first.
Subsystem gap: Equipment guidance should explicitly direct agents to current arc/stress/chassis seams and existing structure family files earlier so design work starts from the real implementation surface.
Tooling bug: issue-cli check reported success for multiple items, but issue-cli checklist remained stale immediately afterward — hard to know when a status is truly ready for transition.
The agent surfaces its own blind spots — these feed directly into workflow and tooling improvements.
Designed for AI consumption. If we notice agents want to do things a certain way — we just add it and get out of the way.
Aliases — because agents use natural names:
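For example (hypothetical mappings; the real alias set isn't shown here):

```yaml
# Hypothetical alias table: each natural name an agent reaches for maps to a real command
aliases:
  done: transition --to done
  close: transition --to done
  bug: report-bug
  retro: retrospective
```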
AI-native design principles:
Commands tell the agent exactly what to do next. No silence after success.
Failures explain why and what to fix — not just exit code 1.
If an agent finds a workaround, that workaround becomes a command.
This tool will likely end up with dozens of commands doing similar things in slightly different ways — and that's completely fine.
Approve an issue, pick Claude or Codex — it spins up a terminal, pastes the composed prompt, and the agent starts immediately.
The full loop — agent + workflow + human, end to end.
→ trigger a transition in the app on a real issue
→ walk through the CMT process end to end