Michal Franc · 2026
Working with agents intensively...
Hobby project — a roguelike / RPG, worked on evenings and weekends.
Personal assistant powered by Claude Code — available from phone, watch, voice.
Even in a hobby project, I struggle to let go.
I did let go — once.
I stopped writing the code myself. No more crafting the perfect function — the agent does it now. A huge step for a software engineer.
But there's a limit to how far a control freak can stretch.
So I overcompensated — watching every session, approving every change, juggling 5–6 agents in my head at once.
It's unsustainable.
You can't keep up. You burn out — Brain Fry.
The only way to let go: channel the control-freak instinct into a harness.
I traded predictability for speed.
Hardcoded Code (where I started)
Fully deterministic. Repeatable. But slow to change — every feature is hours of work.
Conversation with Agent
Maximum flexibility. Insanely fast โ I no longer write code at all. But every run is different.
I gained speed and flexibility — and lost stability.
Code has always been shaped — not poured.
Hand-written code wasn't pure determinism either. We held it together with tests, verification, and analysis — the blacksmith's tools.
With agents, the craft is the same. The starting point moved a bit further left — but the same hammer still shapes it into something stable.
It's not a new problem — it's the old one, with a wider gap to close.
Rate limits · auth tokens · network round-trips · brittle.
No API. No auth. No network.
If the agent can read a file, that's the API.
Plain .md files with YAML frontmatter — that's the whole "database".
One file = one issue.
Git tracked. Agent readable. Offline by default.
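As a sketch, one such issue file might look like this (the frontmatter keys are illustrative guesses, not issue-cli's actual schema):

```yaml
---
# issues/inventory-screen.md (hypothetical example file)
id: inventory-screen
title: Add inventory screen
status: backlog
assignee: null
labels: [ui, gameplay]
---
# Everything after the closing '---' is the issue body, plain markdown:
#   ## Description
#   Player needs a grid-based inventory view.
#   ## Acceptance
#   - [ ] Items render in a grid
```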
Agents could just modify the .md files directly — no API, no tooling, just plain text.
Plain text + free agents = no guardrails.
No control
Agents could modify any .md file freely — change status, overwrite content, mark things done without actually doing them.
Still babysitting
I still had to watch every agent session. Without guardrails, mistakes were silent and hard to catch.
Well... skills. Sure thing.
Claude Code Skills
Reusable prompt templates that agents can invoke — with defined inputs, outputs, and guardrails built in. Instead of free-form file edits, agents call a skill. The skill enforces the rules.
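In Claude Code, a skill is a SKILL.md file whose YAML frontmatter (`name`, `description`) tells the agent when to invoke it; the instructions sketched below are invented for illustration, not the author's actual skill:

```yaml
---
# .claude/skills/update-issue/SKILL.md (body content is invented for illustration)
name: update-issue
description: Use when changing the status or content of an issue .md file
---
# The markdown body carries the reusable instructions, e.g.:
#   1. Read the issue file and parse its frontmatter.
#   2. Change `status` only via an allowed transition.
#   3. Never tick a checklist item without evidence it was done.
```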
But just telling agents how to modify .md files politely is not enough.
Same conversation, written down once — now it's a reusable prompt.
Conversation with Agent
Maximum flexibility. Each run is different. Stable until it isn't.
Skill
Reusable prompt. Behavior starts to converge — still LLM output.
A half-step toward determinism — but not all the way.
The bigger the context, the less predictable the agent gets.
Instruction overload
LLMs reliably follow ~150–200 instructions. Claude Code's system prompt already uses ~50. Every one you add competes for attention.
Lost in the Middle
LLMs have a U-shaped attention curve — they recall the beginning and end well, but miss the middle.
More instructions ≠ more control.
Stop fighting non-determinism — contain it instead.
The evolution that happens the moment you let an agent write your code.
Conversation with Agent
Maximum flexibility. Each run is different. Stable until it isn't.
Skill
Reusable prompt. Behavior starts to converge — still LLM output.
Skill + Code
Deterministic. Repeatable. The escape hatch from drift.
The more deterministic, the less flexible.
.md files.
Free-form access. Plain text in, plain text out — no API in the way.
Powerful — but nothing stops it from rewriting whatever it wants.
A tool sits between the agent and the files — every read and write goes through it.
Validation, side effects, prompts — all enforced in code, not hoped for in prose.
issue-cli in action
Three commands to get the feel of issue-cli.
The actual workflow.
Edit, transition, inspect — then look at the file directly.
→ open the .md file in nvim
→ same file rendered in the issue-viewer app
Projects define their own workflow — validations, prompts, side effects — in workflow.yaml
Each status tells the agent what to do — injected automatically at dispatch time.
Must have checkboxes? Body not empty? Blocks the move if rules aren't met.
Auto-append sections, clear assignee, inject checklists on transition.
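A hypothetical workflow.yaml fragment showing the shape this could take (key names are invented for illustration, not the tool's real schema):

```yaml
# workflow.yaml (illustrative sketch; actual keys may differ)
statuses:
  backlog:
    prompt: "Refine the issue: add a description and acceptance checkboxes."
  in-progress:
    prompt: "Implement checklist items one by one; tick each as you finish."
transitions:
  - from: backlog
    to: in-progress
    validations: [has_checkboxes, body_not_empty]
    side_effects:
      - append_section: "## Test Plan"
      - clear_field: assignee
```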
Managing statuses, transitions, validations, and side effects in YAML by hand gets painful.
Injected into every agent session. Defines who the agent is, project context, and global rules.
Loaded from CLAUDE.md
Injected on each status. Tells the agent exactly what to do at this stage of the workflow.
Defined in workflow.yaml
Optional extra context injected at dispatch — issue body, metadata, related issues.
Per-issue override
Blocks a transition if conditions aren't met — has_checkboxes, body_not_empty, assignee set.
Hard stops, not suggestions
Automatically adds structured sections to the issue on transition — checklists, test plans, docs stubs.
Side effect on transition
Certain statuses require a human to approve before the agent can proceed. Agent waits.
backlog, human-testing
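An approval gate could be expressed in the same config, along these lines (hypothetical syntax):

```yaml
# Hypothetical: a status that hard-stops the agent until a human approves in the UI
statuses:
  human-testing:
    requires_approval: true
    prompt: "Stop here. A human will test this build and approve or reject."
```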
One workflow — reviews — from blocks to designer.
→ show the yaml file
→ show the designer
A real issue transition in the project.
→ show the issue viewer app
→ walk through the timeline
issue-cli process
Agent runs issue-cli process first — sets rules, workflow, and step-by-step guidance for the whole session.
Defines statuses, transitions, validations, and prompts for the whole project. Every issue follows this by default.
Per-system overrides that add extra instructions for specific domains — injected on top of the base workflow.
Base handles the generic flow · subsystem overlays inject domain expertise at the right moment
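A subsystem overlay might look like this sketch, merged on top of the base workflow (file name and keys are invented):

```yaml
# workflows/equipment.yaml (hypothetical overlay for the equipment subsystem)
extends: workflow.yaml
statuses:
  in-progress:
    extra_prompt: >
      Start from the existing arc/stress/chassis seams and the
      structure family files before adding new design surface.
```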
Every agent dispatch assembles a fresh, precise context from its parts.
No re-explaining. No guessing. No babysitting.
Not just rules — a signal to the agent that something went wrong.
Block a transition when the agent missed something — no checkboxes, empty body, missing test plan.
The CLI returns a clear error. The agent reads it, self-corrects, and retries. No human needed.
Hard stops at key gates — backlog, human-testing. The agent cannot proceed until a human explicitly approves in the UI.
Catches non-deterministic drift before it compounds.
LLMs are non-deterministic. They will occasionally skip steps, hallucinate progress, or mark things done prematurely.
Validations and approvals are checkpoints that surface non-deterministic behavior before it causes damage — without requiring you to watch every session.
The system catches the agent. Not you.
Validations and approval gates close the loop — the system catches drift, not you.
Conversation with Agent
Maximum flexibility. Each run is different.
Skill
Reusable prompt. Behavior converges.
Skill + Code
Deterministic. Repeatable.
Skill + Code + Harness
Validations, approvals, retros. The system catches the agent — not you.
If agents already use issue-cli as their primary tool — why not extend it?
Agent hits an unexpected failure during implementation? It files a bug issue itself.
issue-cli report-bug "description"
Nothing lost — the bug is filed with full context while it's fresh.
The agent reflects on what went wrong, what was unclear, and what slowed it down — then writes it up.
issue-cli retrospective <slug> --body "..."
Structured feedback loop from the agent back to you.
The agent isn't just a worker — it becomes a participant in the workflow.
A bot reviews retros and bugs — and feeds findings back into the workflow.
The system gets smarter over time — without you manually tuning it.
Tooling friction: issue-cli start is listed as step 1 in generic instructions, but it cannot work from idea status — the workflow should branch earlier for idea issues and direct the agent to gather clarification first.
Subsystem gap: Equipment guidance should explicitly direct agents to current arc/stress/chassis seams and existing structure family files earlier so design work starts from the real implementation surface.
Tooling bug: issue-cli check reported success for multiple items, but issue-cli checklist remained stale immediately afterward — hard to know when a status is truly ready for transition.
The agent surfaces its own blind spots — these feed directly into workflow and tooling improvements.
Designed for AI consumption. If we notice agents want to do things a certain way — we just add it and get out of the way.
Aliases — because agents use natural names:
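For example (hypothetical mappings; the real alias set isn't shown here):

```yaml
# Hypothetical alias table: each natural name an agent reaches for maps to a real command
aliases:
  done: transition --to done
  close: transition --to done
  bug: report-bug
  retro: retrospective
```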
AI-native design principles:
Commands tell the agent exactly what to do next. No silence after success.
Failures explain why and what to fix — not just exit code 1.
If an agent finds a workaround, that workaround becomes a command.
This tool will likely end up with dozens of commands doing similar things in slightly different ways — and that's completely fine.
Approve an issue, pick Claude or Codex — it spins up a terminal, pastes the composed prompt, and the agent starts immediately.
The full loop — agent + workflow + human, end to end.
→ trigger a transition in the app on a real issue
→ walk through the CMT process end to end