Authoring a Stack

A stack is a git repository that teaches an AI agent how to operate in a specific domain. When an agent reads your stack, it should become an expert operator — capable of deploying, managing, troubleshooting, and upgrading the target software.

Anatomy of a Stack

Every stack has these files and directories at the root:

my-stack/
├── README.md       # Repo landing page
├── CLAUDE.md       # Agent entry point — persona, rules, routing
├── stack.yaml      # Machine-readable manifest
└── skills/         # Operational knowledge, organized by phase

Scaffold one with:

agentic-stacks create my-org/my-stack

Step 1: Design the Skill Hierarchy

Skills are directories of markdown files that teach the agent specific operations. Organize by what the operator is trying to do:

Phase        Purpose                  Examples
Foundation   Understanding and setup  Architecture, configuration, provisioning
Deploy       Initial deployment       Bootstrap, networking, storage
Platform     Platform layer           GitOps, ingress, monitoring, security
Operations   Day-two management       Health checks, scaling, upgrades, backup
Diagnose     Troubleshooting          Symptom-based decision trees
Reference    Cross-cutting lookups    Known issues, compatibility, decision guides

For complex stacks (10+ skills), use phase/domain nesting:

skills/
├── foundation/
│   ├── concepts/
│   └── infrastructure/
│       ├── README.md       # Overview + index
│       ├── aws.md          # Platform-specific
│       └── gcp.md
├── deploy/
│   ├── bootstrap/
│   ├── networking/
│   │   ├── README.md       # Decision matrix
│   │   ├── cilium.md       # Option deep dive
│   │   └── flannel.md
│   └── storage/
└── operations/
    ├── health-check/
    ├── upgrades/
    └── backup-restore/

Step 2: Write CLAUDE.md

CLAUDE.md is the agent's brain. It sets identity, enforces safety, and routes to skills.

# [Stack Name] — Agentic Stack

## Identity
[1-2 sentences establishing the agent's expertise]

## Critical Rules
[Numbered list of hard safety guardrails]

## Routing Table
| Operator Need | Skill | Entry Point |
|---|---|---|
| Deploy the cluster | bootstrap | skills/deploy/bootstrap |
| Troubleshoot issues | troubleshooting | skills/diagnose/troubleshooting |

## Workflows
### New Deployment
[Linear path through skills for first-time setup]

### Existing Deployment
[How to jump to the right skill for ongoing operations]

Writing Critical Rules

Critical rules prevent the agent from doing damage. Good rules are:

  • Specific: "Never run talosctl reset without operator approval" not "be careful"
  • Actionable: the agent can check compliance unambiguously
  • Justified: explain why — "etcd quorum loss means cluster down"
  • Minimal: 5-10 rules. Too many and the agent ignores them.

Step 3: Write stack.yaml

name: my-stack
owner: my-org
version: 0.1.0
description: >
  One paragraph describing what this stack teaches agents to operate.

repository: https://github.com/my-org/my-stack

target:
  software: target-software-name
  versions: ["1.x"]

skills:
  - name: skill-name
    entry: skills/path/to/skill
    description: One-line description

project:
  structure:
    - file-or-dir-in-operator-project

requires:
  tools:
    - name: tool-name
      description: What it's used for

depends_on: []

Tips: the entry field points to a directory, not a file; the agent starts from that directory's README.md. The description field should help an agent decide whether to read the skill.
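The entry rule above is easy to verify before publishing. The following is a minimal sketch, not the implementation of any agentic-stacks command. The function name `check_skill_entries` is hypothetical, and it assumes stack.yaml has already been parsed into a plain dict by whatever YAML loader you use.

```python
from pathlib import Path


def check_skill_entries(stack_root, manifest):
    """Return a list of problems found in the `skills` section of a manifest.

    `manifest` is the parsed stack.yaml as a dict (assumption: parsing
    happens elsewhere). An empty list means every entry resolved.
    """
    problems = []
    root = Path(stack_root)
    for skill in manifest.get("skills") or []:
        name = skill.get("name", "<unnamed>")
        entry = skill.get("entry")
        if not entry:
            problems.append(f"{name}: missing entry")
            continue
        entry_dir = root / entry
        if not entry_dir.is_dir():
            # entry must be a directory, not a file or a missing path
            problems.append(f"{name}: entry {entry} is not a directory")
        elif not (entry_dir / "README.md").is_file():
            problems.append(f"{name}: {entry}/README.md is missing")
        if not skill.get("description"):
            problems.append(f"{name}: missing description")
    return problems
```

Run it from the stack root with the parsed manifest and print whatever it returns. Returning a list rather than raising on the first error lets you fix every broken entry in one pass.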

Step 4: Research and Verify

A stack is only as good as its accuracy. Before writing any skill:

  1. Fetch the target software's official documentation index (/llms.txt, /sitemap.xml, or GitHub source)
  2. Copy exact commands from the docs — do not reconstruct from memory
  3. Verify YAML field names, CLI flags, and config structure
  4. Note version-specific behavior
  5. Cross-reference with release notes and GitHub issues

Step 5: Write Skill Content

Optimize for how agents process information:

  • Imperative headings: "Install Cilium", "Verify Health" — not "About Cilium Installation"
  • Exact commands: full copy-pasteable commands with realistic example values
  • Decision trees: "If X fails -> check Y -> if Y is true -> do Z"
  • Tables for reference: comparison matrices, port requirements
  • Safety warnings: explicit callouts before any destructive operation
  • Full YAML/config examples: valid snippets, not fragments

Known Issues Pattern

Version-specific bugs get their own files in skills/reference/known-issues/:

### [Short Description]

**Symptom:** What the operator sees
**Cause:** Why it happens
**Workaround:** Exact steps to fix it
**Affected versions:** x.y.z through x.y.w
**Status:** Open / Fixed in x.y.w
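An agent or script reading the "Affected versions" line has to compare versions numerically, not as strings. The sketch below shows one way to do that. It assumes plain dotted numeric versions such as 1.9.0 with no pre-release suffixes, and the function names are illustrative only.

```python
def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so tuples compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version, first, last):
    """Return True if `version` lies in the inclusive range first..last.

    Assumption: all three are dotted numeric versions; suffixes such as
    '-rc1' are not handled.
    """
    return parse_version(first) <= parse_version(version) <= parse_version(last)
```

Tuple comparison matters here: as strings, "1.10.0" sorts before "1.9.0", so a naive string check would misjudge any range that crosses a two-digit component.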

Step 6: Decision Guides and Compatibility

For stacks where operators must choose between components, provide structured decision aids in skills/reference/decision-guides/:

  • Comparison tables with features, complexity, and performance
  • Recommendations by use case (production, development, cloud-native)
  • Migration paths — can you change this decision later?

Also provide compatibility matrices in skills/reference/compatibility/ that map which component versions work together.

Step 7: Validate Your Stack

agentic-stacks doctor

Before publishing, check:

  • CLAUDE.md has identity, critical rules, routing table, and workflows
  • stack.yaml lists all skills with correct entry paths
  • Every skill directory has a README.md
  • All commands are exact and copy-pasteable
  • No placeholders (TBD, TODO, FIXME) remain
  • Known issues are documented for supported versions
  • The stack has been tested by having an agent use it end-to-end
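The placeholder item on this checklist lends itself to a quick script. This is a hedged sketch of such a scan, independent of agentic-stacks doctor, whose actual checks are not described here. The helper name and the choice of file suffixes are assumptions.

```python
import re
from pathlib import Path

# Whole-word match, so "TODOS" or lowercase "todo" are not flagged.
PLACEHOLDER = re.compile(r"\b(TBD|TODO|FIXME)\b")


def find_placeholders(stack_root, suffixes=(".md", ".yaml")):
    """Return (relative path, line number, line) for each leftover placeholder.

    Assumption: only files with the given suffixes matter; adjust
    `suffixes` if the stack ships other text formats.
    """
    root = Path(stack_root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PLACEHOLDER.search(line):
                rel = path.relative_to(root).as_posix()
                hits.append((rel, lineno, line.strip()))
    return hits
```

An empty result means no placeholder markers were found in the scanned files. It says nothing about the other checklist items, which still need their own review.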

Designing for Composition

Operators compose multiple stacks in a single project. To make your stack compose well:

  • Stay in your domain. A hardware stack shouldn't reimplement networking concepts that a platform stack covers.
  • Use depends_on to declare stacks that pair well with yours.
  • Avoid conflicting file outputs. Document what files your stack creates in project.structure.
  • Name skills distinctively. When an agent loads multiple stacks, skill names should make the domain clear.
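Two of these guidelines, distinct skill names and non-conflicting file outputs, can be spot-checked across the stacks an operator plans to combine. The sketch below is illustrative only. It assumes each manifest is an already-parsed stack.yaml dict with the fields shown in Step 3, and `find_collisions` is a hypothetical name.

```python
from collections import defaultdict


def find_collisions(manifests):
    """Report skill names and project files claimed by more than one stack.

    `manifests` is a list of parsed stack.yaml dicts (assumption).
    Returns {"skills": {...}, "files": {...}}, mapping each contested
    name to the stacks that declare it.
    """
    skill_owners = defaultdict(list)
    file_owners = defaultdict(list)
    for manifest in manifests:
        stack = manifest["name"]
        for skill in manifest.get("skills") or []:
            skill_owners[skill["name"]].append(stack)
        project = manifest.get("project") or {}
        for path in project.get("structure") or []:
            file_owners[path].append(stack)
    return {
        "skills": {n: s for n, s in skill_owners.items() if len(s) > 1},
        "files": {p: s for p, s in file_owners.items() if len(s) > 1},
    }
```

A shared path is not always a real conflict, since two stacks may intentionally write into the same directory. Treat the output as a list of places to look, not as failures.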

Reference Implementations

Stack             Complexity     Pattern
openstack-kolla   Simple         Flat phase-based (8 skills)
kubernetes-talos  Comprehensive  Two-layer phase/domain (20 skills)