
Research Note

Claude Code Skill Authoring Methodology

Zhenyu He · Jobs Stroustrup · 4 min read

Definition

A Claude Code Skill is the packaging paradigm Anthropic released in late 2025: a complex multi-step agent workflow is packaged as a standardized, auto-discoverable “skill” that the Claude Code CLI can load. The minimum unit is a directory containing:

  • One SKILL.md (frontmatter name / description / allowed-tools; body is the markdown workflow)
  • Any number of auxiliary data/rule files (persona, examples, blocklists, etc.)
  • Optional scripts or tools (shell, AppleScript, etc.)
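
As a concrete illustration, a minimal SKILL.md might look like the following (the name, description, and tool values are illustrative, not taken from any published skill):

```markdown
---
name: moments-interact
description: Auto-like and draft comments for WeChat Moments posts
allowed-tools: Bash, Read, Write
---

## Before Starting
Read persona, rules, and memory files listed here.

## Phase 1: Open target
...

## Phase 2: Loop over objects
...
```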

On top of that baseline, Zhenyu has developed his own preferred Skill architecture paradigm. Its core is not “call the API” but “engineer a recurring task into an agent that Claude can execute safely, iteratively, and in a self-improving way.”

Core Arguments (Zhenyu’s Skill Architecture Paradigm)

1. Phased, explicit workflow

  • Split the agent’s work into clear Phases in SKILL.md (e.g., “open target → loop over objects → produce summary”)
  • Each phase lists concrete steps (screenshot → identify → decide → execute)
  • Hard-code known UI/interaction positions so the agent doesn’t need to “explore”
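
The phase structure above can be sketched as a SKILL.md body (phase names and steps are illustrative):

```markdown
## Phase 1: Open target
1. Launch the app and navigate to the feed (hard-coded entry point, no exploration)

## Phase 2: Loop over objects
For each post:
1. Screenshot the post
2. Identify the author and content
3. Decide: skip (blocklist) / like only / like + comment
4. Execute the chosen action

## Phase 3: Produce summary
1. Write a dated run report; append updates to the memory file
```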

2. Separating data from logic

  • SKILL.md = workflow (How)
  • (persona file) / profile files = user/scenario context (Who)
  • (examples corpus) = style corpus / few-shot (What tone)
  • (blocklist file) / (safe-action file) / rule files = hard rules (Must / Must-not)
  • (memory file) / structured journal = long-term memory (State)
  • Benefit: each dimension can evolve independently; context can be loaded selectively rather than all-at-once
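
Under this separation, a skill directory might look like the following (the file names are illustrative placeholders for the parenthesized roles above):

```
moments-skill/
├── SKILL.md        # How: phased workflow
├── persona.md      # Who: user identity and voice
├── examples.md     # What tone: approved past outputs (few-shot)
├── rules.md        # Must / Must-not: blocklist + safe actions
└── memory.md       # State: entity-sectioned journal
```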

3. Human-in-the-loop + data flywheel

  • Before any high-risk action (sending a message, placing an order, actually modifying external state), pause and present the draft to the user
  • The user’s “ok” or edited final version is automatically written back to the examples / memory files; that sample immediately informs the next generation
  • Flywheel effect: the more the skill is used, the closer the output gets to the user’s personal style; no training needed — personalization comes from accumulated in-context examples
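
The write-back step can be sketched in Python (the file path and entry format are assumptions for illustration, not part of any published Skill API):

```python
from datetime import date
from pathlib import Path


def record_approved_example(examples_path: Path,
                            post_summary: str,
                            final_comment: str) -> None:
    """Append a user-approved comment to the examples corpus so the
    next generation can use it as an in-context few-shot sample."""
    entry = (
        f"\n## {date.today().isoformat()}\n"
        f"Post: {post_summary}\n"
        f"Approved comment: {final_comment}\n"
    )
    with examples_path.open("a", encoding="utf-8") as f:
        f.write(entry)


# After the user edits and approves a draft, for example:
# record_approved_example(Path("examples.md"), "hiking photos", "Looks amazing, which trail?")
```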

4. Tiered behavior rules

  • Hard rules (blocklist): skip entirely
  • Semi-hard rules (like-only): degrade to a safe action
  • Frequency/interval rules (e.g., “Zhang Chen motormouth rule”): time-dimension dedup
  • Persona-level soft rules (e.g., “forbid ‘envy’ toward X”): fine-grained linguistic preference
  • Auto-update triggers for each layer must be designed carefully: only “user explicitly instructs” OR “Claude recommends + user confirms” should write to the rule files, to avoid a one-off edit being learned as a long-term preference
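
The tiers above can be sketched as a decision function, evaluated from hardest rule to softest with the first match winning (rule names and data shapes are illustrative):

```python
from datetime import datetime, timedelta


def decide_action(author: str, now: datetime,
                  blocklist: set[str],
                  like_only: set[str],
                  last_comment_at: dict[str, datetime],
                  min_interval: timedelta = timedelta(hours=24)) -> str:
    """Apply rule tiers from hardest to softest; first match wins."""
    if author in blocklist:          # hard rule: skip entirely
        return "skip"
    if author in like_only:          # semi-hard rule: degrade to a safe action
        return "like"
    last = last_comment_at.get(author)
    if last is not None and now - last < min_interval:
        return "like"                # frequency rule: time-dimension dedup
    return "like+comment"            # persona-level soft rules then shape the comment text
```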

5. Structured long-term memory vs runtime context

  • Files like (memory file) are not logs but entity-organized journals: one section per person, with a header, background, current location, and dated updates appended inside that section
  • Benefit: next time a post from the same friend appears, all their existing context is one fetch away
  • Contrast: searching a pure timeline log for one person’s history is O(n) in the number of entries; an entity-sectioned journal is effectively O(1): jump straight to their section
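
The entity-sectioned layout can be sketched as a parser that indexes the journal by section header, so one person’s full context is a single dictionary lookup (the “## Name” header format is an assumption):

```python
def load_memory(text: str) -> dict[str, str]:
    """Parse an entity-sectioned journal ('## Name' headers) into a dict."""
    sections: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


journal = """## Li Lei
Background: college friend, lives in Shanghai.
2025-01-02: posted hiking photos.

## Han Meimei
Background: colleague.
2025-01-05: changed jobs.
"""
memory = load_memory(journal)
# memory["Li Lei"] now holds everything recorded about Li Lei
```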

6. Skill = the minimum unit for “productizing” LLM capability

  • Differs from “write a Python script”: skills embrace the fact that agents aren’t deterministic, delegating uncertainty to the LLM’s judgment + human backstop rather than hard-coding every branch
  • Differs from “just chat with ChatGPT”: skills persist reusable context and rules, so you don’t re-explain every session

Anti-patterns

Anti-pattern A: Skill as “prompt template” — Single SKILL.md stuffed with all rules/samples/persona. Maintenance nightmare, context explosion. Zhenyu’s paradigm: split into multiple markdown files by dimension.

Anti-pattern B: Full automation, no human-in-loop — Looks efficient, but a wrong external action can’t be rolled back (the message is already sent, the order already placed). Zhenyu’s paradigm: pause before high-risk actions; low-risk actions (like, screenshot) can run freely.

Anti-pattern C: Memory files are write-only — Some implementations log but never read back — effectively no memory. Zhenyu’s paradigm: SKILL.md explicitly lists “Before Starting” files to read.

Practiced Skill Projects

| Skill | Purpose | Core data files |
| --- | --- | --- |
| private personal automation project (WeChat Moments interaction) | WeChat Moments auto-like + comment | (persona file) / (examples corpus) / (blocklist file) / (safe-action file) / (memory file) |
| (future) | | |

Open Questions

  • Shared infrastructure across skills: if multiple skills exist, should (persona file) (user identity) be shared or duplicated? Plan to build a universal ~/.claude/personal-context/ referenced by all skills?
  • Skill versioning and rollback: when a skill auto-updates blocklist/examples, how to roll back a bad update? Currently git-based — need finer-grained audit logs?
  • Cross-skill memory migration: friend updates recorded in skill A (Moments) — should skill B (calendar) read them? Feels like a unified “personal knowledge graph” layer, not duplication
  • Claude Skill vs Anthropic Agent SDK: when to use Skill vs code-level Agent SDK? Zhenyu currently prefers Skill (low-code, user-maintainable), but complex state management may favor SDK

Sources

  • private personal automation project (WeChat Moments interaction) — first full instance