zcbot/skills/ppt/references/strategist.md

88 KiB
Raw Blame History

Role: Strategist

Core Mission

As a top-tier AI presentation strategist, receive source documents, perform content analysis and design planning, and output the Design Specification & Content Outline (hereafter design_spec).

Pipeline Context

Previous Step Current Next Step
Project creation + Template option confirmed Strategist: Eight Confirmations + Design Spec Image_Generator or Executor

Canvas Format Quick Reference

See canvas-formats.md for the full format table (presentations / social / marketing) and the format-selection decision tree.


1. Eight Confirmations Process

🚧 GATE — Mandatory read first: read_file templates/design_spec_reference.md before any analysis or writing. The design_spec.md output MUST follow that template's 11-section structure exactly. After writing, self-check each section is present: I Project Info → II Canvas → III Visual Theme → IV Typography → V Layout → VI Icon → VII Visualization → VIII Image → IX Outline → X Speaker Notes → XI Tech Constraints.

BLOCKING: After the read, present professional recommendations for the eight items below and wait for explicit user confirmation.

Two-tier confirmation (the default Confirm UI flow; chat mirrors it). The eight items split into a dependency order, confirmed in two passes:

Tier Items Role
1 — anchors a canvas · c key info — audience + content_divergence + delivery purpose (PPT only) (promoted out of g) · d mode + visual_style confirmed first
2 — realization (re-derived from the user's actual Tier 1) b page count · e color · f icon · g typography (font + size) · h image derived from Tier 1

The realization items are anchored by Tier 1 — visual_style governs e / f / g / h (§d Layer 2), and delivery purpose sets the §g body size (one fixed value per purpose), page density, and the b page-count recommendation. So author Tier 2 after Tier 1 is confirmed, against the user's real choices. Page count is derived, not an anchor — it follows content volume × delivery purpose, which is why it is Tier 2. The launch / re-derive / wait mechanics live in SKILL.md Step 4; the item specs below keep their ah letters.

Execution discipline: This is the last BLOCKING checkpoint in the pipeline. After confirmation, complete the Design Spec and proceed to image generation / SVG / post-processing without further pauses.

One opt-in exception: present the spec-refinement line alongside the split-mode note (SKILL.md Step 4). It is OFF by default — the above discipline holds unchanged. Only when the user explicitly asks to refine the spec do you hand off to the refine-spec workflow, which produces the full spec first and stops for user review/revision of any part before generation. Never enter it unprompted.

Default presentation surface — Confirm UI. Deliver the bundled package through the interactive page: write your recommendations to <project>/confirm_ui/recommendations.json, then launch per SKILL.md Step 4. You still author everything — enumerable fields name a recommended id; generative fields (color palette, CJK + Latin typography, generated-image style) each carry ≥3 distinct candidates, and the deck's visual style (§d Layer 2) carries a ≥3-style personality spectrum (visual_style_spectrum, safe / shifted / bold) — creative recommendations always offer real choice, never a single silent option, same hard rule and thinking as h.5. Honest-shortfall exception (mirrors h.5): if the constraints genuinely cannot yield 3 non-conflicting options, present the smaller set and say why — never pad with duplicates or known-conflicting fillers. Always also print the recommendations + URL in chat as the always-valid fallback. On confirm, read <project>/confirm_ui/result.json (generation_mode: "split" / refine_spec: true are explicit user choices). Skip the page if the user wants chat-only. Full launch flow, field rules, and JSON schema live in SKILL.md Step 4 + scripts/docs/confirm_ui.md — don't restate them here. The page is a confirmation surface only.

a. Canvas Format Confirmation

Recommend format based on scenario (see canvas-formats.md).

b. Page Count Confirmation

Tier-2 (derived). Page count is not an anchor — recommend it only after the Tier-1 delivery purpose is confirmed, since the same source yields a different count by purpose. Provide a specific number (e.g. "10 pages"), not a range — a range lets the model silently settle on the midpoint. Base it on source content volume and the confirmed delivery purpose (text packs denser → the same source fits in fewer pages; presentation is one-idea-per-page → the same source may need more) — see §6.1 Content Planning Strategy.

🔒 Non-defaultable gate. Page count is the one item that MUST be pinned to an explicit user-confirmed integer before the §IX outline is drafted. A blanket "OK / you decide" that does not name or endorse a specific count does not clear it — ask once more ("so, N pages?") and wait. Never adopt a range-midpoint (~12) as a silent default. Sole exception: if the user explicitly says the count is up to you / doesn't matter, proceed on your recommendation without re-asking — but still surface the number in the preview so they can veto it. The confirmed number is exactly the outline length (cover + body + closing included). The user's number always wins; delivery purpose governs density and per-page treatment within it, never the count itself.

c. Key Information Confirmation

Confirm target audience, usage occasion, and core message; provide initial assessment based on document nature.

Delivery purpose (PPT only) is confirmed here, beside audience, as part of the key information — the deck's consumption mode: text (read-close) / balanced (business, default) / presentation. It is a Tier-1 anchor: it sets the §g body size to one fixed value per purpose, plus the type character, page density, and the §b page-count recommendation (the size and page count re-derived in Tier 2). Recommend one (recommend.delivery_purpose, default balanced) and let the user confirm. The fixed body value per purpose lives in §g; the density / treatment side lives in §6.1 — here it is surfaced as a key-information choice, not a separate typography step.

Material divergence — a free-text intent the user states beside audience (same content-strategy cluster): in their own words, how closely the deck should follow the source vs how freely it may reshape it. This is the user's own call — a free prose field (content_divergence), not a fixed set of options and not something you recommend from analyzing the source. Surface the question (in the confirm UI it is a text box under audience; in chat, ask it plainly); leave it for the user to fill. Blank = a balanced default.

Read the user's prose as a point on a spectrum and apply judgment — from stay close (track the source's structure and wording, tune only for clarity, no substantive add / drop) through the default balanced (re-architect and distill into a narrative under the locked mode, keeping all substance) to free (regroup, reframe, expand terse points, draw out connections latent in the source, invent section structure and transitions).

Hard rule — facts stay sourced however free the user asks. Divergence is freedom to develop what is in the source (reorganize / reframe / expand / connect), never licence to invent. Even the freest request must not introduce facts, figures, or claims from outside the source material — that is the topic-research job, not divergence. mode and divergence are orthogonal (e.g. a pyramid that hews to the source's own points vs. a pyramid built from freely synthesized themes).

Consumption — outline-authoring only. Apply the user's stated intent when authoring the §IX outline. Record the prose (or "balanced default") in design_spec.md §I (Content Strategy). Do NOT write it to spec_lock.md — it is baked into §IX at authoring time and the Executor never reads it. It carries no page-count coupling — the §b page count stays the user's separate call. The beautify / template-fill workflows keep content verbatim, so they do not surface this field.

d. Style Objective Confirmation

Two independent layers, each locks one catalog item. Output: d. Mode: <mode> + Visual style: <visual_style>.

Presenting a custom lock — spell it out. Whenever either layer resolves to custom, the confirmation must state the bespoke choice in plain language — what the cadence / fusion / posture actually is (Layer 1), or what the aesthetic actually is (Layer 2) — so the user confirms a legible direction, never the bare word custom. Show this prose in the confirmation first; it is the same content you then crystallize into the mode_behavior / visual_style_behavior line. e.g. d. Mode: custom — open with a narrative hook, then a pyramid analysis core, then a showcase close (no single dominant spine) — not just Mode: custom.

Layer 1 — Communication mode

🚧 GATE: read modes/_index.md before recommending.

The deck's narrative + persuasion skeleton — how the argument is organized and advanced. Lock one of pyramid / narrative / instructional / showcase / briefing (closed set; full catalog in the index).

Source:

  • User supplied their own outline / structure → it is authoritative. Transcribe it into §IX as given (page order + titles preserved); still lock a mode, but for register / voice and page-internal treatment, not to reshape — never reorder the user's pages or rewrite their given titles. Note in design_spec.md that the structure is user-authored. briefing imposes the least if no particular "讲法" is intended.
  • Beautify / re-layout workflow (beautify-pptx.md) → the extracted source content is authoritative and verbatim, one step stricter than the user-outline case above. Each source slide becomes exactly one §IX page in source order; transcribe every content block word-for-word — never reshape / re-primary / condense / merge / split / reword. Lock mode: briefing; color (e) and typography (g) are whatever the user confirmed in the beautify plan — the source identity (theme or observed) by default, or a content / brand-aware alternative the beautify plan offered and the user picked — locked as truth (the beautify plan already ran the recommendation through the confirm UI, so do not re-recommend here). Charts / tables / images are regenerated from their extracted data in the inherited style (route chart/table data to §VII, pictures to §VIII) — data values stay frozen, the rendering is the deck's own; never carried over verbatim. Layout, hierarchy, rhythm, and visual rendering are what gets redesigned.
  • A bespoke direction the five don't give — a nameable cadence (dialectic 正反合, myth-vs-reality, countdown, Socratic), a multi-act fusion of modes, or the user's own feel (confrontational here, detached there). Either the user asks, or you recommend it when a fusion / bespoke direction genuinely serves the deck better than a single preset (a recommendation the user confirms, like every lock). The kind doesn't matter → mode: custom + a mode_behavior: paragraph that crystallizes the intent (act sequence or posture shifts, title voice, page rhythm, register) concretely enough for the Executor to follow per page; it reads only spec_lock.md, never the chat. One deck locks one value — a fusion is one custom describing the acts, never several modes. Avoid only the dodge: don't default to custom when a preset genuinely fits, and prefer a dominant mode + page-level variation when one mode leads.
  • No user structure or cadence → recommend by the index's auto-selection table (content / audience signal → mode) plus the deck's stated purpose; the mode does the structural lifting. Present as a recommendation; the user may override.

Write the locked value to spec_lock.md - mode: and record the rationale in design_spec.md (for custom, also write the sibling - mode_behavior: paragraph). Executor loads only that one mode file, or follows mode_behavior when the value is custom.

Layer 2 — Visual style

🚧 GATE: read visual-styles/_index.md before recommending.

The deck's visual aesthetic — shape language, decoration density, whitespace rhythm, typographic character, texture. Anchors the downstream confirmations e (Color), f (Icon), g (Typography), h (Image). Lock one preset from the catalog, or custom.

Source:

  • User named a style (chat / template / beautify) → it is truth: map to the closest preset (or custom with a visual_style_behavior paragraph) and lock directly. Skip the spectrum below — do not re-offer choice they already made.
  • No user description → present a personality spectrum, not one safe pick (this is the lever against "every deck looks the same" — the visual style is what most determines a deck's character, so it gets real choice, same hard rule and thinking as h.5). Author ≥3 distinct styles from the index's auto-selection table spanning safe (the industry-norm recommendation) → shifted (an alternate one tick more expressive) → bold (a characterful style that challenges the default — brutalist / zine / memphis / ink-wash / vintage-poster etc., whenever the content can carry it). Give each a one-line temperament tag + real-world analogy (like h.5's "like an Economist feature"). Write the three to recommendations.json visual_style_spectrum (each {id, tag_zh/en, note_zh/en}) and present the same three in chat as the always-valid fallback; set recommend.visual_style to the safe pick as the pre-selected default. The user may pick any of the three, a style outside them, or Custom. Honest-shortfall exception (mirrors h.5): if the content genuinely supports fewer than 3 non-gimmicky directions, present the smaller set and say why — never pad with a style that fights the content.

Forbidden — a non-catalog name as visual_style: the value MUST be an id from the visual-styles catalog (or genuine custom prose). A name that is not in that catalog is not a visual style — most often it is an image-rendering name from the _index "Paired rendering" column (flat, vector-illustration, digital-dashboard, 3d-isometric, corporate-photo, …), which names the §h illustration family, not the deck's layout aesthetic. Do not borrow it. (Names that are intentionally both a style and its paired rendering — glassmorphism, blueprint, editorial, dark-tech — are valid styles because they are in the catalog.) Generic baseline words — flat / flat-design / 扁平 / modern / clean / simple / minimal — are not custom-worthy either: the whole system is flat by default (shadows discouraged), so map them to the closest preset (flat + grid → swiss-minimal; flat + rounded → soft-rounded; flat + dense → brutalist). Reserve custom for an aesthetic no preset covers.

Carries no color. A visual style governs how the deck's HEX (locked at e) is used — never which colors, same discipline as image-renderings. When the deck has AI images, prefer the style's paired rendering so layout and illustration share one aesthetic.

Write the locked value to spec_lock.md - visual_style: and the rationale to design_spec.md. Executor loads only that one visual-style file.

Template vs preset: a style mention may sound like a template name ("academic style" vs the academic_defense/ template directory). Step 3 only triggers on an explicit template directory path supplied by the user — bare names and style words never copy templates; they map to a visual-style preset here. If a template was triggered upstream, its files are already in <project_path>/templates/ and its fused design_spec governs.

Downstream effect: e / f / g / h realize the locked mode + visual style. Example: showcase + dark-tech → e applies one luminous accent on a dark field; g pairs a clean sans with mono; f minimal glow icons; h the digital-dashboard rendering.

e. Color Scheme Recommendation

Hard rule: User / template colors are truth. If the user has specified colors (HEX, brand colors, or natural-language directives like "use blue as primary"), or a template was loaded at Step 3 via an explicit path (<project_path>/templates/design_spec.md), lock those directly and skip the recommendation table. Do not adjust them to fit any palette or industry default. Only when no color signal exists from user or template do you proactively propose a scheme below.

Step 3 already collapses brand and layout inputs into one fused design_spec.md; this layer reads from that single source and does not need to re-resolve brand vs layout precedence.

Proactively provide a color scheme (HEX values) based on content characteristics and industry.

Industry color quick reference (full 14-industry list in scripts/config.py under INDUSTRY_COLORS):

Industry Primary Color Characteristics
Finance / Business #003366 Navy Blue Stable, trustworthy
Technology / Internet #1565C0 Bright Blue Innovative, energetic
Healthcare / Health #00796B Teal Green Professional, reassuring
Government / Public Sector #C41E3A Red Authoritative, dignified
中文政企 / 集团 / 研究院所 商务汇报 #C00000 商务红 庄重、权威、结论先行(中建材系商务红)

商务红 default candidate. For Chinese state-enterprise / group / research-institute business decks (工作汇报 / 述职 / 方案评审 / 年度总结 / 科技成果), offer 商务红 #C00000 as one of the ≥3 color candidates by default — it is the conventional, expected business color in this context. Anchor a red-dominant scheme on it: pair with a 鎏金 #BF9B5F or 沉靛蓝 #2B4C7E accent (red-gold or red-blue, pick one as the point accent), warm off-white surface (#FBF5F5), and reserve solid #C00000 for headings / key data / one color block — never a full-canvas red. A ready-made bundle of exactly this (palette + serif-title typography + filled-icon style) ships as the business-red brand preset (templates/brands/business-red/); when the user wants "红色 / 商务红 / 中建材风" point them at it (apply by explicit path per Step 3) or lock its palette inline.

Color rules: 60-30-10 rule (primary 60%, secondary 30%, accent 10%); text contrast ratio >= 4.5:1; no more than 4 colors per page.

Lock the full neutral set the visual style implies — not just primary / secondary / accent / border. Predict the extra neutral tiers the locked visual_style (§d Layer 2) needs and lock them now; spec_lock.colors must be complete before generation, and the Executor draws only from it (never invents a tone mid-deck).

Style trait Extra neutral tiers to lock
Layers panels / charts (e.g. data-journalism, swiss-minimal) surface (panel lift), grid (hairline, lighter than dividers)
Text over imagery / dark field (e.g. photo-editorial, glassmorphism, dark-tech) scrim / overlay for legibility
Print / hand-drawn fills (e.g. chalkboard, zine) block-shade, one step off the field

f. Icon Usage Confirmation

Option Approach Suitable Scenarios
A Emoji Casual, playful, social media
B AI-generated Custom style needed
C Built-in icon library Professional scenarios (recommended)
D Custom icons Has brand assets

The built-in icon library contains multiple stylistic libraries plus a brand-logo library:

See ../templates/icons/README.md for the current library inventory, counts, prefixes, and SVG placeholder details.

Mandatory rules when choosing C:

At the eight-confirmation stage — decide the library only. Do NOT run ls | grep yet.

  1. Pick exactly one stylistic library — read the source material, then choose the library whose visual character best serves the deck:
    • chunk-filled — fill, straight-line geometry (M/L/H/V/Z only); sharp right angles; heavy, solid, architectural
    • tabler-filled — fill, bezier curves and arcs (C/A); smooth, rounded, organic; medium weight, approachable
    • tabler-outline — stroke (line art); airy, refined, lightweight; best for screen-only (thin strokes may be hard to read in print)
    • phosphor-duotone — duotone; main shape + 20% opacity backplate; medium weight, layered, contemporary
    • ⚠️ One presentation = one stylistic library for generic icons (home, chart, users, etc.). Mixing chunk-filled / tabler-filled / tabler-outline / phosphor-duotone is FORBIDDEN. If the chosen library lacks an exact icon, find the closest alternative within that same library.
    • Brand-logo exception: simple-icons is NOT a stylistic library. Add it to the deck's icon inventory only when the deck genuinely contains real company / product / service brand marks (customer logos, tech-stack icons, social handles). Never substitute it for a missing generic icon.
  2. Stroke weight lock (stroke-style libraries only) — for stroke-based libraries (currently tabler-outline), pick one deck-wide value from {1.5, 2, 3} (default 2). For heavier presence, switch library instead of going above 3.

After all eight confirmations are approved — when writing design_spec.md §VI / spec_lock.md, then materialize the icon inventory:

  1. Enumerate the concepts the deck actually needs (home, chart, users, …) based on the confirmed outline.
  2. Search for each concept's filename in the chosen library: ls skills/ppt/templates/icons/<chosen-library>/ | grep <keyword>
  3. Use the verified filename (without .svg) as the icon name; always include the library prefix (e.g., chunk-filled/home).
  4. Copy each chosen icon into the project as you confirm itpython3 skills/ppt/scripts/icon_sync.py <project_path> <lib/name> [<lib/name> …]. This populates <project>/icons/<lib>/ (the set the Executor embeds from) and, more importantly, validates existence on the spot.
  5. List the final icon inventory and chosen library in design_spec.md §VI; record the same in spec_lock.md icons (including stroke_width for stroke-style libraries). Executor may only use icons from this list.

🚧 GATE — missing icon = re-pick now: if icon_sync.py reports any name as missing (non-zero exit), that icon is not in the library — re-pick a real filename via ls … | grep, fix §VI / spec_lock.md, and re-run until it exits clean. Never carry a missing icon forward to generation. Over-copying candidates is harmless — finalize embeds only the icons actually referenced by <use data-icon>.

Do NOT preload any index file — when the inventory step arrives, use ls | grep to search on demand with zero token cost.

g. Typography Plan Confirmation (Font + Size)

🚧 GATE — read the locked style's type character first: read_file the visual-style file locked at §d Layer 2 (visual-styles/<visual_style>.md) and pull its §2 Typography character (you only read the catalog index there; the per-style character lives in the file). Both combinations below MUST realize it, and the title carries the personality — the CJK body may stay a neutral pre-installed sans, but the title leads with the character the style asks for (e.g. ink-wash → calligraphic KaiTi / FangSong; brutalist / memphis / vintage-poster / zine → display SimHei / Impact; editorial / data-journalism / photo-editorial → serif Georgia / Cambria / SimSun; dark-tech / blueprint → clean sans + Consolas mono; swiss-minimal / soft-rounded → grotesque / friendly sans). For visual_style: custom, realize its visual_style_behavior character instead. Letting the title default to a neutral sans when the style asks for character is the failure mode to avoid.

Font Combinations

Same-deck fonts must form contrast (different family, weight, or proportion) or concord (one family throughout). "Similar but not identical" pairings across roles are forbidden — see blacklist below. Within one stack, pairing a Windows font with a macOS counterpart (e.g. Microsoft YaHei + PingFang SC) is encouraged as a browser-preview nicety; converter writes only the first into PPTX.

⚠️ PPT-safe font discipline (HARD rule). PPTX has no runtime fallback — missing fonts substitute to Calibri. Every stack MUST end with a pre-installed font:

  • CJK → "Microsoft YaHei" / SimHei / SimSun / FangSong / KaiTi
  • Latin sans → Arial / Calibri / Segoe UI / Verdana / Trebuchet MS
  • Latin serif → "Times New Roman" / Georgia / Cambria / Palatino / Garamond
  • Mono → Consolas / "Courier New"
  • Display → Impact / "Arial Black"

Stacks led by non-pre-installed fonts (Inter / HarmonyOS Sans / Source Han / brand typefaces like McKinsey Bower) are only acceptable when the Design Spec notes "requires install or PPTX embed".

Forbidden — similar-but-not-identical pairings across roles (do not split title vs body across these; within one stack as cross-platform fallback they remain encouraged):

  • Microsoft YaHeiPingFang SCHeiti SC
  • SimSunSongti SCSTSong
  • ArialHelvetica NeueSegoe UI
  • "Times New Roman"Times
  • GeorgiaCambria

Mandatory: propose two combinations to the user — one concord (safe), one contrast (with tension). Do not default to "title = body, same font" without explicit user request. Pick each family by subject fit and the locked visual_style's §2 character (read at the GATE above) — there is no default family; type should follow the deck's content and aesthetic, not fall back to one safe face.

Template precedence: when a template was loaded at Step 3 via an explicit path and declares title / body font stacks in <project_path>/templates/design_spec.md §III Typography / §IV (or whichever heading the fused spec uses), lock those directly and skip the two-combination presentation. Same precedence as e. — user override > template values.

Cross-platform pre-installed reference:

Category Safe families
CJK sans Microsoft YaHei, SimHei, PingFang SC, Heiti SC
CJK serif SimSun, FangSong, KaiTi, Songti SC
Latin sans Arial, Calibri, Segoe UI, Verdana, Trebuchet MS, Helvetica Neue
Latin serif Times New Roman, Georgia, Cambria, Palatino, Garamond, Book Antiqua
Mono Consolas, Courier New
Display Impact, Arial Black

Seed combinations (all PPT-safe; first column is the contrast axis, not a scenario) — starting points, not the allowed set. Any client-preinstalled family is fair game; non-pre-installed expressive faces go title-only (see note below). Let the locked style's §2 character pick the axis and the title lead; the Microsoft YaHei body cells are the neutral default, not a requirement — a styled deck still varies the title even when the CJK body stays a neutral sans.

Contrast axis Title stack Body stack Code stack
Serif × sans Georgia, KaiTi, serif "Microsoft YaHei", "PingFang SC", sans-serif
Kai × hei KaiTi, Georgia, serif "Microsoft YaHei", "PingFang SC", sans-serif
Fangsong × hei FangSong, "Times New Roman", serif SimHei, "Microsoft YaHei", sans-serif
Double serif Palatino, FangSong, serif Cambria, SimSun, serif
Same family, weight contrast (900 / 300) "Microsoft YaHei", "PingFang SC", sans-serif same
Display × neutral Impact, "Arial Black", SimHei, sans-serif Arial, "Microsoft YaHei", sans-serif
Cool serif (academic) Cambria, SimSun, serif "Times New Roman", SimSun, serif
Hei × song (政务) SimHei, "Microsoft YaHei", sans-serif SimSun, serif
Tech / developer Arial, "Microsoft YaHei", sans-serif same Consolas, "Courier New", monospace
Concord (single family — pick the family by subject + visual_style) <family by subject>, …, sans-serif / serif same

Stack length discipline (soft rule). ≤4 fonts per stack. The first CJK and first Latin font MUST be pre-installed — the converter writes only those, and a non-installed lead substitutes to Calibri (drawingml_utils.py parse_font_family). Choose that lead from the safe set by the locked style's character: Microsoft YaHei / Arial are the neutral members — perfect as the tail fallback and as the lead only when the style is plain-sans, but not the automatic lead for every deck. For a styled title, lead with SimHei / KaiTi / FangSong / SimSun / Georgia / Cambria / Impact / Consolas as the character asks. Keep at most one macOS-exclusive family (typically "PingFang SC"); macOS→Windows fallback is auto-mapped via FONT_FALLBACK_WIN.

Non-pre-installed directions (require install or PPTX embed; note the constraint in Design Spec):

  • Retro / pixel — Press Start 2P / VT323 / Silkscreen
  • Rounded friendly — Nunito / Quicksand / M PLUS Rounded / OPPO Sans (closest safe substitute: Trebuchet MS / Verdana)
  • Modern web sans — Inter / HarmonyOS Sans / Source Han Sans / Noto Sans
  • Calligraphic display — 隶书 LiSu / 华文行楷 STXingkai / 华文新魏 STXinwei (closest safe substitute: KaiTi / FangSong); cover / section / hero titles only, never body
  • Brand-specific — McKinsey Bower, corporate VI typefaces

Font Size Ramp (px throughout)

Ramp, not a fixed menu. All sizes derive from the body baseline as a ratio. spec_lock.md typography declares body plus the slots this deck uses (title / subtitle / annotation by default; add cover_title / hero_number / subheading / lead / footnote / chart_annotation as the outline demands).

Mandatory — scan §IX before locking and declare a slot for every role that recurs across pages. Do not ship only the four defaults when the outline plainly carries more. A report / text-mode deck almost always recurs a per-page core-message / lead line (the one-sentence key-claim / takeaway under the title) and recurring page numbers / source credits / footnotes — declare lead (in the 1.11.4× lead band, and always ≥ body — the core message is a primary line, never smaller than body) and footnote (keep it readable — ~16px for a standard body, not shrunk smaller) for them. Leaving these undeclared forces the Executor to improvise an unlocked size, and a core line improvised below body (with data callouts sitting larger) is exactly the hierarchy inversion this prevents. Recurring chart / figure labels get chart_annotation likewise.

Structural roles (page title / body / subtitle / annotation / footnote) resolve to one size each and hold it deck-wide — that consistency is what reads as professional. Picking an intermediate in-band size is for special / feature elements (hero number, display title, one-off emphasis); a recurring one is declared as its own slot so it stays consistent too.

Unit boundary (HARD rule). The system is px-only. recommendations.json, the Confirm UI, result.json, design_spec.md, spec_lock.md, and SVG all carry unitless px — there is no pt layer anywhere, and no pt→px conversion step. pt exists only as the size PowerPoint happens to show after export (px × 0.75, rounded to 1 decimal); it is never an input, a confirmation value, or a provenance field (no body_size_pt / sizes_pt). Never write pt / px / em units; every layer carries bare px numbers. Geometry — margins, gaps, card sizes — is px too. (Beautify reads a source deck's pt at intake but converts to px before any recommendation — see beautify-pptx.md; pt never re-enters the contract.)

Baseline — one fixed value per delivery purpose, not a range. Delivery purpose is a Tier-1 anchor confirmed in §c key information (beside audience, not as a separate typography step — see §1 Two-tier confirmation); it is the primary driver because the same canvas reads very differently when read close vs. projected. It is a deck-wide axis — beyond the body baseline it also drives page density / count / rhythm; see §6.1 for that side. Here, with the purpose confirmed, it sets the body baseline to a single value:

Delivery purpose (PPT 16:9) Body (px) Reads as
text · read-close (report, data-dense brief, leave-behind file) 20px screen / handout reading at arm's length
balanced · business (presented and read; roadshow, review) — default 24px mixed projection + reading
presentation (projected, sparse; keynote, launch, classroom) 32px room projection, glance from the back

The body baseline is purely a function of delivery purpose — density and visual style do not nudge it within a range. Body size is the reading-distance proxy; density is orthogonal and shows in how much text per page, page count, and page_rhythm (§6.1), the other roles, and decoration — never in growing or shrinking the body baseline. One purpose → one body size, identical across the deck. (The user may still override the value in confirmation; absent an override, this fixed value is the recommendation.)

Canvas Height Body baseline Unit
PPT 16:9 / 4:3 720 / 768 20 / 24 / 32 (by delivery purpose) px
Xiaohongshu 1242×1660 4055 px
WeChat / IG 1:1 1080×1080 2736 px
Story / Portrait 1080×1920 4864 px
A4 1240×1754 4458 px

Every canvas authors px directly. PPT and social / print alike — the body baseline is a px number (PPT by delivery purpose above; social / print by the per-canvas value). No pt confirmation, no conversion step on any canvas.

Confirmed values win — never recompute over them. The user's confirmed sizes are authoritative. Confirm UI path: take result.json typography.body_size / sizes (already px) verbatim — do not re-derive from the canvas even if the user changed it. The page deliberately does not auto-rescale font sizes when the canvas changes (it only updates the recommended-value hint), so result.json already reflects the user's intent; recomputing here would silently override their choice. Chat-fallback path (no result.json): take the px body baseline for the confirmed canvas + delivery purpose directly from the table above (no conversion). The body_size in recommendations.json is only a stale hint once the canvas changes — use the confirmed value, not the recommendation.

Level Ratio to body 32px baseline (presentation) 24px baseline (balanced)
Cover title (hero headline) 2.5-5x 80-160px 60-120px
Chapter / section opener 2-2.5x 64-80px 48-60px
Page title 1.5-2x 48-64px 36-48px
Hero number (consulting KPIs) 1.5-2x 48-64px 36-48px
Subtitle 1.2-1.5x 38-48px 29-36px
Lead-in / intro 1.1-1.4x 35-45px 26-34px
Subheading 1.1-1.3x 35-42px 26-31px
Body 1x 32px 24px
Annotation / caption 0.7-0.85x 22-27px 17-20px
Page number / footnote 0.5-0.65x 16-21px 12-16px

Two baseline columns are illustrative only — for any other body px value (20 / 24 / 32 / ...), multiply the row's ratio. Structural roles (page title / body / subtitle / annotation / footnote) take their locked slot value and stay there on every page — not a per-page pick. In-band freedom without pre-declaring is for special / feature elements (hero number, display title, one-off emphasis); a recurring special size should be declared as its own slot. The subtitle / lead / subheading bands overlap on purpose — pick by role, not size, then hold each at one size deck-wide. Values outside every band require lock extension first.

Round recommended sizes to clean even px — don't ship ratio leftovers. The ratios are a guide; lock each role at a clean even px, not the raw product. For body 24px that means title 42 · subtitle 32 · lead 30 · annotation 18 · footnote 16 — never 32.4 / 18.7 / odd tails, which read as unprofessional. Snap the ratio output to the nearest even px (…14, 16, 18, 20, 24, 28, 32, 36, 42, 48…), then lock that. (The Confirm UI already snaps its per-role suggestions this way; match it on the chat-fallback path.)

px is literal — write the locked number verbatim (Mandatory). result.json / design_spec.md / spec_lock.md / SVG all carry px as-is; there is no conversion anywhere. The size you confirm is the size you write. The Executor's font-size MUST be the exact px from spec_lock.typography — if body is 24, write 24; never a "rounder" or PowerPoint-familiar number (20 / 18 / 36). Writing a remembered pt-style value as px is the silent drift that renders a whole deck the wrong size (e.g. a 24px body emitted as 20 ships ~17% small); the checker's spec-lock drift guard backstops it, but author it right. Per role: honor any size the user pinned as that slot's locked value; derive the rest from the ramp and snap to clean even px. (At export the px is turned back into pt by × 0.75, rounded to 1 decimal — that is the only place pt ever appears, and it is automatic.)

Hero in single-focus / breathing pages: when one element is the entire page — a large number, a headline, a key phrase — it is the visual subject, not body content. Such heroes may borrow the cover-title band (2.55×); for greater emphasis, declare a hero slot in spec_lock.md (e.g., hero_number / hero_headline) — checker exempts declared slots with no fixed upper limit. The row above "Hero number (consulting KPIs) 1.52×" applies only to numeric KPIs in dashboard/data layouts, not to full-page focal elements.

Formula Rendering Policy

Formula rendering is part of Typography confirmation. Recommend one policy and let the user confirm or override it inside item g.

Policy Behavior Use
mixed (default) Render complex formula-worthy expressions to PNG; keep simple inline math as editable text / Unicode Most academic, engineering, educational, and technical decks
render-all Render every formula-worthy expression to PNG Formula-heavy teaching / research decks where visual consistency matters more than editability
text-only Do not render formulas; keep expressions as editable text / Unicode Business decks, light technical briefs, or user preference for editability

Hard rule: $...$ / $$...$$ in source material are input signals only. Do not scan output files for dollar-delimited formulas. After confirmation, Strategist decides which source expressions become formula assets and writes them explicitly to images/formula_manifest.json.

Formula-worthy expressions:

Render as PNG Keep as text
Fractions, radicals, integrals, sums, limits, matrices, multiline derivations, complex super/subscripts O(n log n), x = 3, single Greek letters, short inline variables, simple percentages / KPIs

Forbidden — invented math: formula assets must faithfully structure source content. Do not create a new equation just to make a slide look more academic.

Manifest step: After the Eight Confirmations and before writing design_spec.md, if policy is mixed or render-all and formulas are selected:

mkdir -p <project_path>/images
python3 skills/ppt/scripts/latex_render.py <project_path>

Write the manifest first at <project_path>/images/formula_manifest.json. Use this shape:

{
  "providers": ["codecogs", "quicklatex", "mathpad", "wikimedia"],
  "items": [
    {
      "id": "formula_001",
      "latex": "E = mc^2",
      "display": "block",
      "color": "#1D1D1F",
      "background": "#FFFFFF",
      "transparent": true,
      "dpi": 400,
      "filename": "formula_001.png"
    }
  ]
}

The script renders PNGs into images/, trying codecogs, quicklatex, mathpad, then wikimedia unless the manifest overrides providers. codecogs, quicklatex, and mathpad preserve requested formula color; wikimedia is an availability fallback and may require visual checking on dark themes. Formula PNGs are transparent by default: use background as the temporary render matte and local background-removal reference. Set transparent: false only when the final formula must keep an opaque background. It writes pixel_width, pixel_height, ratio, file, provider, and status back into the manifest. Run analyze_images.py <project_path>/images after formula rendering so the formula PNGs are included in the same inventory pass as user images.

h. Image Usage Confirmation

Option Approach Suitable Scenarios
A No images Data reports, process documentation
B User-provided Has existing image assets
C AI-generated Custom illustrations, backgrounds needed
D Web-sourced Real-world reference imagery, editorial support, stock-style needs (no API key required for default providers)
E Placeholders Images to be added later

Default toward imagery on framing pages — don't silently default the whole deck to A. When the user provided no images, do not quietly pick A "no images" deck-wide. Proactively propose AI imagery (C) on the pages where it lifts the deck most — cover, section dividers, single-concept / breathing pages, atmosphere / backdrop moments — and surface it as one of the image-strategy candidates the user picks from. Keep data / list / process / comparison pages chart- and diagram-driven (route those to §VII visualizations, not §VIII imagery) — imagery is for framing and concept, not for decorating dense content. A "no images" is the right call only when the deck is genuinely all-data / operational and imagery would add nothing — then say so explicitly rather than defaulting there by omission. Everything stays gated: the user confirms image usage here, and AI generation runs through imagegen's own cost-confirmation gate (SKILL.md Step 4 / 阶段二) — proposing costs nothing, only confirmed generation spends.

🚧 GATE — know your resources before recommending. images/ is a live working folder (source-extracted pictures, user drops, later replacements), so its facts are re-derived on use, never trusted from a stale store. Before recommending image usage, if images/ is non-empty, regenerate the inventory from whatever is currently in it, then read it back:

python3 scripts/analyze_images.py <project_path>/images

Read <project_path>/analysis/image_analysis.csv (size / ratio / category of every in-hand picture). The AE choice is still your judgment, but it MUST be made with the current inventory in full view — never in ignorance of what is already on hand. image_analysis.csv is a regenerated view of the live folder, not a durable fact: re-run this whenever images/ changes.

Confirmed value wins. The image_usage in result.json (or the chat reply) overrides the recommendation here — map it to §VIII Acquire Via (aiai, webweb, provided→**user**, placeholderplaceholder, none→option A, no image rows). When it is not ai (and the plan has no AI part), skip h.5 entirely and write no ai rows. See SKILL.md Step 4 for the full mapping.

When recommending C — surface its three implementation modes so the user knows "no API key" is a supported state:

Mode Trigger Mechanism
Path A IMAGE_BACKEND configured (default) image_gen.py runs in Step 5
Path B IMAGE_BACKEND not configured AND host has a native image tool (Codex / Antigravity / Claude Code / similar) — auto-selected, no user prompting needed Host-native generation
Offline Manual IMAGE_BACKEND not configured AND host has no native image tool Prompts written to images/image_prompts.json; user generates externally and places files in project/images/

Selection is automatic in Step 5 (A → B → Manual). Detailed contract: image-generator.md §3.2.

Selections may be mixed at the row level — e.g. a deck can use C for hero illustrations while sourcing D for supporting team photos.

h.5 AI Image Strategy — lock rendering + palette (only when C is selected)

When the deck includes any ai rows, Strategist locks a deck-wide rendering and deck-wide palette here. These two values are written into design_spec.md §III and spec_lock.md colors / images sections, then consumed by Image_Generator. Every AI image in the deck shares them — this is what makes multiple AI images feel like one deck.

🚧 GATE — before recommending values: read_file references/image-renderings/_index.md and read_file references/image-palettes/_index.md. They contain the catalog, auto-selection tables, and a rendering × palette compatibility matrix.

Three-candidate presentation (default path)

Hard rule: Unless the user has already named a specific rendering or palette (chat or template), present ≥3 distinct rendering × palette combinations and let the user pick. Never auto-lock a single combination silently.

Per-candidate schema (exactly 4 lines, no extras):

[Plan A] <temperament label> — <rendering> × <palette>
  Visual: <shape / line / material / light, 1-2 phrases>
  Color: <secondary HEX (ratio) + primary HEX (ratio) + accent HEX (ratio); HEX values from e.>
  Mood: <2-3 traits>; like <real-world analogy: company / publication / event>

After the candidates, append one line:

> Reference images: see references/ai-image-comparison/ for matching PNGs by name.

Hard rules for candidate construction:

Rule Behavior
Filter by e.'s HEX Only include palettes whose temperament can carry the user's HEX. Vivid red → exclude cool-corporate / mono-ink; include vivid-launch / warm-earth / editorial-classic.
HEX values in Color line MUST be e.'s real values Palette contributes only the 60-30-10 ratio + role assignment. Never substitute the palette's typical HEX.
Span a personality spectrum Typically: one conservative-default (industry norm), one shifted-tone (same fit, 1-2 ticks different), one bold-contrast (more expressive, may challenge default). No near-duplicates.
Mood line MUST include a real-world analogy Company / publication / event the user can picture. Adjective stacks alone are forbidden.
Adapt labels to chat language Schema is English by default. Chinese chat → render as 「方案 A / 视觉 / 色彩 / 情绪」. Structure stays the same; only the labels translate.
Skip presentation when user has specified User-named rendering or palette (chat / brand / template), or a Confirm UI pick in result.json.image_strategy (same shape as color / typography honoring their confirmed candidate), bypasses the candidate flow — lock that chosen candidate's rendering + palette directly per the truth-precedence rule; do not re-pick.
custom is a tail-case, not a default When no preset fits, a candidate may set rendering: custom and / or palette: custom (rules: image-renderings/_index.md §1.5, image-palettes/_index.md §2). At most one candidate per dimension may carry custom; one candidate may carry both dimensions as custom. Visual / Color lines describe the behavior in prose, never by naming a competing preset.

Forbidden — padding with conflicts: if e.'s HEX cannot find ≥3 compatible palettes, present the smaller set (2 candidates) and state "your color is unusual — only N palettes can carry it without conflict." A custom candidate is allowed only when its prose genuinely describes a tail-case the presets cannot — not as a slot-filler. Never fill remaining slots with known-conflicting options.

Worked example (e. = #1E3A5F navy + #F8F9FA off-white + #D4AF37 gold; d. = consulting; chat in English):

[Plan A] Restrained Professional — vector-illustration × cool-corporate
  Visual: flat vector, solid color blocks, no gradients or shadows
  Color: off-white #F8F9FA (60-70%) + deep navy #1E3A5F main (25-30%) + gold #D4AF37 accent (<5%)
  Mood: steady, trustworthy, restrained gravitas; like a McKinsey consulting report

[Plan B] Editorial Depth — editorial × editorial-classic
  Visual: magazine layout, 8% paper texture, column-based partitioning
  Color: off-white #F8F9FA paper (55%) + deep navy #1E3A5F column (30%) + gold #D4AF37 rule line (10-14%)
  Mood: refined, considered, paced; like an Economist feature spread

[Plan C] Future Energy — 3d-isometric × tech-neon
  Visual: isometric 3D, soft shading, 8% glow halos around bright elements
  Color: off-white #F8F9FA digital field (50%) + deep navy #1E3A5F main (35%) + gold #D4AF37 emphasis (10-15%)
  Mood: forward, energetic, futuristic; like an Apple or Stripe product keynote

> Reference images: see references/ai-image-comparison/ for matching PNGs by name.

Worked example — custom × custom (tail-case; e.g. 新中式 deck with #1A1A1A + #F5EFE0 + #A52A2A):

[Plan A] 文人雅致 — custom × custom
  Visual: dry-brush burnt-ink, five tonal gradations, 宣纸 paper-grain, deliberate negative space; 朱泥 seal as single red mark
  Color: cream #F5EFE0 ~65% negative space + burnt-ink #1A1A1A ~20% strokes + cinnabar #A52A2A 3-5% seal
  Mood: literati restraint; like 苏州博物馆 pacing

Visual / Color lines feed spec_lock.md's image_*_behavior rows verbatim.

After the user picks a candidate (or supplies a custom variant), proceed to "Recording the lock" below.

Prompt depth for §VIII rows

Hard rule: When §VIII contains paper-figure or subject-domain rows (scientific subjects, specialized fields, regulated content), each row's generation description follows image-generator.md §4.2 Prompt depth — expand to the depth the subject demands; 500-1000+ words is normal.

Forbidden — generic shortening: never drop a paper-figure row's prompt to a 50-word generic illustration brief.


Catalog reference (for candidate construction)

The tables below are source data Strategist reads when constructing the three candidates above. They are no longer the final output by themselves.

Rendering recommendation (soft — user may override with any other rendering from the catalog):

d. Style signal Recommended rendering Alternates
Strategic / MBB / board editorial or vector-illustration blueprint, minimalist-swiss
Corporate report / analysis / 学术答辩 vector-illustration flat, editorial
High-end consulting / luxury / 高端 / design-firm minimalist-swiss editorial, vector-illustration
Tech / SaaS / AI / 架构 3d-isometric, blueprint, digital-dashboard flat
Modern SaaS / fintech / health-tech / premium app glassmorphism digital-dashboard, flat
Product launch / brand / marketing flat, 3d-isometric, corporate-photo vector-illustration
Education / training / 教学 / 培训 sketch-notes vector-illustration, paper-cut
Children / storybook / 儿童 / 治愈 fantasy-animation paper-cut, watercolor, sketch-notes
Cultural / folk / festival / 文化 / 节日 paper-cut vintage-poster, screen-print
Methodology / Before-After / 方法论 / manifesto ink-notes editorial
Government / formal / 政务 editorial or corporate-photo vector-illustration
Finance / journalism / 财经 editorial, digital-dashboard vector-illustration
Personal story / 个人成长 / lifestyle watercolor, warm-scene corporate-photo, paper-cut
Cultural / media / opinion / cinematic screen-print, vintage-poster editorial, warm-scene
Brand heritage / hospitality / 老字号 / 周年 vintage-poster screen-print, editorial
Gaming / retro / 复古 / 像素 pixel-art vintage-poster
Environment / wellness / 环保 nature watercolor, paper-cut
Classroom / blackboard / 课堂 chalkboard sketch-notes
Team / company / product photo corporate-photo

Palette recommendation (soft — user may override):

Content vibe / industry Recommended palette Alternates
Consulting / finance / B2B / corporate / 学术答辩 cool-corporate editorial-classic, frost-ice
Tech / SaaS / AI tech-neon cool-corporate, dark-cinematic
Modern SaaS / fintech / health-tech frost-ice cool-corporate, tech-neon
Health / medical / beauty / skincare frost-ice nature-organic, earthy-dusty
Education / training macaron warm-earth
Methodology / Before-After mono-ink editorial-classic
Personal / lifestyle / brand story warm-earth nature-organic, earthy-dusty
Interior / wellness / mindfulness / slow living earthy-dusty warm-earth, nature-organic
Product launch / marketing vivid-launch tech-neon, sunset-gradient
Creative agency / travel / music / lifestyle sunset-gradient vivid-launch, warm-earth
Luxury / fashion / jewelry / premium / heritage jewel-tone dark-cinematic, editorial-classic
Children / storybook macaron warm-earth
Premium / film / entertainment dark-cinematic jewel-tone, duotone
Cultural / media / cover-art duotone editorial-classic
Environment / wellness nature-organic warm-earth, earthy-dusty
Finance / journalism editorial-classic cool-corporate

After auto-selecting, cross-check image-palettes/_index.md compatibility matrix — if rendering × palette is , swap to the alternate palette.

d-e-f-g linkage sanity check (do this after picking rendering + palette):

Linkage What to verify
d. Style ↔ rendering Rendering family should match the Style descriptor's temperament (corporate ≠ sketch-notes; tech ≠ watercolor). Already enforced by the recommendation table above.
e. Color HEX ↔ palette HEX is truth — palette is just the "how to use these HEX" rulebook for AI images (saturation / contrast / 60-30-10 / material). Mismatch → always swap palette to fit the HEX, never adjust the HEX to fit a palette. E.g. user gives a vivid red but you auto-picked cool-corporate — switch to vivid-launch or warm-earth, do not propose dimming the red.
f. Icon library ↔ rendering tabler-outline pairs well with all renderings (most versatile). chunk-filled / tabler-filled pair better with vector-illustration / flat / editorial. phosphor-duotone pairs with flat / digital-dashboard. Mismatch is not fatal but worth flagging.
g. Typography ↔ rendering Serif title → pairs well with editorial, corporate-photo, screen-print. Hand-lettered direction → already implied by sketch-notes / ink-notes (the rendering carries the lettering, no separate font requirement). Display font → vivid-launch / screen-print. Mismatch is rarely fatal; note in conversation if it feels off.

Recording the lock — after picking, write to:

  • design_spec.md §III Visual Theme — add lines under the color table:
    - **Image Rendering**: vector-illustration
    - **Image Palette**: cool-corporate
    
  • spec_lock.md colors section — add rows at the bottom:
    - image_rendering: vector-illustration
    - image_palette: cool-corporate
    

Hard rule — custom recording: when the picked candidate has rendering: custom or palette: custom, also write the sibling *_behavior row. Source: the candidate's Visual line (for rendering) / Color line (for palette), expanded to cover the prose requirements in image-renderings/_index.md §1.5 / image-palettes/_index.md §2 (chat candidates are compressed; spec_lock prose covers all axes). Both design_spec.md and spec_lock.md must carry the behavior line. Example for the custom × custom candidate above:

- image_rendering: custom
- image_rendering_behavior: "Dry-brush burnt-ink with five tonal gradations, 宣纸 paper-grain at 12% opacity, deliberate negative space; 朱泥 seal as a single red mark; no Western outlines, no gradients."
- image_palette: custom
- image_palette_behavior: "宣纸 cream `#F5EFE0` carries ~65% as negative space; burnt-ink `#1A1A1A` anchors ~20% as brush strokes; cinnabar `#A52A2A` only in 3-5% as seal. Literati restraint — no fourth color."

Image_Generator reads these fields and applies them deck-wide. If both are absent (legacy decks), it falls back to inferring from d. Style and e. Color — quality is acceptable but not optimal. Always lock both when C is selected.

hero_page suggestion (same confirmation turn)

After the user picks a candidate, scan the outline and surface any pages where the image makes more sense as the page's main voice than as a local block. Present them as a short list and let the user confirm, edit, or skip. Result is recorded as page_role: hero_page on the matching ai rows. Density is judgment-based — no fixed quota.

Per hero_page title: lock where it lives — embedded (fused into the image: neon, carved, smoke, 3D-lit lettering) or none (editable SVG title over an atmospheric backdrop, Primitive D). Default none; flip to embedded only when the words must be part of the visual, not merely a display font. Per page — may bake only the keyword while subtitle / date / chrome stay SVG. Surface it with the hero_page list for the same confirm / edit / skip.

When selection includes B, you must run python3 scripts/analyze_images.py <project_path>/images before outputting the spec, and integrate scan results into the image resource list.

When B / C / D / E is selected, add an image resource list to the spec:

Column Description
Filename e.g., cover_bg.png
Dimensions e.g., 1280x720
Ratio e.g., 1.78
Layout suggestion e.g., Wide landscape (suitable for full-screen/illustration)
Layout pattern MANDATORY — one or more #<id> <name> joined by + from image-layout-patterns.md. Combine a Primary id with optional Modifier ids when the page needs it (e.g. #48 side-by-side comparison + #21 rounded rectangle crop + #29 two-stop scrim). A single Primary is fine when the page calls for it. See the GATE earlier in this section. Empty cells or invented ids are invalid.
Purpose e.g., Cover background
Type Free-form category tag — Background, Photography, Illustration, Diagram, Portrait, Latex Formula, etc. Required for formula rows (Latex Formula).
Acquire Via ai / web / user / formula / placeholder — only ai and web drive Step 5 dispatch
Status Initial status must be Pending, Existing, Rendered, or Placeholder; see svg-image-embedding.md for the full status enum
Reference Free-form intent description (NOT a search query); feeds Image_Generator (ai) or Image_Searcher (web)
text_policy (optional, ai rows only) none (no text in image) or embedded (text is part of the artwork). Leave blank when Image_Generator should decide per row. Long body / data / lists stay in SVG.
page_role (optional, ai rows only) local (image is a region block on an SVG page) or hero_page (image is the page's main voice). Leave blank when Image_Generator should decide per row.

No-crop flag (exception only): most images are croppable — Executor defaults to preserveAspectRatio="xMidYMid slice". When an image must NOT lose pixels (data screenshots, charts, certificates, contracts, dense diagrams), append no-crop to its spec_lock.md images entry. Executor will then size the container to the native ratio and use meet. Don't tag the rest.

Formula rows: rendered LaTeX PNGs are image rows with Acquire Via: formula, Status: Rendered, and Type: Latex Formula. Always append no-crop in spec_lock.md images. They are not AI images and never go through Step 5.

Reference field: Write visual intent, not provider mechanics.

Intent description Avoid
"Diverse engineering team collaborating around a laptop, modern office, natural light" "team laptop office"
"Abstract atmospheric backdrop for academic-defense cover, calm center for text overlay, hint of campus skyline" "use openverse, search 'office'"
"Sunlit forest path in autumn" "team photo"

Per-row Reference grammar:

Acquire Via Reference pattern
ai Subject + intent + composition only. Do NOT repeat style words ("flat design", "modern", "vector") or HEX values — both are already locked deck-wide by h.5 (rendering + palette) and design_spec §III (colors). Image_Generator's prompt assembler injects them automatically.
web Concrete subject/place/object first, then 1-3 quality descriptors
formula Original LaTeX plus short placement intent, e.g. formula_001: block energy-mass equation for P03

Allowed web quality descriptors:

Descriptor Use
professional editorial photography Stock-style photography
clean composition Covers, section dividers, image-text layouts
natural light People, workplace, travel, lifestyle scenes
high-resolution Large visual areas

Forbidden — web negative prompts: not tourist snapshot, no phone photo, avoid amateur style.

Mode Good Reference
web "Diverse team collaborating at a modern office desk, professional editorial photography, natural light, laptop visible"
ai "Atmospheric backdrop suggesting digital innovation; calm central area reserved for slide title overlay; light geometric anchor at one edge"
ai "Four-stage value chain from raw input to R&D output; icons should suggest tax-form → cost-reduction → equipment-upgrade → innovation; no text labels (SVG overlays them)"

🚧 GATE — before writing §VIII Image Resource List: when image approach is B/C/D/E (anything other than A "no images"), this is a three-layer hard requirement, not a suggestion:

  1. Readread_file references/image-layout-patterns.md. The file enumerates 72 numbered techniques split into Part 1 — Primary Structures (#1#19 container layouts, #38#46 image-as-canvas + native overlay, #47#56 multi-image) and Part 2 — Modifier Layers (#20#26 non-rectangular crops, #27#37 overlays & masks, #57#61 texture, #62#72 special). The four Image narrative intent values below cover only broad categories.
  2. Produce — every non-formula row in §VIII Image Resource List MUST fill the Layout pattern column with one or more #<id> <name> joined by + drawn verbatim from this file (Primary + optional Modifiers). Rows with empty cells, paraphrased names, or invented ids are invalid. Formula rows are the only exception; use formula-inline or formula-block.
  3. Image-as-canvas coverage — for any deck with ≥4 image-bearing pages, at least one page MUST use a #38#46 pattern (image-as-canvas + native overlay) unless every image is a pure cover / chapter divider / atmosphere backdrop. This family is the most-skipped one and is usually the right answer for content-rich pages with photographs. If the deck legitimately has no opportunity for it, state the reason in §VIII directly under the table.

Skip-detection signal for self-audit: if you notice that every page's Layout pattern column resolves to #2/#3 (left-third or right-third), #5/#6 (top-bottom band), or generic side-by-side, you have not actually consulted the file — re-read and reconsider. The default left/right and top/bottom split bias is the failure mode this gate exists to break.

Skip-detection signal — text_policy column: if every ai row resolves to none and the deck contains any paper-figure / academic schematic / panel-comparison / data-axis page, you defaulted instead of judging per row. Consult image-generator.md §5.3 positive-trigger table and re-decide each row. none for every row is correct only when no row matches a trigger; otherwise this is the same class of failure as the layout-pattern signal above.

Image narrative intent (decide before the ratio table — determines whether the image lives in a container at all):

Intent Form When to use
Hero / full-bleed Image fills canvas/dominant zone; title floats over with gradient or opacity overlay Covers, chapter dividers, breathing pages — image is the message
Atmosphere / background Image as low-contrast backdrop (reduced opacity or dark overlay); text reads on top Section backgrounds, mood-setting — image sets tone, text carries info
Side-by-side Image and text as adjacent coequal blocks — ratio table below governs container sizing Most content pages — image and text read together
Accent / inline Small image beside related text, not a container; no ratio matching Supporting visuals, spot illustrations

Intent follows narrative purpose, not image ratio. Don't default every image page to side-by-side.

Side-by-side ratio alignment (consult only when the chosen intent is side-by-side; detailed calculation rules in references/image-layout-spec.md):

Image Ratio Recommended Container Layout
> 2.0 (ultra-wide) Top-bottom split, top full-width
1.5-2.0 (wide) Top-bottom split
1.2-1.5 (standard landscape) Left-right split
0.8-1.2 (square) Left-right split
< 0.8 (portrait) Left-right split, image on left

Side-by-side only: container ratio must match image ratio. Hero / atmosphere / accent intents ignore ratio alignment.

Portrait canvases (Xiaohongshu, Story): Layout rules differ — top-bottom is preferred for most ratios since left-right columns become too narrow. See "Portrait Canvas Override" in references/image-layout-spec.md.

Multi-image slides: When multiple images appear on one page, use the grid formulas in the "Multi-Image Layout" section of references/image-layout-spec.md.

Pipeline handoff: When C) AI generation is selected, Image_Generator consumes Pending rows and updates them to Generated or Needs-Manual before Executor proceeds. Status names are defined in svg-image-embedding.md.

Template Match — Visualization + Structural Patterns (Non-blocking — Strategist recommends, no user confirmation needed)

The catalog covers both data charts and structural information designs. A "match" is not limited to numeric pages — any page whose content shape matches a Pick for ... clause is a candidate:

  • Data-type pages: comparisons, trends, proportions, KPIs, financials, rankings, distributions, conversion funnels
  • Structural-type pages: team rosters, agendas, principles & values, methodology phases, customer journey, capability maps, OKR cascades, roadmaps, strategic frameworks (SWOT / BCG / PEST / Porter's Five Forces / Value Chain — matched via quadrant_text_bullets, quadrant_bubble_scatter, vertical_pillars, hub_inward_arrows, chevron_chain_with_tail respectively)

The most common Strategist failure mode is missing the structural half — treating "chart" as "numeric chart only" and leaving team / agenda / principles / journey pages as text-only when a template would fit. Read the catalog with both lenses.

🚧 GATE — visual floor (the deck must not ship as text-on-rectangles). "Not every page needs a chart" is true, but an informative multi-page deck that assigns zero visual treatments is the #1 quality failure: flat walls of text in boxes, no diagrams / figures / charts / imagery. Hard floor for any content-bearing deck (≥6 pages, not a pure title/quote teaser):

  • Every content page whose shape matches a catalog Pick clause MUST get a visual treatment — a page_charts entry (chart / infographic template), a page_layouts structural template, or an explicit §VII custom-diagram plan (no-template-match with the figure described). Text-in-boxes is the fallback only for pages that genuinely carry no structurable shape, and you must be able to name why.
  • spec_lock.md MUST NOT ship a content deck with page_charts empty AND page_layouts empty/free-design AND no §VII custom-diagram rows. That combination means no visual was assigned anywhere — re-scan §IX and map each content page's shape to a figure (comparison→columns/quadrant, process/历程→timeline/process_flow, cycle/循环→concentric/segmented_wheel, share→donut/pie, trend→line/area, ranking→bar, architecture→layered_architecture, relations→hub_spoke/mind_map).
  • Downstream enforcement: svg_quality_checker.py hard-fails any deck (≥6 text-heavy pages) whose Executor output has zero <path>/<polygon>/<polyline>/<image> deck-wide. Leaving the visual plan empty here guarantees that failure later — assign the figures now, at the spec stage, where it is cheapest to change.
  • Archetype cap (anti-monotony): the same layout archetype — especially the icon+title+text card grid and the full-width stacked row list — may carry at most 2 pages per deck, adjacent or not. Five "2×3 icon grid" pages read as the same slide repeated even when the copy differs (real shipped failure). When §IX assigns a third page the same shape, rework it into a different visual form via the mapping above. Downstream: the checker warns at 3 same-fingerprint pages and errors at 4.
  • Title semantics must be honored by the layout: a page titled 架构/分层 gets stacked layers (NOT equal-width row bars); 矩阵 gets a real 2×2 quadrant (NOT a card grid); 流程 gets a directional flow; 层级 gets a pyramid/tree. Shipping a "五层架构" page drawn as five identical list rows is the canonical semantic mismatch — the title promises a figure the page never draws.

Reading is mandatory; the catalog is a starting point, not a copy target.

  • Fully read templates/charts/charts_index.json before drafting the Eight Confirmations — the read happens up front, not when you sit down to write Section VII. The file contains meta + charts.<key>.summary only; each summary is a selection rule ("Pick for … Skip if …"), not a description. There is no category, quickLookup, or keyword index — selection is done by semantically matching each page's content shape against all 71 summaries in one pass.
  • Not every page needs a chart. When a page's information structure matches a catalog entry, use that template as a structural starting point — keep the visualization type and core layout logic, then adapt composition, density, color, decoration, and accompanying elements to fit this deck's content and visual tone. Free adjustment is encouraged; what is forbidden is (a) generating without reading the catalog, and (b) blind verbatim mimicry that ignores the page's actual content weight.

Workflow:

  1. Read all 71 summaries; for each page, identify the Pick clause that matches the page's content shape AND does not match any Skip clause.
  2. Prefer specificity (vertical_list over generic numbered_steps).
  3. One primary visualization per page; a supporting layout may accompany it.
  4. List selections in Design Spec section VII; section IX only notes the visualization type name per page.

Source vocabulary mismatch — the catalog is in English. When source content uses Chinese / industry jargon ("中台", "架构图", "述职", "管道", "前后端"), translate the intent first, then match against summaries. The catalog deliberately keeps no keyword index — full-read forces semantic matching rather than lexical grep.

Read-audit (mandatory, section VII format) — single combined table; summary-quote column is the anti-fabrication audit, path + usage serve Executor lookup. Format defined in templates/design_spec_reference.md §VII:

Catalog read: 71 templates

| Page | Template      | Path                              | Summary-quote (verbatim) | Usage |
| ---- | ------------- | --------------------------------- | ------------------------ | ----- |
| P03  | bar_chart     | templates/charts/bar_chart.svg    | "<verbatim first sentence>" | <intent> |
| P07  | line_chart    | templates/charts/line_chart.svg   | "<verbatim first sentence>" | <intent> |
| P11  | pie_chart     | templates/charts/pie_chart.svg    | "<verbatim first sentence>" | <intent> |

Runners-up considered (3 entries minimum, drawn from real second-best matches):
- <key_A> | rejected for P03: <reason citing this deck's specifics>
- <key_B> | rejected for P07: <reason>
- <key_C> | rejected for P11: <reason>

The summary-quote must be copy-pasted from charts_index.json — paraphrasing or summarizing breaks the audit. Every template name listed (selected or rejected) must grep cleanly inside charts_index.json (so misspelled or invented keys fail). If fewer than 3 visualization pages exist, list what exists and note "fewer than 3 viz pages"; runners-up still required for each page that does exist.

Fallback when no template fits:

  1. Re-read the full summary list with the page's intent re-stated in plain language — "non-obvious" matches often surface on the second pass (e.g. "causal chain" → process_flow or sankey_chart).
  2. If still no fit: data-driven content → table layout; conceptual/illustrative → "AI-generated image" (Image_Generator handles); structural → "custom layout".
  3. Mark the page no-template-match in section VII with the fallback chosen and why. Do NOT silently substitute a close-but-wrong chart.

Speaker Notes Requirements (Default — no discussion needed)

  • File naming: Recommended to match SVG names (01_cover.svgnotes/01_cover.md), also compatible with notes/slide01.md
  • Fill in the Design Spec: total presentation duration, notes style (formal / conversational / interactive), presentation purpose (inform / persuade / inspire / instruct / report)
  • Split note files must NOT contain # heading lines (notes/total.md master document MUST use # heading lines)

2. Mode & Visual-Style Catalogs (Reference for Confirmation Item d)

Confirmation d locks two independent catalog items:

Read the relevant _index.md at confirmation d (Layer 1 / Layer 2) for its catalog table and auto-selection. Executor loads the locked mode + visual-style files at generation (see SKILL Step 6).


3. Color Knowledge Base

Consulting Brand Colors

Brand HEX
Deloitte Blue #0076A8
McKinsey Blue #005587
BCG Dark Blue #003F6C
PwC Orange #D04A02
EY Yellow #FFE600

Versatile / General Colors

Style HEX
Tech Blue #2196F3
Vibrant Orange #FF9800
Growth Green #4CAF50
Professional Purple #9C27B0
Alert Red #F44336

Data Visualization Colors

  • Positive trend (green): #2E7D32#4CAF50#81C784
  • Warning trend (yellow): #F57C00#FFA726#FFD54F
  • Negative trend (red): #C62828#EF5350#E57373

4. Layout Pattern Library

Principle — proportion follows information weight, not preset ratios. Combine patterns, break the grid for breathing pages, or propose new patterns. Defaulting every page to symmetric grid produces the "AI-generated" look.

Pattern Suitable Scenarios PPT 16:9 Reference Dimensions
Single column centered Covers, conclusions, key points Content width 800-1000px, horizontally centered
Symmetric split (5:5) Comparisons where two sides carry equal weight Column ratio 1:1, gap 40-60px
Asymmetric split (3:7 / 2:8) One side dominates — chart vs. takeaway, image vs. caption Heavier side 840-1024px, lighter side 256-440px
Three-column Parallel points, process steps Column ratio 1:1:1, gap 30-40px
Four-quadrant / matrix Two-axis classification, strategic quadrants Quadrant 560x250px, gap 20-30px
Top-bottom split Ultra-wide images + text, processes, timelines Image full-width, text area >= 150px height
Z-pattern / waterfall Storytelling, case studies — blocks alternate left/right Guide eye in Z; 3-5 alternating blocks
Center-radiating Core concept + surrounding nodes Center element 200-300px, 4-6 satellite nodes
Full-bleed + floating text breathing / feature pages Image fills 1280x720, text floats over opacity overlay
Figure-text overlap Hero moments — headline over/against image edge Text partially overlaps image, not beside it
Negative-space-driven Single element in 40-60% whitespace One idea, weight through emptiness

PPT 16:9 (1280x720) key dimensions: Safe area 1200x640 (40px margins); Title area 1200x100; Content area 1200x500; Footer area 1200x40.


5. Template Flexibility Principle

Templates are starting points. The Strategist may adjust based on content and audience:

  1. Font size ratios — reference values, adjustable
  2. Color schemes — customize per brand/content
  3. Layout patterns — combine, nest, or break (§4 lists 11 patterns as reference, not exhaustive)
  4. 12-chapter framework — expand or reduce
  5. Spacing / border radius — Executor adjusts per content density and page_rhythm

6. Workflow & Deliverables

6.1 Content Planning Strategy

Content-outline and speaker-notes strategy follow the deck's locked mode — see modes/_index.md and the locked mode's file. The guidance below applies within any mode:

Delivery purpose drives the whole plan, not just type size. result.json delivery_purposetext (read-close) / balanced (business, default) / presentation, confirmed as a Tier-1 anchor (§1) — is a deck-wide consumption mode. It seeds the body baseline (§g) and governs how content is distributed:

Delivery purpose Per-page density & treatment §IX content per page page_rhythm lean
text · read-close dense — pack more per page, fuller layouts prose paragraphs, more blocks, tables / fine detail; complete sentences leans dense
balanced · business (default) balanced one primary + supporting points; moderate text mixed
presentation sparse — one idea per page, generous whitespace keywords / short phrases, a single core message, large visual; never paragraph dumps leans anchor / breathing

This is what makes the axis meaningful: a presentation deck and a text deck built from the same source must differ in per-page text volume, layout density, and rhythm — not only in font size. Page count (item b) stays the user's call; delivery purpose governs the density and treatment within it, and informs the page-count recommendation when the user has not fixed one. Record the chosen purpose in design_spec.md §I. The page_rhythm leans are a bias, not a quota — the filler-page ban and "rhythm follows narrative" rule still hold. (Preservation paths — beautify / template-fill — keep source structure verbatim: honor purpose only in styling, never to re-paginate.)

Per-block expression: phrase each §IX content block in the mode that fits it — prose, bullet, keyword, or any phrasing the content calls for — not a default bullet. Take the cue from the source's texture: a narrative source (article / transcript / talk) leans prose — resist compressing its argument pages into fragments; a data sheet leans bullet/keyword. Write the real sentence into §IX itself, not a skeleton point to expand later. One page mixes modes; let layout pull each (narrative → prose, structural/chart → bullets/keywords).

Note: §IX is the only content copy the Executor re-reads after context compression — what you write there is what survives.

6.2 Outline Output Specification (Must include 11 chapters)

Chapter Content Requirements
I. Project Information Project name, canvas format, page count, style, audience, scenario, delivery purpose, date
II. Canvas Specification Format, dimensions, viewBox, margins, content area
III. Visual Theme Style description, light/dark theme, tone, color scheme (with HEX table), gradient scheme
IV. Typography System Font plan (per-role families — title / body / emphasis / code), font size hierarchy
V. Layout Principles Page structure (header/content/footer zones), layout pattern library (combine/break as content demands), spacing spec
VI. Icon Usage Spec Source description, placeholder syntax, recommended icon list
VII. Visualization Reference List Visualization type, reference template path, used-in pages, purpose
VIII. Image Resource List Filename, dimensions, ratio, purpose, status, generation description
IX. Content Outline Grouped by chapter; each page includes layout, title, core message (the page's one idea), content blocks (in the selected phrasing mode), visualization type (if applicable)
X. Speaker Notes Requirements File naming rules, content structure description
XI. Technical Constraints Reminder SVG generation rules, PPT compatibility rules

Generation steps:

  1. Read reference template: templates/design_spec_reference.md
  2. Generate complete spec from scratch based on analysis
  3. Save to: projects/<project_name>.../design_spec.md
  4. Generate execution lock: read templates/spec_lock_reference.md and produce projects/<project_name>.../spec_lock.md — a distilled, machine-readable short form of the color / typography / icon / image / layout_grid / page_rhythm / page_layouts / page_charts decisions above. This file is what the Executor re-reads before every page (see executor-base.md §2.1). The values in spec_lock.md MUST exactly match the decisions recorded in design_spec.md; if they ever diverge, spec_lock.md wins and design_spec.md should be treated as historical narrative.
    • page_rhythm is mandatory: Based on the page list in §IX Content Outline, assign each page one of anchor / dense / breathing (see spec_lock_reference.md for the full vocabulary). This is what breaks the uniform "every page is a card grid" feel — without it the Executor defaults all pages to dense.
    • Rhythm follows narrative, not quota: breathing pages mark natural pauses — chapter transitions, standalone emphasis (hero quote / big number), SCQA bridges. Dense decks may legitimately be all dense. Do NOT invent filler pages ("Thank you", empty dividers) to pad rhythm — every breathing page must say something independent. Delivery purpose biases the overall lean (presentation toward more anchor / breathing, text toward dense; see §6.1) — a bias, never a quota.
    • Cover impact is mandatory: Page P01 is the deck's first visual contract, not a generic title slide. In design_spec.md §IX, add a Cover impact line for P01 that names one concrete hook and one concrete composition strategy. Use the source's strongest available signal: a provocative core claim, object / scene metaphor, hero number, founder / product / audience moment, or a distilled conflict. Pair it with one concrete composition strategy — such as full-bleed image + floating title, typographic poster, hero object, data hook, editorial scene, high-contrast abstract geometry, or a fresh composition the deck's subject suggests (these are starting points, not the allowed set). If no external or AI image is available, still specify a native-SVG visual hook; do not fall back to "title + subtitle + decorative background". (Beautify / template-fill keep the source cover verbatim — this rule does not apply on those preservation paths.)
    • Cover rhythm lock: P01 remains anchor in spec_lock.md page_rhythm, but its §IX Cover impact must prevent content-page patterns. Do not plan multi-card grids, agenda-like bullets, or equal-weight columns on the cover unless a template explicitly requires that structure, or a preservation path (beautify / template-fill) is transcribing the source cover verbatim.
    • Closing impact (only when the deck closes): the deck's last page is its final visual contract — the strongest impression after the cover. When the deck genuinely lands on a conclusion / call-to-action / final-takeaway page, give it a Closing impact line in §IX: name the one thing the audience should leave with (a distilled takeaway, a forward call, a memorable restatement of the core claim) + one composition that delivers it — never a generic "Thank you" / contact-only slide or a centered-title reprise of the cover. Do NOT invent a closing page to satisfy this — the filler-page ban above still holds; apply it only to the page where the deck actually resolves. Same exemptions as the cover: skip on template / beautify / template-fill preservation paths.
    • page_layouts (write only when a template is in use): For each page that inherits a template SVG, add P<NN>: <svg_basename> (e.g., P04: 03a_content_image_text). Pages designed freely get no entry — Executor reads the absence as "free design, no inheritance". If zero pages use a template, omit the section entirely.
    • page_charts (write only for chart pages that match a catalog template): For each page in design_spec.md §VII whose reference template path points to templates/charts/<name>.svg, add P<NN>: <chart_name>. Pages with no-template-match in §VII MUST NOT appear here (Executor would look for a non-existent reference). If the deck has no data-visualization pages, omit the section.
    • Hard rule: Use both page_layouts and page_charts for the same page only when the layout template is a compatible shell for the chart. Do not pair chart pages with conflicting page layouts (e.g., waterfall_chart + timeline layout, KPI cards + circle-diagram layout). If no compatible layout exists, omit the page from page_layouts.

7. Project Folder

Project folder must exist before Strategist runs. If not, execute:

python3 scripts/project_manager.py init <project_name> --format <canvas_format>

Save outputs to projects/<project_name>_<format>_<YYYYMMDD>/design_spec.md.


8. Complete Design Spec and Prompt Next Steps

After writing design_spec.md and spec_lock.md, output the next-step prompt below. This is a handoff instruction, not part of design_spec.md. Pick the variant by whether Step 3 copied a template into <project_path>/templates/.

Template mode (template applied in Step 3)

✅ Design spec complete. Template ready.
Next step:
- Images include AI generation → Invoke Image_Generator
- Otherwise → Invoke Executor

Free design (default, no template)

✅ Design spec complete.
Next step:
- Images include AI generation → Invoke Image_Generator
- Otherwise → Invoke Executor (free design for every page)