Compare commits

...

79 Commits

Author SHA1 Message Date
caoqianming d24165a2fe style(web): 列表状态灯挪到文件夹行左侧,数据行 space-between 均匀分布(bump 0.38.8)
终态徽章 + 运行圆点放进文件夹行行首(无文件夹行回落数据行,patch 逻辑同规则
找 host);底部数据行剩纯数据均匀铺开,时间自然落行尾。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:58:56 +08:00
caoqianming 259dde502d fix(web): 列表 meta 行数字组靠左跟排——修 active 静默后的左侧缺口(bump 0.38.7)
active 徽章静默后,无 skill 行的 meta 左槽空置,条/tok 整组右挤留出一块
"缺了东西"的空白。数字组改靠左填槽,仅时间锚行尾;删无意义的 right-group。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:55:32 +08:00
caoqianming 2937b75143 style(web): status 徽章默认态静默——active 不挂徽章,终态行淡化(bump 0.38.6)
「进行中」徽章与运行圆点语义撞车,且列表主体都是 active,重复徽章是零信息
噪音。改为:active 不渲染徽章(列表 + chat-meta 同规则),completed/abandoned
保留徽章且整行淡化(st-* class,hover 恢复),脉冲圆点成为唯一动效信号。
删不再渲染的 .badge.active CSS。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:49:44 +08:00
caoqianming 7e6159af48 style(web): 运行态标识精简为纯脉冲圆点——文案收进 hover title(bump 0.38.5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:43:58 +08:00
caoqianming 640bd0a1a3 feat(web): 后台 running task 自动挂 SSE——运行态标识刷新后也实时(bump 0.38.4)
loadTaskList 收尾 subscribeRunningRows:列表带出的 running/cancelling 行本地
未订阅的自动挂事件流(上限 4 条防同源连接占满),done/error 走现有收尾清标识 +
重拉列表,零轮询。ensureRunningTaskSubscribed 的 cancelling/workingDir 改由
调用方传 seed(后台 task 媒体 rel 解析要用各自 working_dir);后台订阅不再调
renderLiveRunIfVisible(避免重挂卡强制滚底误伤当前对话)。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:41:26 +08:00
caoqianming 0ad7d08242 feat(web): 任务列表加运行态标识——多 task 并发时可见哪些在跑(bump 0.38.3)
列表行 run_status(后端本就返回,前端一直没用)渲成状态徽章旁的标识:
running 绿脉冲点/cancelling 橙/error 红点(hover 出 run_error)。取值叠加本地
liveRuns;run 开始与点停止时就地 patch 行 DOM(不重拉列表保分页),run 结束
沿用收尾 loadTaskList() 重拉。⋯ 菜单"清空对话"的 running 判断同源修正。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 16:25:59 +08:00
caoqianming 941554f9d7 feat(ppt): zongyuan_red 逆向重建为真实中国建材总院模板 + 主动提示(bump 0.38.2)
按官方 总院模板.pptx(中国建筑材料科学研究总院)把手搓的 zongyuan_red
重建为真实品牌模板:PowerPoint COM 渲真页 + 解 pptx 抽实测色/字/资产。

- 打包 logo.png(八边形字标,EMF→PNG)/ cover_bg.jpg(总部大楼灰度)/
  ending_bg.jpg(材料马赛克);TIFF→压缩 JPG、EMF→透明 PNG
- 重写 5 页 SVG 忠实还原:封面(实景铺底+红块)/目录(红斜三角)/
  章节(八边形水印,原件缺按 DNA 合成)/内容(灰底红顶条卡片+底部红条)/
  尾页(材料创造美好世界+Thanks)
- 实测身份:主红 #D7000E、目录红 #D52C24、近黑 #181717、辅灰 #6F6F6F/#BCBDBD;
  微软雅黑+Arial+方正兰亭黑
- 改写 design_spec.md;补登记 layouts_index.json(此前 dir 在但未注册)
- 质检 --template-mode 5 页零 error;finalize 内嵌 8 图 + 全量渲图逐页确认

主动提示:strategist.md §e + SKILL.md 默认主题段各补一条 —— 指向
中国建材总院·CNBM 系汇报(含职称评审)时策略阶段主动把 zongyuan_red
整套模板作为候选点名给用户,点头再按明确路径套入;唯一鼓励主动提模板的
场景,其余仍等明确路径。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-03 15:18:06 +08:00
caoqianming dc721ba8a3 fix(web): 进度 dock 展开遮挡最新内容——task_progress 后补触底(bump 0.38.1)
#task-progress-dock 是 #chat-stream 上方的 flex 兄弟(flex-shrink:0),dock 一涨高就
从顶部挤掉 chat-stream 的可视高度,scrollTop 据置不变 → 原本贴底的最新内容被推到视口
折线以下看不见。直播态 task_progress 事件重渲 dock(=涨高)后早 return,跳过了末尾的
贴底兜底,故底部不自动回滚。修:在 task_progress 分支重渲 dock 后补一句
if (nearBottom) stream.scrollTop = stream.scrollHeight(与其余事件分支同款)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-03 13:59:34 +08:00
caoqianming 346930449a feat(ppt): 反纯文字页+图表落地硬门(7aa49195 二代陶瓷 deck 复盘,bump 0.38.0)
0.37 网格锁生效后复评仍存两盲区:两栏裸文字页 x4(指纹看不见)、
全本零数据图表;另有内容被页脚裁掉、CJK 文字叠压两硬缺陷。修五处:
- 指纹加 text-columns 原型(0 卡片+<=3 图标+<=2 图形基元+左对齐文本
  聚 >=2 列),裸文字页进单调门,4 页同指纹 error
- spec 指派图表落空检测:page_charts 指派了图表但该页 <3 图形基元
  且 <4 卡片 -> error;executor 硬规则"不许把指派图表降级为文字"
- CJK 叠压升级:两 run 均 >=70% CJK 且互叠 >=50% -> error
  (表意字宽 1.0em 估宽近精确,其余情形保持 warning)
- layout_grid 加可选 content_bottom,正文 baseline 越过 -> error;
  executor 加"写页前垂直空间预算"纪律
- 策略层数据图表下限:素材含 >=3 组可比数值 -> 全本至少 1-2 页
  真数据图表,零图表需在 spec 写理由

测试 +9(30 项)全过,全量 162 过;charts/decks 模板回归零新增噪音。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 13:34:51 +08:00
caoqianming d30f6089bb fix(web): 直播流式文字按轮次分段——修工具刷屏时文字被推出视口(bump 0.37.2)
一次 run 把整段(含几十轮 LLM)塞进一张 assistant 卡:文字全累顶部单块、
工具卡全追加其下,工具多时文字被越推越高滚出视口看不到。根因是直播态(单卡合并)
与历史态(每轮 LLM 一条独立消息、天然穿插)结构不一致。

方案 A(只动 chat.js live-run 路径,历史渲染不动):文字按轮次分段——
ensureTextSeg/closeTextSeg 维护当前打开的文字段,每个可见工具/选项卡(非隐形
task_progress)先关掉当前段(空占位段移除、有内容段定稿去光标+高亮),之后新文字
在卡片底部另起新段。流式文字始终在底部可见,且与历史结构一致,run 结束 reload 无跳变。
rAF 节流改闭包捕获 seg 防错渲;ctx.body/ctx.pending 单块模型换成 ctx.curSeg。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-03 13:22:15 +08:00
caoqianming 6f27b7cc5a fix(seedream): size 面积钳制——修 1920x1080 被 ARK 400 打回(bump 0.37.1)
模型自选 16:9 出图(1920x1080=2.07M px)触发 ARK 硬门
`image size must be at least 3686400 pixels`(=1920²,卡总面积非单边),
整次文生图 400 失败。

- tools/seedream.py: 新增 _normalize_size(),出图前把 size 钳进
  [min_pixels, max_pixels]:面积不足按 sqrt(min/area) 等比放大、
  取整到 8 的倍数并复核达标(1920x1080→2560x1440);超上限等比缩小;
  已合规原样透传(向后兼容)。归一化时返回串附 [note]、meta 记
  requested_size,记账按真实出图尺寸。
- config/media/doubao.yaml: seedream_5 加 min_pixels/max_pixels
  (旧 yaml 缺键=不设该侧,行为不变)。
- bump 0.37.0→0.37.1;PROGRESS 加一条。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-03 12:41:46 +08:00
caoqianming 0e02cff6c6 feat(ppt): 对齐网格锁+错位/单调质检(d1285247 陶瓷 deck 复盘,bump 0.37.0)
复盘 25 页陶瓷 deck 三类缺陷:跨页左基线漂移+并排块顶差 2-12px 的
"想对齐没对齐"、5 页同为图标卡网格的单调、标题语义不兑现(架构画成
横条列表)。修四层:
- spec_lock 新增 layout_grid 锁段(margin_x/content_top/footer_y/gutter),
  strategist 派生、executor 每页吸附、checker 强制
- executor-base §3 网格对齐纪律(同 top 同高等 gutter、打破网格 >=16px、
  同行文字 >=0.3em 禁贴字)
- svg_quality_checker 新增 check 14:兄弟卡片近失对齐 2-12px error
  (底对齐/中心对齐/chart-plot-area 内数据柱三类豁免,71 charts 回归
  误报清零)、layout_grid 偏离 2-15px error、gap 不等 warning、无锁
  项目跨页左缘聚类漂移 warning、版式指纹单调门(>=3 同指纹 warn、
  >=4 或过半 error;仅对 NN_ 编号 deck 页聚合)
- 策略纪律:同一版式原型整本 <=2 次 + 标题语义必须被图形兑现

顺手修 comparison_columns 模板胶囊 5px 错位。
新增 tests/test_svg_alignment_check.py 21 项;全量 153 过。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:16:42 +08:00
caoqianming a89c7386fd fix(web): 进度条自愈——回放层强制单调完成(d1285247 复盘,bump 0.36.2)
task_progress 回放非渲染 bug:模型跳步推进时漏给上一步补 completed,
导致"下面绿勾、上面红圈"。progress.js 加 enforceMonotonicProgress:
某步 completed 则其之前所有步自动 completed,set_plan/update_step 出口
各过一遍,漏发自愈。前端单测 +3(含复刻 d1285247 跳步序列→6/6)。
诊断脚本 scripts/diag_progress_d1285247.py。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-03 11:16:34 +08:00
caoqianming fcc158dff6 fix(ppt): 门体系二轮硬化——逃生口收紧+导出自动质检+svg_final 嵌图修复(bump 0.36.1)
0.36.0 重跑复盘:门都触发了,但弱模型 8 秒内连按 --allow-iconless +
--allow-unreviewed 绕过,质检/渲图验收仍 0 调用,4/25 页错位漏出。修五处:

- A 验收门分层:"从没渲过/渲后又改/finalize 前渲的"= 硬问题,任何 CLI
  flag 不豁免;--allow-unreviewed 只豁免"渲过但没标 pass";运维兜底走
  ZCBOT_PPT_FORCE_EXPORT=1 环境变量(不进 --help/SKILL)
- B 拔 -s final 雷:图标门永远对 svg_output 源检测(消除 svg_final 展开
  后误报"零图标"),wrapper docstring 老示例删除
- C 导出自动质检门:svg_to_pptx 导出前内嵌复跑 quality checker 逐页硬
  错误,error 拒绝导出、无豁免参数
- E 几何质检加"文字骑卡片边缘"检测(warn 带坐标,P12/P14/P18 类命中)
- F 修 svg_final 嵌图失效:copytree 后 ../images/ 解析必落空,所有 deck
  的 svg_final 一直嵌不进外链图(验收 PNG 图片为空);resolve 加 rebase
  回 svg_output 兜底

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 08:58:49 +08:00
caoqianming 3c712031d5 feat(ppt): 渲图验收闭环+导出验收硬门+几何质检(139a59c5 错位复盘,bump 0.36.0)
复盘 25 页 deck 错位交付:阶段六全量渲图验收被整个跳过(svg_preview 0 调用,
进度步骤只跑了 echo),图标 regex 盲插压字、大字压说明、目录溢出页底全部漏出。
文档要求过但无机制强制,三层补齐:

- A 机制:svg_preview 渲图登记 .build/acceptance.json(源 sha1+verdict);
  新增 accept_pages.py 标 pass/fail(校验渲过+源未改);svg_to_pptx 导出
  边界加验收硬门(每页 pass 且 sha1 未变,--allow-unreviewed 逃生)
- B 提前拦截:svg_quality_checker 新增几何检测(估宽包围盒):图标压字/
  基线出画布=ERROR,文字重叠=WARN 带坐标(密排设计误伤权衡,判断交渲图
  验收);tspan 按视觉行归组续排,71 charts 模板 0 error 误报
- C 文档:SKILL.md 管线改"后处理→渲图验收→导出",反模式加"没看 PNG 就
  --pass-all""为消警告批量盲插元素";SKILL_LIST 同步

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 13:37:59 +08:00
caoqianming d79c28de06 fix(ppt): 禁自搓 SVG→PPTX 导出器硬约束(966041e5 复盘,bump 0.35.1)
复盘 陶瓷资源节点建设方案 (3).pptx:25 页全是整页 PNG 贴图、零原生
文本/形状。根因是模型整条绕开官方管线(svg_quality_checker/finalize_svg/
svg_to_pptx/svg_preview/total_md_split 调用次数全 0),自搓 cairosvg
export_pptx.py 逐页光栅化贴图,连带图标空方框、外链配图丢失、文字溢出。

上一条(0.34.7)硬化的是官方工具内部的门,只在模型用官方工具时生效;
本次证明模型可完全另起平行管线,内部门无从触发。改动仅在文档层:
- SKILL.md 阶段五:加「导出唯一入口=官方 svg_to_pptx.py,默认原生可编辑、
  纯 Python 无需外部渲染器,渲染器没装不是自搓借口」
- SKILL.md 反模式:加「绕开官方管线自搓导出器 → 不可编辑贴图、价值作废」

不改线上跑法/官方脚本行为。残留风险(平台层自动检测整页贴图)按用户
选择暂缓,已记入 PROGRESS。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-02 09:25:14 +08:00
caoqianming e46eb01766 feat(shortcuts): 加快捷指令(触发词→完整指令,入口层确定性展开)(bump 0.35.0)
预定义"简报 → 给我输出一份昨日的 AI 新闻简报",任意入口整条打"简报"就展开执行。

关键设计:快捷指令 ≠ memory。memory 是注上下文给模型概率召回的软上下文;快捷词是
入口层、模型跑之前的确定性替换(命中即换、零歧义)。性能上 shortcuts.md 内容永不注
上下文,存再多条平时也是 0 token;触发时进上下文的就是那条完整指令本身。

- core/shortcuts.py(新):shortcuts.md(| 触发词 | 指令 | 两列表)解析 + expand()
  整条 strip()+casefold() 精确匹配展开(与「新话题」魔法命令同风格,不部分匹配)
- web/app.py 两处共用同一 expand:渠道核心 _run_channel_conversation(微信/企业微信)
  + 网页 post_message,起 run 前展开,任意入口行为一致
- core/memory.py memory_block:加一行契约让模型可维护 shortcuts.md;内容不注上下文
- tests/test_shortcuts.py(新):解析 + 展开全覆盖
- DESIGN §3.7 加"快捷指令 ≠ memory"取舍段 + 文件树;PROGRESS 加条目

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 14:58:55 +08:00
caoqianming c2d24b20b4 fix(ppt): 导出图标门升硬 + 修 svg_to_pptx CLI 退出码不传播 + 验收改全量(bump 0.34.7)
诊断 ppt生成2(966041e5)真实产出的两个缺陷——23 页零图标、多处错位——
根因不是缺 gate 而是 gate 被打穿:

- svg_to_pptx.py 只 main() 不 sys.exit(main()),main() 里所有 return 1
  (图标门/无 SVG/坏路径)全被吞成退出 0(最致命)
- 导出侧图标检查按设计只软 WARN、照常产出
- 模型质检用 `| head` 截断,吞非零退出码 + 截掉打在最后的零图标 [ERROR]
- SKILL.md 验收本就只要求抽查 3 页,错位藏在没看的页里;差评也未阻断

改动:
- svg_to_pptx.py: sys.exit(main()) 传播退出码
- pptx_cli.py: 导出图标门从软 WARN 升为硬门(锁图标却全 deck 零
  <use data-icon> → [ERROR] 退非零、不产出 pptx),加逃生口 --allow-iconless
- SKILL.md: 阶段六验收改「默认渲整本 + 逐页过目 + 差评即阻断返工」,
  阶段四/五/反模式补「别用 | head 截断」「别只看几页」「差评必返工」

合成测试三例(默认拒 / --allow-iconless 放行 / 有图标正常)全过。
仅改 skill 侧,不改动线上跑法;导出门只兜「锁了图标却零引用」,正常 deck 不受影响。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 14:16:49 +08:00
caoqianming 641c7d58aa fix(tools): look_at_image/seedream 接受容器 /workspace 绝对路径(bump 0.34.6)
docker backend 下系统提示告知主模型一切在 /workspace 下,模型自然产出
/workspace/<wd>/x 绝对路径,但 image_ref.resolve_in_root 不翻译该前缀,
报「图片找不到或越界」。加容器根前缀翻译(与 send_email 的 _resolve_user_file
一致),按字符串前缀判断而非 is_absolute()(Windows 上 /workspace 缺盘符不算
绝对);越界仍靠 relative_to(root) 兜住。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 14:07:44 +08:00
caoqianming eb9ffd654f feat(admin): 各用户用量表加「最近使用」列(bump 0.34.3)
后端 _user_usage_page 加全量(不随 range 筛选)相关子查询
max(created_at) → last_used_at;前端 renderUserUsage 加列,
fmtTimeAgo 显示 + 全时间戳 title,无用量显示「—」。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 13:28:34 +08:00
caoqianming d8f71aa7b2 feat(ppt): 页数改为用户必须显式拍板的 gate(bump 0.34.2)
页数原先只给「常 8–15 页」区间又被打包进 a–h 批量确认,用户一句
笼统「OK」就整批过、模型自取区间中位数(~12)。改(纯文档):
- SKILL.md b 项 → 推一个具体数字 + 标为「独立拍板项」
- SKILL.md 新增「🔒 页数 gate」:没给/没显式认可具体张数必须单独
  追问「就定 N 页?」拿到明确整数才写逐页大纲;唯一例外是用户明说
  「页数你随意」时按推荐数走、仍在预览写出供否掉
- strategist.md §b 同步补 Non-defaultable gate 硬约束 + 例外

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 13:01:51 +08:00
caoqianming 4b1dce6df9 fix(web): 清空对话时同步清空右侧导航条(bump 0.34.1)
clearMessages 成功分支只 renderMessages([]),漏了重置 outline;
切 task 路径有 state.outline=[]; renderOutlineRail(),清空路径补齐。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 12:13:51 +08:00
caoqianming 5bde2445a0 refactor(ppt): 工作目录收进隐藏 .build/ + 反卡片映射 + svg_preview 兜底/gate(bump 0.34.0)
累积一批(承接 ppt生成2 验证 + 用户"缺图形/卡片阵太多/文件夹过多"反馈):

- 工作目录重构:<project_dir> 根原本把"持久源 / 交付物 / 可再生构建产物"混摊。
  新增 project_utils.build_dir/svg_final_dir/preview_dir/backup_dir 单一事实源,
  把 svg_final→.build/svg_final、preview→.build/preview、backup→.build/backup/latest
  (只留最新,不再堆时间戳)。.build 是 dotfile → /v1/files 自动隐藏 → 用户可见面
  收敛到 源(sources/images/svg_output/notes/两个 spec)+ 交付物(exports)。改动:
  finalize_svg / svg_preview(_collect)/ pptx_discovery('final'→.build/svg_final)/
  pptx_cli(backup 路径 + rmtree 清旧)+ SKILL 工作目录约定/命令。端到端实测:根目录
  只剩 exports/+svg_output/,.build/ 三子目录就位,导出/预览/backup 全正常。

- 反卡片映射(治"大段大段卡片阵"):executor-base §page_rhythm 的 dense 行去掉
  "card grid 是 baseline"的背书;加一段硬映射「先看内容关系再选图形」(系统→
  hub_spoke/分层、流程→flow、层级→树/金字塔、循环→环、互依→mind_map、对比→象限、
  ≥3数据→图表),卡片阵封顶 ~1/3 页、连画两页网格下一关系页必须上示意图,指回 page_charts。

- svg_preview 加 cairosvg 兜底:find_browser 改返回 None 不抛错;无 chromium 时回退
  cairosvg,渲前用 embed_icons 预展开 <use data-icon> 成真 path(避 INVALID_MATRIX);
  修 --screenshot 相对路径静默失败(改绝对路径 + 暴露 chromium stderr)。

- 扁平 gate 计入 circle/polyline:svg_quality_checker 图形图元加 <circle>(node/venn/
  timeline 是真图,修 21-circle roadmap 误判);文字密集 deck ≥60% 页无图形 → ERROR。

架构结论(svg 目录):svg_output(可编辑源)与 svg_final(自包含编译产物)是两态、不能
合并成一个文件,但只暴露一个——现 svg_output 可见、svg_final 进 .build。终态(下一议题)
干掉持久化 svg_final、finalize 内存化 + web 按需预览,牵涉 web 层,本次未做。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-07-01 11:12:57 +08:00
caoqianming 13835a315a feat(ppt): 加商务红品牌预设 + 配图默认主动提议(bump 0.33.5)
用户两个需求:(1) 加一款红色主题;(2) 用户没给图时在需要处主动配图。

- 商务红品牌预设:新增 templates/brands/business-red/design_spec.md(同 anthropic
  格式:#C00000 全色表 + primary-deep/gold/info/positive/alert/surface/border/muted
  派生色 + 宋体标题/黑体正文字体栈(栈尾收预装字体)+ 实心图标偏好 + 政企口吻;无
  logo,注明用文字 wordmark / 可后补)+ brands_index.json 加条目。红色承载在 brand
  而非 visual-style(后者不带色)。同时把商务红设为 strategist §e 默认配色候选:中文
  政企/集团/科研商务汇报默认列入 ≥3 候选(红金 #BF9B5F / 红蓝 #2B4C7E 二选一点缀,
  纯红只压标题/关键数据)。SKILL §默认主题 + 八条对齐 h 行同步指向。

- 配图默认主动提议:strategist §h + SKILL h 行改——用户没给图时不再默认整本 A
  (no images);封面/分节/概念/breathing/氛围页主动把 ai 配图作为候选提给用户(数据/
  列表/流程页仍走图表→§VII,不配装饰图)。仍全程 gated:用户在 h 确认 + imagegen
  自带成本门(提议免费,确认才花钱)。

附:scripts/config.py 的 INDUSTRY_COLORS 未移植(ppt-master 残留引用),strategist
文档表是实际依据,已直接在表里加商务红行。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 15:57:52 +08:00
caoqianming 4a6182a76a fix(ppt): 修生成 PPT 缺图形(扁平 deck 质检 gate + 策略层视觉下限)(bump 0.33.4)
延续缺图标排查,统计最近 ppt生成 任务 24 页 SVG 的元素构成:<path>=0、
<image>=0,整本是 <text> 摞 <rect>(文字方块),零示意图/图表/配图。根因同
图标——71 个 charts/ 模板没用、content→版式映射形同虚设,且策略层把"Not every
page needs a chart"当跳过口子(spec_lock 实际 page_layouts: free design、无
page_charts 段),输出层又无 gate 拦扁平 deck。两层修(用户选定):

- A' 输出 gate(svg_quality_checker):统计每页图形图元 <path>/<polyline>/
  <polygon>/<image>(rect/line 是版面脚手架不算);≥6 页且文字密集(avg <text>
  ≥10/页)却全 deck 0 图元 → deck 级 error 退非零(逼回执行重写);多数页无图元
  → INFO;<6 页豁免(不误伤极简/teaser)。实测:8 页文字方块→exit 1;任一页带
  path→放行;4 页→豁免。

- B' 策略层视觉下限(strategist.md GATE):把 §633「Template Match」从纯建议升为
  硬下限——内容 deck(≥6 页)每个能结构化的内容页必须分配视觉处理(page_charts
  模板 / page_layouts 结构模板 / §VII 自绘示意图),spec_lock 不许 page_charts +
  page_layouts 同时空着;给出 content→图形映射速查;明示下游 A' 会硬卡。同步改
  SKILL §大纲映射纪律 + §阶段四质检清单 + spec_lock_reference page_charts 段。

诚实边界:prompt+gate 抬下限(逼别交全文字 deck),执行模型设计功力是上限;gate
守"零图形"底线而非"每页必图表",避免误伤极简风。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 14:37:28 +08:00
caoqianming 5d23ee682b fix(ppt): 修生成 PPT 缺图标(图标管线四层断点)+ 沙箱 SVG 预览渲染(bump 0.33.3)
查真实用户两个「ppt生成」任务的 DB 执行轨迹:24 页 SVG 共 0 个 <use data-icon>。
根因是图标管线四环节无一强制图标落地——策略层(有时)锁图标,执行层不放、
质检层不拦、工具层还断着。四层一起修:

- B 工具断点:references/SKILL 23 处路径仍指向已不存在的 skills/ppt-master/
  (zcbot 是 skills/ppt/)→ 模型 `ls .../icons/<lib>/|grep` 验名得空集 → 放弃图标;
  且 strategist 强制用的 icon_sync.py 在 zcbot 根本没有(GATE 空转,正是某任务连
  图标都没锁的原因)。修:全量改路径(保留上游署名)+ 新建 icon_sync.py(复用
  embed_icons 解析,验名+拷进 project/icons,缺名非零退出)。
- A 质检兜底(硬门):svg_quality_checker 加图标校验——锁了 icons.library + 非空
  inventory 但全 deck 0 图标 → deck 级 error 退非零(逼回执行重写);单页 0 图标 →
  warning(封面/分节/breathing/尾页豁免)。
- C 执行强制:executor-base §4 + SKILL 执行纪律改为"内容页必须放 1–3 个 inventory
  图标"(自由设计无模板可继承图标,只能逐页手写)。
- D 导出兜底(纵深):svg_to_pptx 导出前预扫,锁了 inventory 却 0 图标 → stderr 大声
  [WARN](非致命,防跳过质检直接导出)。核实 native 转换器本就自己从图标库展开
  <use data-icon>,故原设想的"finalize 硬前置"前提不成立,D 改成与 A 同源的导出层警告。

同版附带修 svg_preview.py 在沙箱里渲不出 SVG(报"未找到 Chrome / Edge"):移植自
ppt-master 的 find_browser() 只认 Windows chrome/msedge,不认镜像自带 /usr/bin/chromium
(给 mermaid 装的)→ 视觉验收这关在容器里全程失效。对齐 rendering/pdf.py 发现逻辑
(认 chromium/chromium-browser/google-chrome + $CHROMIUM 覆盖);render() 补容器必需的
--disable-dev-shm-usage + 临时 --user-data-dir;并修一个静默已久的 bug——--screenshot
传相对路径 chromium 写不出文件(原代码吞 stderr,看着和"没浏览器"一样),改传绝对路径
并暴露 chromium stderr。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 13:59:00 +08:00
caoqianming 001f9af96f fix(vision): look_at_image 超时透明重试 + 超时 60→120s(bump 0.33.2)
Seed 2.0 Lite 非流式,长 OCR 首字节可能逼近 60s read timeout → 偶发超时;
且返 [Error] 会触发主模型重发整个 tool call(图 base64 重传、输入 token 再付一次)。

- core/ark_client: 新增 ArkTimeoutError(ArkError) 子类,仅超时/网络抖动抛它;
  HTTP 4xx/5xx 业务错误仍抛普通 ArkError 不重试。子类仍是 ArkError,seedream 等
  现有 except ArkError 不受影响。
- tools/look_at_image: 对 ArkTimeoutError 退避重试(timeout_retries 默认 1 次,
  2^n s),tool 内消化掉不抛给主模型,避免重传图烧 token。
- config/media/doubao.yaml: vision request_timeout_s 60→120,新增 timeout_retries。

smoke_look_at_image 通过(OCR 命中 + 记账正确)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 09:02:40 +08:00
caoqianming ff276eb9b3 fix(web): SVG 预览强制 image/svg+xml(前端 blob mime + 后端 download)(bump 0.33.1)
SVG 在 <img> 里必须 Content-Type=image/svg+xml 才渲染。前端 preview.js 的
_showImage / mini 图片分支据扩展名强制 blob mime;后端 download 接口对 .svg
显式回 image/svg+xml(部分部署环境 mimetypes 未注册 svg → FileResponse 会猜成
octet-stream → 不显示)。双保险。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-30 08:28:11 +08:00
caoqianming e3a432dcdd feat(ppt): skill 重构为 SVG-first(移植 ppt-master,弃 python-pptx 版式件)(bump 0.33.0)
旧 python-pptx 固定组合版式件是版面单调/AI 味的架构天花板。改为 SVG-first:
AI 逐页手写 SVG 设计稿 → 纯 Python 转换器逐元素译成原生可编辑 DrawingML。

- 搬引擎:svg_to_pptx/ 转换器 + finalize_svg/svg_finalize + svg_quality_checker + total_md_split + update_spec(依赖闭包干净,只需 python-pptx)
- 搬知识:references(shared-standards/executor-base/strategist/image-layout-*/canvas-formats)+ 5 叙事骨架 + 19 视觉风格
- 搬模板:templates(layouts/decks/brands/charts + 图标库 1.1w+ + spec 骨架)
- 换 GUI:浏览器 Confirm UI → 聊天 BLOCKING 八条确认;live preview → svg_preview.py(无头 Chrome 渲 SVG→PNG);配图走 zcbot imagegen skill
- 默认主题改自由设计(商务红降为候选之一)
- 修 Windows GBK 控制台 UnicodeEncodeError:6 个入口脚本加 sys.stdout.reconfigure(utf-8) shim
- 端到端验证通过:4 页材料领域 deck,质检 0 error → finalize 嵌图标 → 导出原生 pptx → 渲图肉眼验收(swiss-minimal 设计级,非 AI 味)

移植自 github.com/hugohe3/ppt-master (MIT),适配 zcbot task_dir/聊天确认/imagegen 工作流。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 16:38:58 +08:00
caoqianming d4aa5ccbec docs(prompt): system prompt 加通用 context 纪律,堵大块输出滚雪球(bump 0.32.5)
反复 dump 全文 abstract 烧 2.5M token 不是 brief 专属——任何 skill 让弱模型
处理一批长文本都可能踩。在 system prompt 单一事实源 general_v1.md「工作原则」
段、紧挨「少来回」加一条全局铁律:大段 run_python/shell 输出会进对话历史每轮
重发,中间数据落文件、只 read 用得上的片段、别整批重复打印,否则烧 token 还
可能撑爆窗口/拖到超时被掐断。

与既有规则互补:行7(源码落 .py)管代码、行42(少来回)管轮数、本条管"大块
数据输出"。brief skill 0.32.3 的场景化版本保留做细化。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:56:44 +08:00
caoqianming b5d75d2a7b feat(scheduler): 定时任务默认单次超时 0→1800s(bump 0.32.4)
超时此前默认 0(不限),配合"超时被吞成 ok"的旧 bug,跑飞的 job 能无限拖。
改默认有限值 1800s(30min):新建 job 不指定 timeout_seconds 时给 1800,
显式 0 仍保留为"不限"逃生口。

- 单一事实源 core/scheduler.DEFAULT_TIMEOUT_SECONDS=1800;create_job 与
  tools/schedule.py(agent 建 job 的工具)默认都引它,JSON schema 描述同步。
- create_job 里 int(timeout_seconds or 0) 保留显式 0=不限语义。
- 存量:线上 job e621c8a6「每日水泥科研简报」timeout 600→1800(直接 SQL,
  未动其它 job)。
- RUN 故障兜底行同步默认值。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:52:09 +08:00
caoqianming 700176a0c6 docs(brief): 加 context 纪律,堵反复 dump 全文 abstract 烧 token(bump 0.32.3)
承接定时任务超时复盘:同一 job 的 agent 把 38 篇全文英文 abstract 用
run_python/print 反复灌进上下文(≥3 次),工具输出每轮重发 → 48 次 LLM
调用累计输入 2.5M tokens(输出仅 28K),既慢又贵还顶满 600s 超时。根因
brief skill 虽要求证据落 evidence.md 文件,却没明令"别反复 print 进上下文"。

skills/brief/SKILL.md 三处加指示文:
- 阶段二「context 纪律」:落文件、按需 read、别整批重打
- 阶段三:一次成稿别重复 dump + 论文多时按期刊分批 write
- 反模式加一条:反复 print 全文 abstract 让 context 滚雪球

纯指示文,frontmatter/description 不变 → SKILL_LIST 无需更新。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:38:32 +08:00
caoqianming 1646205364 fix(scheduler): 定时任务超时被掐断时记 error 而非误吞成 ok(bump 0.32.2)
实测 bug:isolated 定时 job 跑满 timeout_seconds 被协作式 cancel 后,
_run_agent_bg 对 ok/cancelled 都把 run_status 收回 idle(DB 不可区分),
而 _execute_scheduled_job 收尾只判 run_status=="error",于是超时中断被落成
last_status="ok" —— 掩盖"跑到一半没写 sections/没推送",且不计连续失败、
不触发兜底。复盘 job e621c8a6「每日水泥科研简报」:timeout=600s,task
创建→last_run_at 正好 600.0s,agent 停在"按期刊打印 38 篇摘要"(还在取数)。

修:超时分支置 timed_out 标志,run 收尾后若 timed_out → record_result(
status="error", 半成品不投递 notify)并直接返回。复用既有 error 语义(计入
consecutive_failures、到阈值自动停用、前端 crons 显示「上次失败」)。不动
_run_agent_bg 的 idle-on-cancel 共享语义(HTTP cancel/drain 也依赖)。

配套:PROGRESS/RUN 故障兜底各加一条;诊断脚本 scripts/diag_sched_e621.py
(dump 输出 scripts/_*.txt 入 gitignore)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 14:35:40 +08:00
caoqianming 89062d99b3 docs(design): 新增 §8.8 channel 长会话上下文治理(Phase 1 / Phase 2-3 design)(bump 0.32.1)
记 channel 常驻会话上下文软重置的设计:根因、业界对照(OpenClaw/Hermes/
Claude Code)、「边界而非删除」心智、Phase 1 已落地(context_base_idx 软重置
+ gap 自动分段 + 新话题命令 + 否决的替代方案)、Phase 2(阈值结构化摘要,
对齐 Hermes 阶段③)/ Phase 3(sqlite-vec/FTS5 持久检索)design。
回链修 §8.7、§8.2 两处引用。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 10:47:02 +08:00
caoqianming b27cc9cd5b feat: channel 长会话上下文软重置(gap 自动分段 + 新话题命令)(bump 0.32.0)
微信/企业微信常驻会话不再无限膨胀。tasks 加 context_base_idx,
Session.load 只把 idx>=base 的消息喂模型,base 之前历史全留 DB
(网页端照旧翻完整记录,一条不删)。

- 自动 gap 分段:入站距上次消息超 channel.session_gap_hours(默 6h)
  → 软重置,base=最后一条 user 消息 idx(保留上一轮做续聊锚点)
- 手动新话题:发「新话题/新会话//new/清空上下文」→ 硬重置 base=总数
- clear_messages 全删后归零 base;_db_idx 取真实总数避免 append 撞 idx

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 10:05:07 +08:00
caoqianming e49ff641f9 feat(web): 消息框支持拖拽文件 + 修多次粘贴互相顶掉(bump 0.31.3)
附件 chip 拆出独立托盘 #chat-attach,与状态文字解耦:append+按 rel 去重,
上传进度只写 #chat-hint,不再互相覆盖。整个 #chat-form 加 dragenter/over/
leave/drop(计数防闪烁,只认文件拖拽,微信镜像只读不接收),复用 uploadFiles。
takePastedRels/删除/预览改查托盘;切 task 清残留未发送 chip。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 14:38:58 +08:00
caoqianming d235cb7564 fix(web): 消息目录圆点错位再修(点击竞态 + 触底兜底)(bump 0.31.2)
- 点圆点不变红/点#1跳到#2:scrollIntoView 平滑滚动途中的 scroll 事件
  抢走显式点选 → 加 _outlineJumpLock,跳转期间不重算,700ms 兜底解锁
- 点最后一个/滚到底倒数第二个变红:末项永远顶不到顶线(容器先到底)
  → updateActiveOutlineDot 加触底分支,判最后一个已加载轮为当前

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 14:27:50 +08:00
caoqianming 1352f092a3 feat(web): admin 近7天用量表加合计行(bump 0.31.1)
renderByDay 在 by_day_7d 表底加 tfoot 合计行,汇总 7 天
cost_cny/tokens_in/tokens_out;无数据时不渲染。后端无改动。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 14:21:40 +08:00
caoqianming b4808b0370 feat: per-account 模型访问控制(档位制,复用 plan 列)(bump 0.31.0)
- core/model_access.py(新):档位制访问控制。users.plan 存档位名,
  「档位→模型集合」配在 config/agent.yaml model_tiers;plan 空/未知→default 档,
  role=admin 全开。无需 migration(plan 列 0001 起就有,之前休眠)。
- 两档:default(deepseek+local+seedream+seedance)、pro(+doubao+glm)。
- web/app.py:三个 list 端点按档过滤(用户只看到本档模型);三个 resolve 加
  user_id 门控 —— 显式选档外模型 403;老 task 下次发消息模型已不在档位内→
  持久落回 deepseek_v4.flash;定时任务执行 grandfather 不门控。
- web/admin.py:GET /v1/admin/tiers + PATCH /v1/admin/users/{uid}/plan;
  用户行补 plan 字段。
- web/static/js/admin.js:各用户用量表加「档位」列(内联下拉)+ 档位图例 + apiSend。
- DESIGN.md plan 列语义 / RUN.md model_tiers 配置说明。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 14:11:22 +08:00
caoqianming 8263382fd1 feat: 新增豆包 Seed 2.1(turbo/pro/evolving)+ GLM 5.2 文本模型档案(bump 0.30.0)
- config/models/doubao.yaml(新建):Seed 2.1 turbo/pro + 自进化 evolving,
  走 Ark OpenAI 兼容端点(openai/ 前缀 + ARK_API_KEY,同 local.yaml 范式)
- config/models/glm.yaml:加 pro52(GLM 5.2,zai/glm-5.2,1M 上下文),与 glm.pro(5.1)并存
- thinking_mode 均 false(深度思考走 body 协议,非 reasoning_effort 等级,留 TODO)
- 单价按火山/智谱 2026-06 发布价;evolving 单价未公布暂按 pro 估值兜底
- RUN.md 更新 ARK_API_KEY 说明(文本+图像+视频三处共用)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 13:05:07 +08:00
caoqianming d633949a66 feat(web): 定时任务执行历史列表(右栏 Tab + 分页)(bump 0.29.0)
isolated 模式每次触发新建 task,旧的带 scheduled_job_id 被普通列表过滤、
UI 够不到,原来只有「打开它跑的任务」单按钮指向 last_task_id(最近一次)。

- 后端新增 GET /v1/schedules/{job_id}/tasks?page=&page_size=:按 scheduled_job_id
  归属 + user_id 隔离,created_at desc 分页,复用 _task_dict,标准分页壳返回。
- 前端定时弹框右栏改 Tab(详情 / 执行记录),动作按钮提到顶部 head;
  执行记录是分页列表,点某条打开那次对话。await 后重查 #cr-hist 防切换串显。
- 决策(与用户对齐):历史全部保留不剪枝;布局选 Tab 而非三栏。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:53:56 +08:00
caoqianming e7a86fb00c fix(web): 修复卡片⚙点击无反应(export openWechatModal)+ 卡片圆角调小
- openWechatModal 原是 wechat.js 私有函数,chat.js 访问不到 → 点击⚙无反应。
  export 出来并在 chat.js import,事件绑定直接调用(去掉 typeof 兜底)。
- 卡片圆角 8px → 4px(用户嫌太圆);cc-icon/cc-action 圆角同步调小。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:26:44 +08:00
caoqianming 12a2289de2 style(web): 卡片布局改左大右小,⚙ 固定宽度
卡片左侧占主空间(点开对话),右侧「⚙」固定宽度 28px(点开弹框),
点击更易触发。未绑定/已绑定无对话卡片也加右侧 ⚙ 管理。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 12:17:32 +08:00
caoqianming 013cbc28b5 fix(web): 卡片 ⚙ 按钮打开弹框管理
已绑定且有对话的卡片:⚙ 按钮打开弹框管理,拦截点击不触发 selectTask。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:28:14 +08:00
caoqianming a6d00b24ff feat(web): 渠道卡片收拢绑定管理 + 删 rail 按钮 + bump 0.28.1
把渠道绑定/对话/管理全部收进「新建任务」下方的卡片,删掉左下角
rail「微信」按钮(精简页面)。后端 /v1/channel_tasks 返回
{ wechat: { bound, task }, wecom: { bound, task } },前端渲染三种卡片:
未绑定(点绑定)/已绑定无对话(占位)/已绑定有对话(点进+⚙管理)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 11:19:45 +08:00
caoqianming e66fdd0ffc feat: 定时任务对话归属 + push 统一记录到渠道对话(bump 0.28.0)
问题1:定时任务产生的 task(isolated 每次新建)混进普通对话列表。
- tasks 加 scheduled_job_id(nullable FK→scheduled_jobs,migration 0017 + backfill
  persistent/isolated);列表 WHERE scheduled_job_id IS NULL 排除(+working_dir LIKE 兜底)
- ensure_local_task_row 加参数,_execute_scheduled_job 建任务时填
- mode 语义澄清:只管对话是否延续,文件夹两种模式都按 job 复用

问题2:任何 push(定时 deliver_notify / agent wechat_push 工具)推到微信渠道,
web 端渠道对话看不到、没法基于推送追问。
- 记录下沉到 send_to_user(两调用方统一入口):投递成功后对每个成功渠道
  ensure_channel_chat_task(不存在自动建,与入站对话共用)+ 写 assistant 消息
  (摘要+文件下载链接+../rel read 路径)
- Unified 进 agent 上下文(基于推送追问);source_task_id 去重(chat task 内调
  wechat_push 时不重复插摘要);不塞正文,agent 按需 read 产物文件
- _run_channel_conversation 复用 ensure_channel_chat_task,消除建 task 重复逻辑

messages.kind 列(migration 0018):push 记录标 kind="push"(独立列不进 payload),
extract_last_assistant_text 加 WHERE kind IS NULL 跳过,避免 wecom 入站取回复
误取 push 摘要当回复。

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-26 10:51:06 +08:00
caoqianming 133e350428 style(web): 渠道卡片改并排省纵向空间 + bump 0.27.4
接 0.27.3:两张渠道镜像对话卡片(微信/企业微信)从竖排改并排
(#channel-cards flex row,各 flex:1);窄栏内图标左、名称 + 条数·时间
堆两行(新增 .cc-body 列容器)。绑定弹框(左下角「微信」rail 按钮)保留
不动 —— 它是绑定/解绑/测试推送唯一入口,与卡片职责互补不重复。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 12:39:25 +08:00
caoqianming 7dfdf4c73b feat(web): 微信/企业微信对话改成左栏固定卡片 + 企业微信也只读 + bump 0.27.3
把渠道镜像对话(每用户每渠道唯一的常驻只读对话)从「任务列表置顶行 +
绿徽章 + 绿边」改成「新建任务下方两张固定卡片」,与可滚动任务列表分离、
常驻可见;顺带补企业微信对话的 web 端只读锁。

- 后端 /v1/tasks 用 coalesce(channel,'web').notin_(CHANNEL_MIRROR_KINDS)
  排除渠道任务并删掉 case() 强制置顶;新增 GET /v1/channel_tasks 返回
  {wechat, wecom} 摘要(复用 _task_dict,无则 null)
- 前端加 #channel-cards 卡片块(:empty 自动隐藏)+ loadChannelCards/
  syncChannelCardActive;移除列表行已失效的绿徽章逻辑
- applyChannelComposerLock / sendMessage 守卫从硬编码 channel==='wechat'
  改读 CHANNEL_BADGE,微信 + 企业微信都 readonly,提示文案按渠道动态

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 12:32:12 +08:00
caoqianming 474597cfc6 feat(wecom): 企业微信入站对话支持图片/文件附件(media/get 下载 + 复用渠道无关核心)+ bump 0.27.2
接续 0.27.0 企业微信入站(此前只收文本)。

- wecom.download_media(media_id):走 media/get,成功回二进制流 + Content-Disposition
  文件名,出错回 JSON errcode(40014/42001 重取 token);_filename_from_disposition 解
  filename / filename* 两种形式。
- 回调按 MsgType 分支:image/file 下载后构造 InboundAttachment(kind/file_name/data,与
  个人微信同结构)→ 喂同一 _run_channel_conversation,复用其落盘 + 拼 [用户上传的...] 行
  (图片 agent 自调 look_at_image,文件走 Read)。纯图片/文件消息无文本时据附件行生成 text。
- 语音/视频/位置/链接/事件暂回 success 不处理;附件下载失败静默跳过(打日志)。
- dev.html「企业微信(仅推送)」文案纠正为「推送 + 对话」。

文件:core/wechat/wecom.py、web/app.py、web/static/dev.html。_filename_from_disposition
+ import 自测过。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 12:17:44 +08:00
caoqianming d1aa2b12e2 fix(wecom): wechat_push 支持按渠道定向投递,修「点名企微仍推到个微」+ bump 0.27.1
用户说"推送给我的企业微信",消息却同时进了个人微信。根因:send_to_user
是无差别广播(for ch in active_channels() 逐个推),且 wechat_push 工具
没有指定渠道的参数 —— 部署同开 clawbot+wecom 时一条推送两边都到。

- send_to_user 加 channel=None:None 保持广播(定时任务/不点名沿用,向后
  兼容);指定 wecom/clawbot 时只投那一条,该渠道未开返回单条 no_binding,
  不静默回退到别的渠道。
- WechatPushTool 加可选 channel(enum wecom/clawbot)+ 描述教 agent
  「用户点名某微信就传对应 channel」,execute 做渠道白名单校验。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 12:15:36 +08:00
caoqianming d16297e556 feat(wecom): 企业微信支持入站对话(回调 webhook + AES 解密 + 复用渠道无关对话核心)+ bump 0.27.0
入站方式与 ClawBot 本质不同:ClawBot 走长轮询(getupdates + 常驻 run_inbound_manager),
企业微信走回调 webhook(企微服务器主动 POST 加密 XML)→ 无需后台轮询 task,只加 HTTP 端点。
agent 跑 >5s 超被动同步窗口 → 回复走 message/send 主动推回(复用 push_wecom),被动回 success 防重试。

- 抽 _run_wechat_message 为模块级 _run_channel_conversation(app, uid, text, atts, channel):
  个人微信(wechat)与企业微信(wecom)同核心、各一张会话 task(企微 binding 也存 chat_task_id)。
- 新增 core/wechat/wecom_crypto.py:WXBizMsgCrypt 等价(SHA1 验签 + AES-256-CBC 解密 + corpid 校验);
  与 crypto.py 的 Fernet 列加密、wecom.py 出站 API 全无关。
- service.py:get_user_by_wecom_userid 回调反查身份 + get/set_wecom_chat_task;
  upsert_wecom_binding 改成合并 config(不再覆盖 chat_task_id)。
- web/app.py:GET/POST /v1/wecom/callback(无 JWT,身份从加密 XML FromUserName 反查)。
- env:WECOM_CALLBACK_TOKEN / WECOM_CALLBACK_AESKEY;暂只收文本,未绑定/空消息静默。
- 文档:PROGRESS/RUN/DESIGN/wecom 同步(DESIGN 把「只做推送不做对话」旧决策标为演进)。

crypto round-trip 自测过;create_app + 路由注册 + 全量 import 通过。端到端待企微后台配回调 URL(需公网 HTTPS)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 12:07:47 +08:00
caoqianming 5d3cd88e2c fix(wecom): 扫码绑定改用扫码授权登录端点,修复「请在企业微信客户端打开链接」+ bump 0.26.10
oauth_authorize_url 原用 open.weixin.qq.com/connect/oauth2/authorize(网页授权,
只能在企业微信客户端内打开),桌面浏览器 window.open 它 → 企业微信报「请在企业微信
客户端打开链接」,扫不了码。

改用扫码授权登录端点 login.work.weixin.qq.com/wwlogin/sso/login(login_type=CorpApp),
桌面浏览器渲染二维码,企业微信 App 扫码确认后回跳带 code,verify_state / get_user_id
逻辑不变。前置:redirect_uri 域名须配在应用「企业微信授权登录」可信域名(另一项设置)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 11:21:30 +08:00
caoqianming 8ab1805df4 fix(wechat): wechat_push 工具漏挂企业微信 + 提取 active_channels 单一真相源 + bump 0.26.9
根因:wechat_push_available() 只看 clawbot_enabled(),没算企业微信。线上若只开
企微渠道(ClawBot 开关没开)→ 工具压根不注册到 agent → zcbot 照实回"没有直接
发企业微信的工具",用户已绑企微仍推不出。底层 send_to_user 早支持 push_wecom,
纯属注册门槛漏判。

修:提取 service.active_channels() 作渠道清单唯一真相源,门槛(wechat_push_available)
与投递(send_to_user)都引它,加渠道只改一处,根除"两处各列各的"这类偏差。
工具描述把 ~24h 窗口注明为 ClawBot-only(企业微信无窗口约束)。

纯内部重构,对外契约不变;test_secret_host_tools 8/8 过。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 11:09:21 +08:00
caoqianming 23c5ab20e0 feat(web): main.py web 支持 --ssl-certfile/--ssl-keyfile(uvicorn 原生 TLS,免 nginx)+ bump 0.26.8
两者同时给即在本端口跑 HTTPS,只给其一报错;都不给=明文(向后兼容)。
适配「只有 8765 对外」场景:zcbot 直接在 8765 上 HTTPS,不用 nginx/不挪端口。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 10:43:18 +08:00
caoqianming 0cf6e3e61e chore(wecom): 加企业微信可信域名校验文件 WW_verify_THssshZfneJwIG5Y.txt(放 repo 根)+ bump 0.26.7
bot.ctc-zc.com 的可信域名归属校验;配合 /WW_verify_{token}.txt 路由在域名根 serve。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 10:25:06 +08:00
caoqianming 36964d9920 feat(wecom): 域名根 serve 企业微信可信域名校验文件 WW_verify_*.txt + bump 0.26.6
GET /WW_verify_{token}.txt 从 ZCBOT_WECOM_VERIFY_DIR(默 repo 根)读同名文件返回,
公开端点 + token isalnum 防穿越。解企业微信「网页授权可信域名」归属校验
(zcbot 根路径原是 302 跳 SPA,验证文件 404)。配好可信域名才能配可信IP(修推送 60020)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 10:22:34 +08:00
caoqianming ed2ff52bf4 fix(wecom): diag_wecom 加 sys.path 仓库根 + 手动 .env 兜底(直跑不再 ModuleNotFoundError)+ bump 0.26.5
诊断已定位线上 60020:应用「企业可信IP」白名单未含服务器出口 IP。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 10:16:17 +08:00
caoqianming 6f7c904cca fix(wecom): 推送失败透出真实 errcode/errmsg + 加 diag_wecom 诊断脚本 + bump 0.26.4
之前推送失败只回 error:RuntimeError,吞了企业微信的 errcode。改成 reason=str(e)
(含 gettoken/message_send 失败的 errcode+errmsg);scripts/diag_wecom.py 分步查
gettoken vs send 的确切 errcode,服务器上直跑即可定位(可见范围/userid/凭据)。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 10:10:56 +08:00
caoqianming c79fc8ef0c feat(wecom): 企业微信加「手填 userid」绑定(无 HTTPS 域名也能推)+ bump 0.26.3
企业微信推送是出站调用(gettoken/message_send 直连 qyapi),不需要域名;只有
OAuth 扫码拿 userid 那步要 HTTPS 可信域名。用户暂无域名 → 加第二条绑定路:
手填成员 userid(管理后台→通讯录→成员→「账号」)即可推送。

- web/app.py:`PUT /v1/wecom/bind/userid`(写绑定,wecom_configured 才允许)
- 前端 rail「微信」modal 企业微信段加输入框 + 保存(与扫码并列,已绑回填);
  refreshWecom 提示两路并存
- service/推送/send_to_user 不动(userid 来源换了,绑定数据结构一样)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:42:53 +08:00
caoqianming 9381655210 fix(admin): 近 7 天用量按日期倒序(最新一天在最上)+ bump 0.26.2
_usage_section 的 by_day_7d 排序 order_by(day) → order_by(day.desc())。
overview 趋势表 + PDF 报告共用此数据,两处都生效;前端纯按行渲染、不依赖升序,无需改 JS。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:33:54 +08:00
caoqianming f17da6a6e1 feat(auth): 平台登录注入 name/user_name + 监控页/dev 顶栏用户名展示 + bump 0.26.1
平台登录档案注入(0.26.0):
- users 加 name/user_name 两列(migration 0016,纯加 nullable 列,平滑兼容存量行)
- /v1/auth/login body 可选收 name/user_name,ensure_user_row 升级为 upsert
  (COALESCE(EXCLUDED, 旧值):平台传非空就刷新、传 null 不覆盖清空)
- login / login_password / /v1/me 响应回带 name/user_name/role

用户名展示(0.26.1):
- 统一兜底链 name → user_name → email → uid8,监控页与 dev 页共用
- 监控页 admin.js:各用户用量 / 存储 / overview 迷你表用户列走 userCellHTML,
  name+user_name 都有时主显 name + 浅灰 user_name;title 悬浮完整身份。
  admin.py 两表 SELECT 补 User.name/user_name
- dev 顶栏 main.js renderWho:默认显 name,hover 显账号/邮箱/ID;
  state.js 加 userUserName/userEmail + setIdentity/userDisplayName/userDisplayTitle helper,
  登录 / embed / /v1/me 校准共用

注:migration 0016 需在目标环境 `main.py db upgrade head` 应用后生效。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:31:32 +08:00
caoqianming 2b2b4531b3 fix(web): 登录失败提示统一为「账号或密码错误」,不再回显原始状态码 + bump 0.25.2
输错密码时前端弹「404」:后端 login_password 实际返 403,前置网关/旧构建
把状态改写成 404 后,doLogin 直接回显 r.status 导致语义错误。

- auth.js doLogin 失败分支:表单已校验非空,非 2xx 绝大多数是凭据不对,
  统一给「账号或密码错误」(pw)/「user_id 或 PLATFORM_KEY 错误」(key);
  仅 5xx 暴露状态码提示服务端问题。
- app.py:1399 detail 同步改中文,保持契约自洽。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 09:08:00 +08:00
caoqianming b5cfce72b5 feat(wechat): web 端微信 task 只读镜像,锚定微信为单一交互入口 + bump 0.25.1
web 端打开 channel=wechat 的常驻 task 原能正常发消息,但 web→微信单向
不同步(web 发的走通用端点 → _run_agent_bg,不经过 inbound loop 里
send_text 回微信那段,微信侧零感知);微信→web 则同步(同一条 task)。

不做双向打通:回微信需 context_token、只能从入站拿且 24h 过期,双向同步
会被该窗口拖成"有时同步"(不可预测)+ 两入口并发写歧义。改为 web 端只读
镜像,交互权威单一锚定微信;主动推走 wechat_push / 定时简报。

- chat.js: applyChannelComposerLock(selectTask 后调)对 wechat task 置
  chat-input readOnly + 改 placeholder 引导去微信 + 禁润色;sendMessage
  入口加 channel 守卫(Enter 兜底)
- dev.html: .readonly-locked 置灰样式

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 16:30:22 +08:00
caoqianming 529d7f1046 feat(wechat): 入站收图片/文件,CDN 下载+AES 解密落盘 + bump 0.25.0
get_updates 原只抽 text_item,图片/文件 item 被丢成空 text,inbound
又因空文本 continue → 用户发的图/文件静默丢弃、零落库(DB 实证)。

- ilink: InboundAttachment + 解析 image_item/file_item + download_media
  (CDN /c2c/download GET 密文 → AES-128-ECB 解,发送侧加密的逆);key 双
  编码兜底(base64(raw16)/base64(hex32)),图片按 magic bytes 补扩展名
- inbound: handle_message 契约加附件参,文本/附件都空才跳过,下载失败
  只丢该附件不拖垮整条
- app.py: 附件落盘 <wd>/inbound/,图片拼 [用户上传的参考图](走
  look_at_image)、文件拼 [用户上传的文件](走 Read/Shell),复用 web 端
  粘贴图约定,不碰模型链路

crypto roundtrip + 双编码 key decode 已单测;端到端(GET/POST、真实
image_item 结构)待用户重发一张图实测。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 16:15:52 +08:00
caoqianming 6f7e32bb33 docs: 操作说明书精简版表格微调 + 新增「科研AI双智能体·汇报PPT大纲」+ bump 0.24.4
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 14:57:57 +08:00
caoqianming 7b9f0c12ed refactor(wechat): 绑定表合一 channel_bindings(判别列+JSONB),取代 ClawBot/企微两表 + bump 0.24.3
架构复盘:渠道绑定 = "用户在某渠道的一份配置",各渠道字段形态不同 → 判别列 + JSONB 多态
(同本库 usage_events kind+units)最契合,加渠道(飞书/TG…)零 migration。原分表
(0012/0014)对 2 渠道够用但不扛增长、与库内多态范式不一致;单宽表(NULL 列并列)最差。

- models:`ChannelBinding(user_id, channel, status, config JSONB)` PK=(user_id,channel)
  取代 WeChatBotBinding/WeComBinding;clawbot 敏感字段 crypto 加密入 config,wecom 明文 userid。
- migration 0015:建表 + 旧两表数据搬进 config(token 密文串原样搬)+ drop 旧表;
  DDL+DML 同事务失败回滚不丢;含 down 拆回。
- service 存取改读写 config —— **公共 API + BindingSnapshot 形状不变** → inbound/web/tool/
  scheduler 零改动(纯内部数据层重构,对外行为不变)。趁绑定数据极少时合表最省。

import/编译 + _snap 反序列化单测过;DB 往返 + migration 待部署联调。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 14:55:39 +08:00
caoqianming 2dd1b49725 fix(web): 微信绑定弹框标题样式对齐其他弹框 + bump 0.24.2
#wechat-modal h3 只设了 flex,漏了 margin/padding/font-size/border-bottom,
吃浏览器默认 h3 样式导致标题又大又飘、无分隔线。补齐标题样式 +
h3 svg opacity + .sk-x 关闭按钮样式,与 crons/memory 弹框一致。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 14:42:57 +08:00
caoqianming 6008e1b8a0 fix(wechat,email): host-side 文件工具翻译容器路径,修复附件发不出 + bump 0.24.1
docker 模式下 fs 工具在容器跑,文件落宿主 users/<uid>/<wd>/,但 send_email /
wechat_push 是宿主进程工具:base_dir=cwd 且不识别容器↔宿主路径映射,agent 给的
相对路径拼到 cwd、容器绝对路径 /workspace/... 宿主上瞎解析,relative_to(user_root)
必越界 → 附件永远发不出(probe 直调 send_file 绕过解析,故"测试可发")。

- tools/base.py: 共享 _resolve_user_file(/workspace 前缀翻回 user_root + 相对拼
  base_dir + 越界校验)+ FileOutOfBounds
- agent_builder: 两个 host 工具 base_dir=working_dir_path(宿主 task 目录)而非 cwd
- send_email / wechat_bot: 改用 helper
- tests: 加 3 例回归(翻译+越界、send_email 容器路径、wechat_push 相对路径)
- scripts/diag_wechat_push.py: 诊断脚本

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 14:02:48 +08:00
caoqianming 193b545b75 feat(wecom): 企业微信渠道 B 纯推送 + OAuth 扫码绑 userid + bump 0.24.0
企业微信只做推送、不做对话(省回调 + AES + 5s ACK):无条件主动推(不挑活跃度、
无 24h 窗口),补 ClawBot 短板,定时简报必达首选。touser 经 OAuth 网页授权扫码拿成员 userid。

- core/wechat/wecom.py:access_token 2h 缓存(线程安全 + 失效重取)、OAuth getuserinfo、
  message/send text/file、media/upload、state HMAC 签名
- WeComBinding 模型 + migration 0014(0013 被 task_channel 占);service 加 wecom CRUD
  + push_wecom + send_to_user 接 wecom 一路(scheduler deliver_notify 经它自动带上)
- web/app.py 5 端点(/v1/wecom/oauth/url、callback 公开-身份从 state 验、bind GET/DELETE、test)
- 前端 rail「微信」modal 加企业微信段(wechat.js + dev.html)

激活(管理员):建自建应用 → WECOM_CORPID/AGENTID/SECRET + 配「网页授权可信域名」;
db upgrade head(带 0014)。redirect 主机取 ZCBOT_PUBLIC_BASE_URL 或请求 base。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 13:44:23 +08:00
caoqianming 320f428dd3 feat(email): 配置 foxmail SMTP 发信 + 发件人显示名品牌化 + bump 0.23.2
- .env 填入 smtp.qq.com:25/STARTTLS/授权码,send_email tool 与定时任务
  notify 兜底投递生效(.env 不入库)
- send_email.py 发件人显示名由硬编码 zcbot 改读 SMTP_FROM_NAME,默认
  「总院科研辅助智能体」,对外不暴露内部代号
- RUN.md 补 SMTP_FROM_NAME 说明;PROGRESS 记一条

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:31:17 +08:00
caoqianming 340786a42f feat(web): 微信任务徽章改品牌绿 + 微信 logo + 整行绿边 + bump 0.23.1
上版徽章复用 .badge.active(蓝灰)与旁边「进行中」状态徽章撞色、不显眼。
新增 .badge.wx(微信绿 #07C160 + 白字 + 内嵌微信 logo SVG)与 .task-row.wx
(绿色左边框 + 极淡绿底 + hover 加深),让置顶的微信任务从普通任务里跳出来。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:23:34 +08:00
caoqianming 85336ccb7e feat(wechat): 微信对话 task 渠道标记 + 列表置顶(channel 字段)+ bump 0.23.0
tasks 加 channel 列(web/wechat,migration 0013,server_default='web' 回填存量,
并把 description='(微信 ClawBot 对话)' 的存量 task backfill 成 wechat)。微信常驻
task 后端强制置顶(列表查询前置 case pin 表达式,跨分页稳定),前端任务名前打绿色
「微信」徽章一眼可辨。channel 仅 INSERT 写定,后续 upsert/save 不传不覆盖。

- core/storage/models.py: Task.channel 列
- db/migrations/.../0013_task_channel.py: 加列 + backfill
- core/storage/utils.py: ensure_local_task_row 加 channel 参数
- web/app.py: 微信建 task 传 channel=wechat;_task_dict 透出;列表 pin 置顶
- web/static/js/chat.js: channel===wechat 打徽章

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 11:12:16 +08:00
caoqianming 95857ba687 feat(wechat): 绑定 UI 并入主 SPA(左栏 rail「微信」按钮 + 扫码 modal)+ bump 0.22.2
上版绑定页是独立 /static/wechat_bind.html、主界面没入口、用户找不到。集成:
rail 加「微信」按钮(hd-wechat)→ 扫码绑定 modal(wechat-modal),复用 api()
调已有 5 端点(起码/轮询/查/解绑/自检),仿 crons.js 范式;二维码过期自动换码。
独立页 wechat_bind.html 保留作嵌入/兜底入口。

文件:web/static/js/wechat.js(新)、dev.html(rail 按钮 + modal + CSS)、
main.js(import 触发顶层绑定 + Esc 关闭);RUN/PROGRESS 同步去掉"未并入 SPA"。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 09:47:34 +08:00
caoqianming c569438d5f fix(web): 顶栏 token 计量栏回复后不刷新 + bump 0.22.1
提问→助手答完后,对话顶栏「总 token·缓存命中·花费」停在发问前旧值,
要切走再切回才更新。根因:计量栏读 state.taskMeta,而它只在 selectTask
里重拉;SSE 收尾的 fetchSse finally 只刷列表+消息,从未重拉 meta。
修:finally 里当收尾的是当前可见 task 时补一次 GET /v1/tasks/{id} →
重置 state.taskMeta → renderChatMeta(),失败 try/catch 吞掉不打断收尾。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 09:08:19 +08:00
caoqianming 528b974d9f feat(wechat): ClawBot 个人微信接入第一期(后端 + 绑定页)+ 双渠道设计 §8.7 + bump 0.22.0
把 zcbot 送进用户个人微信:对话 + 主动推送(简报/结果)。选官方微信 ClawBot
(iLink Bot API,零封号)先行;企业微信作渠道 B 留接口。协议全程真机实测
(scripts/probe_clawbot*.py,本人微信号在灰度内)。

核心(后端 import/编译自测过):
- core/wechat/{ilink 协议客户端, crypto 凭据加密, service 绑定CRUD+24h窗口推送
  +send_to_user 渠道抽象, inbound 长轮询管理器+回复提取}
- WeChatBotBinding 模型 + migration 0012;tools/wechat_bot.py WechatPushTool
  + agent_builder 注册(有开关才挂)
- scheduler.deliver_notify 加 wechat 通道(未送达退邮件);web/app.py lifespan
  起入站管理器 + _run_wechat_message 回调 + 5 端点;web/static/wechat_bind.html 绑定页

实测要点:每条 sendmessage 必带唯一 client_id(漏则同 token 后续被丢);context_token
24h 可复用→主动推(需用户先开口);文件 getuploadurl→AES-128-ECB(PKCS7)→CDN
(URL 带 filekey)→file_item,docx/pdf 原生直推。

激活:db upgrade head(带 0012)+ env ZCBOT_WECHAT_BOT_ENABLED=1
+ ZCBOT_WECHAT_SECRET_KEY=<串>。待办:部署端到端联调、SPA 集成绑定 UI、企业微信渠道 B。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 08:59:56 +08:00
caoqianming 336db63a01 feat(rendering): 平台渲染层 rendering/ 统一三 skill docx + chromium md→pdf + bump 0.21.0
渲染从「各 skill 自带 render_docx.py」抽成平台能力:新建顶层 rendering/ 包,
bind-mount 进 /sandbox/rendering,各 skill 调 render.py 不再 bundle 渲染脚本
(符合 Skills 自包含/可 fork 标准,跨 skill import 会破坏 fork 故不走 skills/_shared)。

- common.py 叶子原语单一事实源(化学式白名单 CHEM_RE 原先三份逐字重复→收敛一处)
- docx_manuscript.py paper/proposal 配置化双 profile;docx_brief.py brief 富渲染复用 common
- pdf.py md→HTML→沙盒 chromium --print-to-pdf(不用 weasyprint:要 pango/cairo 原生库且不在镜像)
- render.py 统一 CLI --profile {brief,paper,proposal} --format {docx,pdf}

零回归:三 profile 重构前后 docx 解包 diff word/document.xml 字节完全一致。
守护测试 tests/test_rendering.py 5 项全过。chromium 冒烟 deploy/sandbox/probe_chromium_pdf.sh。

删 3 份 render_docx.py + 短命 skills/_shared/render_pdf.py;改 5 个 SKILL.md 调用到
render.py + 补反模式"渲染一律调 render.py、禁止手搓 weasyprint/pip 装包";brief 另删
research 索引滞后描述。requirements 加 markdown,pool.py 加 rendering 挂载。

部署须一次原子激活:/sandbox/rendering 挂载靠 pool.py(restart 重建容器生效)+
markdown 进镜像靠 requirements 触发整体重建——update.sh build→restart 顺序覆盖,
旧 render_docx 路径已删,勿只推代码不重建。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 13:07:19 +08:00
caoqianming d412aa6b24 fix(web): 消息目录点第一个圆点误高亮第二个 + bump 0.20.4
跳转锚点(block:center)与活跃判定锚点(顶线 80px)不一致:第一轮
上方无内容无法居中、被钉到顶端,短轮时下一轮卡片顶也落进 80px 带内
→ 越界高亮第二个圆点。改跳转为 block:start(同锚点)+ .msg
scroll-margin-top:16px,活跃容差 80→24 对齐。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 08:56:02 +08:00
caoqianming 247a887cd6 fix(web): 定时弹窗 z-index 遮挡 + 登录 focus 引用错 id + bump 0.20.3
- #crons-modal 漏了 z-index,退回基础 .modal(无 z-index)被 z-index:5
  的侧栏/面板盖住("弹了但被遮挡");补 z-index:112 与 #skills-modal/
  #memory-modal 对齐。排查用 node+DOM mock 跑通整条前端模块图确认 hd-crons
  绑定确实执行,定位到纯 CSS 层叠问题。
- main.js:106 $("li-token").focus() 引用了不存在的输入框(实际 li-email),
  未登录 boot 末尾会抛 TypeError;改为 li-email。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 10:13:13 +08:00
caoqianming c55d0d11f0 fix(context): 发送期补齐悬空 tool_calls,断中断 run 留下的协议崩 + bump 0.20.2
run 在写入 assistant.tool_calls 之后、tool 结果写库之前被中断(上游流式断连 /
用户取消 / 崩溃),历史里留下一条 tool_calls 后面没有对应 tool 结果的消息;用户
随后继续发言,下一轮原样发给 DeepSeek/OpenAI 即被拒(must be followed by tool
messages),任务卡死在 run_status=error(监控页排查 task 5c5d6d25 实测)。

prepare_messages_with_stats 入口(早返回分支之前)新增 _repair_dangling_tool_calls:
对每条 assistant.tool_calls 扫描紧随其后的 tool 结果,为缺失的 tool_call_id 补占位
tool 消息。纯发送期不改库 → 覆盖所有中断路径 + 存量坏数据自愈,stats 计 repaired_tool_calls。
区别于 06-06/06-12 的 arguments 损坏修复(那治参数投毒,此为结构性悬空)。

新增 4 个单测,context 套件 14 项全过。

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 15:08:20 +08:00
12046 changed files with 183500 additions and 4211 deletions

8
.gitignore vendored
View File

@ -52,3 +52,11 @@ col.ps1
# brief skill 临时样例输出 (可由 skill 重新生成, 不入库) # brief skill 临时样例输出 (可由 skill 重新生成, 不入库)
.brief_out/ .brief_out/
# ClawBot 接入探测临时产物 (二维码图 / 测试文件, 探测时重新生成, 不入库;
# 探测脚本 scripts/probe_clawbot*.py 保留作参考与复测)
scripts/clawbot_qr*.png
scripts/zcbot_filetest.txt
# 诊断脚本的使用即弃 dump 输出(diag_*.py 写本地,不入库)
scripts/_*.txt

147
DESIGN.md
View File

@ -31,6 +31,7 @@ zcbot/
│ ├── skills.py # SkillRegistry(Anthropic 渐进披露) │ ├── skills.py # SkillRegistry(Anthropic 渐进披露)
│ ├── task.py # TaskState │ ├── task.py # TaskState
│ ├── memory.py # per-user .memory/ 双层记忆 │ ├── memory.py # per-user .memory/ 双层记忆
│ ├── shortcuts.py # 快捷指令(触发词→完整指令,入口层确定性展开;.memory/shortcuts.md)
│ ├── paths.py # task_dir db form 归一(to_db_path / from_db_path) │ ├── paths.py # task_dir db form 归一(to_db_path / from_db_path)
│ ├── storage/{engine,models,utils}.py # SQLAlchemy 2.x ORM │ ├── storage/{engine,models,utils}.py # SQLAlchemy 2.x ORM
│ └── agent_builder.py # 装配 lib:build_agent / system prompt / validate_task_name │ └── agent_builder.py # 装配 lib:build_agent / system prompt / validate_task_name
@ -118,6 +119,8 @@ yaml 是手填的,probe 用真实调用对账:`basic_chat` / `parallel_tools` /
**前端记忆面板 = 只读窗口,"改"全走对话(取舍)**:web 左栏「记忆」按钮开只读 modal,直接读 FS 渲染全貌(`GET /v1/memory` 全貌 + `GET /v1/memory/extended/{filename}` 单篇),**故意不提供写/删 API**。理由:① "看全貌"是读、不是 operation —— 走 LLM 反而又贵又只能拿到转述,看地面真相必须直读 FS;② "改"走对话(agent 自管,上文契约)= 单一写入口、自然语言、能合并改写,且用户不会写坏 frontmatter。对照业界:Claude(同为文件式记忆)给全套 view+edit;ChatGPT/Gemini 黑箱式只给看/删、长期不支持内联编辑。我们取"GUI 当眼睛、模型当手":既守住文件式记忆的透明卖点,又不引第二套写代码。后续若"删一条 / prune 臃肿 core.md"这类确定性精确操作摩擦明显,再单加直接的 delete(delete 是唯一廉价且确定性强、值得直连的 mutation,同 ChatGPT 做法)。路径穿越校验收口在 `core/memory.py`(只许 `.memory/extended/` 下扁平 `.md` + resolve 子树兜底)。 **前端记忆面板 = 只读窗口,"改"全走对话(取舍)**:web 左栏「记忆」按钮开只读 modal,直接读 FS 渲染全貌(`GET /v1/memory` 全貌 + `GET /v1/memory/extended/{filename}` 单篇),**故意不提供写/删 API**。理由:① "看全貌"是读、不是 operation —— 走 LLM 反而又贵又只能拿到转述,看地面真相必须直读 FS;② "改"走对话(agent 自管,上文契约)= 单一写入口、自然语言、能合并改写,且用户不会写坏 frontmatter。对照业界:Claude(同为文件式记忆)给全套 view+edit;ChatGPT/Gemini 黑箱式只给看/删、长期不支持内联编辑。我们取"GUI 当眼睛、模型当手":既守住文件式记忆的透明卖点,又不引第二套写代码。后续若"删一条 / prune 臃肿 core.md"这类确定性精确操作摩擦明显,再单加直接的 delete(delete 是唯一廉价且确定性强、值得直连的 mutation,同 ChatGPT 做法)。路径穿越校验收口在 `core/memory.py`(只许 `.memory/extended/` 下扁平 `.md` + resolve 子树兜底)。
**快捷指令 ≠ memory(两种机制,别混)**(`core/shortcuts.py`):触发词 → 完整指令的映射,存 `.memory/shortcuts.md`(`| 触发词 | 指令 |` 两列 md 表)。**关键区别**:memory 是注上下文、给模型**概率召回**的软上下文;快捷指令是入口层、模型跑之前的**确定性替换** —— 每条入站消息先经 `shortcuts.expand(ws, uid, text)` 整条 `strip()+casefold()` 精确匹配,命中即把文本换成完整指令再跑 agent(与「新话题」魔法命令同风格,"帮我出个简报"不误伤)。取舍:① **性能** —— shortcuts.md **内容永不注上下文**(触发靠入口层查表,不靠模型),存再多条平时上下文也是 0,触发时进上下文的就是那条完整指令本身(= 用户本来要打的字),无额外 token;若反过来把它塞进 core.md 让模型概率召回,则既不确定、又每轮烧 token,正是本设计要绕开的坑。② **渠道无关** —— `expand` 在渠道核心 `_run_channel_conversation`(微信/企业微信)与网页 `post_message` 两处共用,任意入口打同一触发词行为一致。③ **维护复用 memory 心智** —— 存储蹭 `.memory/` per-user 壳(agent 已有写权限),`memory_block` 加一行契约让模型在用户说"记个快捷词 X→Y"时写 shortcuts.md;但这行契约只讲"能维护 + 格式",不注文件内容。故:**存储借 memory 的壳,触发逻辑独立且确定**。
--- ---
## 4. 模型路由 ## 4. 模型路由
@ -306,10 +309,10 @@ done {}
### 7.3 认证 ### 7.3 认证
**当前形态(D' 过渡)**:两条 login 路径签**同款 JWT**(HS256,`JWT_SECRET` env 签,默 7d TTL): **当前形态(D' 过渡)**:两条 login 路径签**同款 JWT**(HS256,`JWT_SECRET` env 签,默 7d TTL):
- `POST /v1/auth/login {user_id, platform_key}` — platform 服务端机器对机器入口,持 `PLATFORM_KEY` 共享密钥可为任意 user_id 签 token(等同 user 身份由 platform 注入) - `POST /v1/auth/login {user_id, platform_key, name?, user_name?}` — platform 服务端机器对机器入口,持 `PLATFORM_KEY` 共享密钥可为任意 user_id 签 token(等同 user 身份由 platform 注入)。body 可选带 `name`(显示名)/ `user_name`(平台账号名),`ensure_user_row` upsert 落 `users.name/user_name`(`COALESCE(EXCLUDED, 旧值)`:平台传非空就刷新、同步平台侧改名,传 null 不覆盖);响应回带 `{name, user_name, role}`。缺省即旧行为(只填 user_id),向后兼容老调用方。与未来 OIDC 的 `name/preferred_username` claim 注入同构
- `POST /v1/auth/login_password {email, password}` — dev SPA / 同事试用,`users.email` UNIQUE + bcrypt 校验 `password_hash`;`main.py user add` CLI 发用户 - `POST /v1/auth/login_password {email, password}` — dev SPA / 同事试用,`users.email` UNIQUE + bcrypt 校验 `password_hash`;`main.py user add` CLI 发用户
- `POST /v1/auth/change_password {old_password, new_password}` — dev SPA 顶栏自助改密,需 Bearer(user_id 从 JWT 取,不信前端);验旧密码 + bcrypt 重哈希;platform_key 入口建的无密码行不可改(403) - `POST /v1/auth/change_password {old_password, new_password}` — dev SPA 顶栏自助改密,需 Bearer(user_id 从 JWT 取,不信前端);验旧密码 + bcrypt 重哈希;platform_key 入口建的无密码行不可改(403)
- `GET /v1/me` — 返 `{user_id, role}`(role 走 DB 查),dev SPA 据此决定显不显"管理"入口 - `GET /v1/me` — 返 `{user_id, role, name, user_name, email}`(走 DB 查),dev SPA 据 role 决定显不显"管理"入口,据 name/user_name/email 渲顶栏用户名(默认 name,hover 显账号 / 邮箱)。两条 login 响应同样回带 name/user_name(平滑展示,登录即有名,/v1/me 再校准)
- `GET /v1/admin/*` — 管理后台,`Depends(require_admin)`(验 JWT + `users.role=='admin'`,否则 403)。`/v1/admin/overview` 返固定指标(runtime/tasks/users/usage 总用量+近7d趋势,供轮询);`/v1/admin/usage/models?range=&sort=`、`/v1/admin/usage/users?range=&sort=&page=&page_size=`、`/v1/admin/storage/users?page=&page_size=` 是带时间筛选(all/7d/30d)/ 排序(cost/tokens)/ 分页的独立表端点。独立页 `/static/admin.html`(目录导航 + 客户端打印导出 PDF)。后续续挂建用户/改角色/配置等管理动作 - `GET /v1/admin/*` — 管理后台,`Depends(require_admin)`(验 JWT + `users.role=='admin'`,否则 403)。`/v1/admin/overview` 返固定指标(runtime/tasks/users/usage 总用量+近7d趋势,供轮询);`/v1/admin/usage/models?range=&sort=`、`/v1/admin/usage/users?range=&sort=&page=&page_size=`、`/v1/admin/storage/users?page=&page_size=` 是带时间筛选(all/7d/30d)/ 排序(cost/tokens)/ 分页的独立表端点。独立页 `/static/admin.html`(目录导航 + 客户端打印导出 PDF)。后续续挂建用户/改角色/配置等管理动作
后续 `Authorization: Bearer <jwt>` 走所有 /v1/tasks*,FastAPI `Depends(require_user)` 验签 → 提取 user_id → SELECT/UPDATE 全带 `Task.user_id == user_id` 条件做隔离。`/v1/admin/*` 在 `require_user` 基础上再叠一层 `users.role=='admin'` 检查(`make_require_admin`)。`PLATFORM_KEY` / `JWT_SECRET` 任一缺失 → app 启动 fail-fast。 后续 `Authorization: Bearer <jwt>` 走所有 /v1/tasks*,FastAPI `Depends(require_user)` 验签 → 提取 user_id → SELECT/UPDATE 全带 `Task.user_id == user_id` 条件做隔离。`/v1/admin/*` 在 `require_user` 基础上再叠一层 `users.role=='admin'` 检查(`make_require_admin`)。`PLATFORM_KEY` / `JWT_SECRET` 任一缺失 → app 启动 fail-fast。
@ -322,6 +325,12 @@ done {}
```sql ```sql
users(user_id uuid pk, email text null unique, password_hash text null, oidc_subject null, plan null, users(user_id uuid pk, email text null unique, password_hash text null, oidc_subject null, plan null,
-- plan:模型档位名(0001 起就有列,0.31 起启用;之前休眠)。值是 config/agent.yaml
-- model_tiers 的 key(如 'pro');NULL/未知 → 落 'default' 档。控制该用户能用哪些模型,
-- 详见 core/model_access.py。role=admin 始终全开,不受档位限制。无需 migration。
name text null, user_name text null, -- 0016:平台登录注入的档案(显示名 / 平台账号名);
-- platform_key 入口 ensure_user_row upsert 写,
-- 邮箱密码 / 历史行留空。未来 OIDC claim 注入同构
role text not null default 'user', -- 0009:user/admin;admin 才能访问 /v1/admin/* 管理后台 role text not null default 'user', -- 0009:user/admin;admin 才能访问 /v1/admin/* 管理后台
created_at) created_at)
-- email UNIQUE (0005);NULL 不冲突,允许 platform_key 入口 user 共存 -- email UNIQUE (0005);NULL 不冲突,允许 platform_key 入口 user 共存
@ -334,6 +343,8 @@ users(user_id uuid pk, email text null unique, password_hash text null, oidc_sub
tasks(task_id uuid pk, user_id fk, name text not null, working_dir text not null, skill, description, tasks(task_id uuid pk, user_id fk, name text not null, working_dir text not null, skill, description,
status, model_profile, tokens_prompt, tokens_completion, cost_usd, status, model_profile, tokens_prompt, tokens_completion, cost_usd,
channel text not null default 'web', -- web/wechat 渠道来源(0013);仅 INSERT 写定,
-- upsert/save 不传不覆盖。前端据此打徽章 + 列表强制置顶
run_status text not null default 'idle', -- idle/running/cancelling/error(0004 合 runs 表) run_status text not null default 'idle', -- idle/running/cancelling/error(0004 合 runs 表)
run_error text null, run_error text null,
created_at, updated_at); created_at, updated_at);
@ -558,7 +569,7 @@ create index on usage_events (model_profile, created_at);
**选型**:Context Editing + Memory/File State + Cache Observability 混合。稳定 system/tools 前缀利于 provider cache;旧 tool result 移除或压缩;关键发现写 task summary / FS,需要时 `read` 重新拉。长上下文保留作少数全局推理的临时能力,非默认每轮成本。 **选型**:Context Editing + Memory/File State + Cache Observability 混合。稳定 system/tools 前缀利于 provider cache;旧 tool result 移除或压缩;关键发现写 task summary / FS,需要时 `read` 重新拉。长上下文保留作少数全局推理的临时能力,非默认每轮成本。
**落地形态**:`core/context.py` 发送前压缩旧 tool / `load_skill` / assistant tool_call arguments(保 `role/tool_call_id/name` 协议完整),不改持久化历史;**上下文压力门槛**(2026-06-10):总 chars 未逼近上限则完全跳过压缩、原样发,护 DeepSeek 前缀缓存(短任务字节逐轮一致、命中 92-94%)。task summary(旧消息压成一条、区分硬约束/计划/文件路径/关键事实)为第二步,未做。 **落地形态**:`core/context.py` 发送前压缩旧 tool / `load_skill` / assistant tool_call arguments(保 `role/tool_call_id/name` 协议完整),不改持久化历史;**上下文压力门槛**(2026-06-10):总 chars 未逼近上限则完全跳过压缩、原样发,护 DeepSeek 前缀缓存(短任务字节逐轮一致、命中 92-94%)。task summary(旧消息压成一条、区分硬约束/计划/文件路径/关键事实)为第二步,未做 —— 已并入 §8.8 Phase 2(对齐 Hermes 结构化摘要)统一推进。channel 常驻会话的无限累积另由 §8.8 软重置分段治理(本节压缩挡不住跨时段累积)
### 8.3 PPTX 前端在线预览(2026-06-09,✅ 已落地 Stage 1) ### 8.3 PPTX 前端在线预览(2026-06-09,✅ 已落地 Stage 1)
@ -611,6 +622,10 @@ create index on usage_events (model_profile, created_at);
**数据模型(新表 `scheduled_jobs`,独立加表不碰现有 schema → 公测兼容)**: **数据模型(新表 `scheduled_jobs`,独立加表不碰现有 schema → 公测兼容)**:
`id, user_id, name, prompt, cron, tz(默 Asia/Shanghai), mode(isolated|persistent), bound_task_id(可空), notify(JSONB 可空), enabled, timeout_seconds, next_run_at, last_run_at, last_status, last_error, last_task_id, consecutive_failures, expires_at(可空), created_at, deleted_at`。Alembic 加表 migration;`usage_events` 复用现成记账(可加 `kind="scheduled"` 自由文本区分,无需 migration)。 `id, user_id, name, prompt, cron, tz(默 Asia/Shanghai), mode(isolated|persistent), bound_task_id(可空), notify(JSONB 可空), enabled, timeout_seconds, next_run_at, last_run_at, last_status, last_error, last_task_id, consecutive_failures, expires_at(可空), created_at, deleted_at`。Alembic 加表 migration;`usage_events` 复用现成记账(可加 `kind="scheduled"` 自由文本区分,无需 migration)。
**mode 语义(澄清)**:mode 只决定"对话是否延续"——isolated 每次新建 task(隔离对话历史、省 token),persistent 复用 `bound_task_id` 常驻 task(跨天连续性)。**文件夹两种模式都按 job 复用**(`scheduled-<jobid>`,产物累积 + notify 取最新产物依赖它),不是 mode 的区分维度。
**定时执行 task 的归属与可见性(0017)**:定时任务产生的 task 在 `tasks` 上标 `scheduled_job_id`(nullable FK → `scheduled_jobs.job_id`)。普通对话列表 `WHERE scheduled_job_id IS NULL` 排除(不混进"用户项目"列表);crons 页可按 job 反查执行历史。push 投递记录见 §8.7。
**守护循环(仿 §8.4 `_disk_scanner`,plain-asyncio)**:lifespan 起一个后台 task,每 ~10s(`ZCBOT_SCHEDULER_TICK_SECONDS`,只决定最坏延迟≤1tick、不决定会否漏 —— claim 取 `next_run<=now` 的全部)扫 `enabled AND next_run_at<=now()`;命中即 `asyncio.create_task(asyncio.to_thread(_run_agent_bg, ...))` 复用现成路径,登记到 `app.state.inflight`(随关停 drain 一起收尾)。与**单活 run 锁**(§7.x `run_status` + `SELECT FOR UPDATE`)交互:isolated 每次新 task 天然无冲突;persistent 若绑定 task 正忙 → 跳过本次 + 记 warn,下一个点再来(不排队堆积)。run 完回写 `last_*` + croniter 算 `next_run_at` **守护循环(仿 §8.4 `_disk_scanner`,plain-asyncio)**:lifespan 起一个后台 task,每 ~10s(`ZCBOT_SCHEDULER_TICK_SECONDS`,只决定最坏延迟≤1tick、不决定会否漏 —— claim 取 `next_run<=now` 的全部)扫 `enabled AND next_run_at<=now()`;命中即 `asyncio.create_task(asyncio.to_thread(_run_agent_bg, ...))` 复用现成路径,登记到 `app.state.inflight`(随关停 drain 一起收尾)。与**单活 run 锁**(§7.x `run_status` + `SELECT FOR UPDATE`)交互:isolated 每次新 task 天然无冲突;persistent 若绑定 task 正忙 → 跳过本次 + 记 warn,下一个点再来(不排队堆积)。run 完回写 `last_*` + croniter 算 `next_run_at`
**croniter 选型**:存标准 5 段 cron 串 + 时区,`croniter` 算 `next_run_at`。理由:正确处理 dom/dow 同列的 vixie OR 语义和时区折算(手搓极易踩坑,四源都点名这个坑);纯 Python 小依赖。劣选:只支持"每天/每周 HH:MM"自己用 datetime 算 —— 零依赖但遇复杂周期要返工。 **croniter 选型**:存标准 5 段 cron 串 + 时区,`croniter` 算 `next_run_at`。理由:正确处理 dom/dow 同列的 vixie OR 语义和时区折算(手搓极易踩坑,四源都点名这个坑);纯 Python 小依赖。劣选:只支持"每天/每周 HH:MM"自己用 datetime 算 —— 零依赖但遇复杂周期要返工。
@ -642,6 +657,132 @@ create index on usage_events (model_profile, created_at);
**前端取舍(2026-06-18 定 + 落地):对话端做完整 CRUD,前端只读展示 + 停用/删除。** 前端 SPA 调 `/v1/*` REST、不经 agent → "界面建/改定时任务"必须另开 REST + 表单 + cron 构建器(整套最重的是让科研用户填 cron 的 UX)。既然产品本就是对话式 agent,把建/改/删/查全收到对话(`schedule_*` 工具),**前端退化成只读看板**:`GET /v1/schedules` 列表 + 列表项「停用/删除」两个高频便捷动作(`PATCH`/`DELETE /v1/schedules/{id}`)。好处:cron 构建器 UX 难题直接消失(用户从不在前端填 cron,对 bot 说"每天早九点"由模型翻译);无"前端改了和对话不同步"的状态问题。代价:界面不能新建/编辑(需求低频,且对话更自然)。落地:`web/static/js/crons.js` 只读 master-detail modal(复用 skills modal 范式)+ 左栏 rail「定时」入口;工具与 REST 共用 `core.scheduler` CRUD 服务层不漂移。 **前端取舍(2026-06-18 定 + 落地):对话端做完整 CRUD,前端只读展示 + 停用/删除。** 前端 SPA 调 `/v1/*` REST、不经 agent → "界面建/改定时任务"必须另开 REST + 表单 + cron 构建器(整套最重的是让科研用户填 cron 的 UX)。既然产品本就是对话式 agent,把建/改/删/查全收到对话(`schedule_*` 工具),**前端退化成只读看板**:`GET /v1/schedules` 列表 + 列表项「停用/删除」两个高频便捷动作(`PATCH`/`DELETE /v1/schedules/{id}`)。好处:cron 构建器 UX 难题直接消失(用户从不在前端填 cron,对 bot 说"每天早九点"由模型翻译);无"前端改了和对话不同步"的状态问题。代价:界面不能新建/编辑(需求低频,且对话更自然)。落地:`web/static/js/crons.js` 只读 master-detail modal(复用 skills modal 范式)+ 左栏 rail「定时」入口;工具与 REST 共用 `core.scheduler` CRUD 服务层不漂移。
### 8.6 平台渲染层 rendering/(2026-06-23,✅ 已落地)
**心智:文档渲染(md→docx/pdf)是平台能力,不是 skill 内容。** 像 `chromium` / `document_search` / `python` 一样,skill **调用**它而非各自 bundle 一份。
**起因**:`_CHEM_RE` 化学式下标白名单在 brief/paper/proposal **三份 render_docx.py 逐字重复**(改一处易漏改),patent/standard 还复用 proposal 那份;且 brief 缺 PDF 路径,模型临场手搓 weasyprint + 运行时 pip(线上事故)。
**为什么不放 `skills/_shared/` 让各 skill `import`**:Skills 走 Anthropic 自包含/渐进披露/可 fork bundle 标准(§3.5),`fork_skill` 把内置 skill 整份拷到用户 `.skills`。跨 skill `import skills._shared` 会破坏 fork(用户拷贝里 import 不到内置树)且 sys.path 脆。故抽到**顶层 `rendering/` 平台包**,bind-mount 进 `/sandbox/rendering`(pool.py,与 skills 同款 `:ro`),与 skill bundle 正交。
**结构**:`common.py`(叶子原语单一事实源:字体 OOXML/`CHEM_RE`/块级正则/表格行切分/图片路径)+ `docx_manuscript.py`(paper 投稿稿 + proposal 申报书,配置化双 profile:页边距/TOC/图题前缀/列表模式/分页策略)+ `docx_brief.py`(brief 简报富渲染:商务红 + 引文上标超链 + callout,复用 common 叶子)+ `pdf.py`(md→HTML→沙盒 chromium `--print-to-pdf`,复用 `common.CHEM_RE`)+ `render.py`(统一 CLI `--profile {brief,paper,proposal} --format {docx,pdf}`)。各 skill SKILL.md 调 `python /sandbox/rendering/render.py`,不再自带 render_docx.py。
**PDF 用 chromium 不用 weasyprint**:chromium 镜像已装(给 mermaid),fonts-noto-cjk 已装,完整浏览器内核 CSS 保真度高;weasyprint 要 pango/cairo 原生库、不在仓库 Dockerfile。**与 §8.3 pptx 预览分工**:pptx 预览在 web host 调 LibreOffice(面向用户的高保真预览,不进沙盒);本层在沙盒内 chromium 渲染(agent 生成阶段产出 docx/pdf 交付物)。
**取舍**:重构对三 profile 各渲前后 diff `word/document.xml` **字节一致**(零回归);brief 不强并进 manuscript 路径(引文/配色差异大,只共用叶子原语,降回归面)。
### 8.7 微信接入(双渠道:ClawBot 个人微信 + 企业微信自建应用)(2026-06-23 设计,status=design)
**诉求**:把 zcbot 送进用户**个人微信**——简报/任务结果主动推过来,且能在微信里直接跟它对话。用户体感 = 微信通讯录里多一个叫「微信 ClawBot」的**联系人**,像加了个好友一样聊。
> **⚠️ 实测结论(2026-06-23,`scripts/probe_clawbot*.py`,真机端到端;关键是 `client_id`):ClawBot 可双向对话 + 可主动推送(有前提)。**
> ① 灰度可用(扫码 `confirmed``bot_token` + `baseurl`);② **入站通**(`getupdates` 长轮询收用户消息,带 `from_user_id` + `context_token`);③ **多条/流式回复成立**——同一 `context_token` 连发多条,**每条 `msg` 必须带唯一 `client_id`**(漏它则只有第一条送达——前几轮误判"单条/纯被动"的真因),中间块 `message_state=1`(GENERATING)、末块 `=2`(FINISH),按 ~1000 字分块、各块间隔 ~300ms;④ **主动推送成立**——发完 FINISH 后隔 30s 复用同一 `context_token`(+ 新 `client_id`)仍送达,**`context_token` 有效期约 24h、可复用**。
> **故「定时简报主动推送」(本节最初核心诉求)在 ClawBot 上可行**,前提:用户**先开口过一次**(冷启动无 token 不能凭空推),且距上次互动在 token 有效期(~24h)内——**每条入站消息刷新该用户的 `context_token`**;超期未互动则需用户再开口(或退邮件兜底)。冷推(从未开口)仍不可能。
**选型:三条路,选官方 ClawBot(详见对话调研 2026-06-23)**:
- **wechaty / hook(非官方个微)** —— 逆向/注入,违反腾讯 ToS,**封号率高**(hook >80%、web 协议被大量封),要养号/同省 IP/限速。**排除**。
- **企业微信自建应用** —— 官方、稳定;①只触达**企业微信成员**(非个人微信);②要企业**管理员**建应用 + 配可信域名;③双向对话要回调 + AES + 5s ACK,重。但**主动推送无条件**(不挑用户活跃度、不依赖灰度)→ 定时简报"必达"首选。**与 ClawBot 并列为第二渠道(本节一并设计,见下「渠道 B」),共用渠道抽象。**
- **微信 ClawBot(iLink Bot API)** —— 腾讯 2026-03-22 官方上线,跑在官方 iLink 协议 + 官方服务器 `ilinkai.weixin.qq.com`,**零封号**;腾讯定位"管道",**后端接谁都行**(可接 zcbot)。**采用**。
**为什么先实现 ClawBot(企业微信紧随)**:零管理员(用户自扫,不建应用/不配域名)→ 能立即跑通验证(协议已真机实测全通);企业微信要等管理员建应用 + 配可信域名的资源到位。企业微信随后补上,用其**无条件推送**补 ClawBot 的"24h 活跃才可推"短板。
**渠道抽象(两渠道共用,加渠道不改 scheduler / 工具主体)**:
- **绑定**:per-user 记"绑了哪些渠道 + 各自凭据/标识"(ClawBot:`bot_token`+`latest_context_token`;企业微信:`wecom_userid`,应用凭据走全局 env)。
- **统一发送**:`send_to_user(user_id, text, file?)` → 解析该用户已绑渠道 → 各渠道实现各自发;`scheduler.deliver_notify`、`WechatPushTool` 都调这层,不感知具体渠道。
- **推送即对话记录(Unified)**:`send_to_user` 投递成功后,对每个成功渠道把推送(摘要 + 文件下载链接 + agent `read` 路径 `../<rel>`)作为一条 assistant 消息写进该渠道 chat task(`ensure_channel_chat_task` 不存在自动建,与入站对话共用)。web 端渠道对话卡片可见 + agent 可基于推送追问(`read` 产物文件)。进 agent 上下文(推送是 bot 发给用户的话,记得自己发过 = 连贯,非污染);`source_task_id` 去重——调用方即目标 chat task 自己(如用户在微信里让 agent 推)时 tool 记录已在,跳过。不塞正文(避免上下文膨胀)。push 记录在 `messages.kind` 标 "push"(独立列,不进 payload),`extract_last_assistant_text`(wecom 入站取回复)加 `WHERE kind IS NULL` 跳过,避免误取 push 摘要当回复。
- **推送择优**:简报这类"必达" → 优先企业微信(无条件);ClawBot 作个人微信触达 + 聊天;两者都绑可多投或按用户偏好。
**第一期两处已定决策(评审通过)**:
- **入站对话 → 每用户一条 persistent「微信」task**(聊天要连续性;token 增长靠 §8.8 channel 长会话治理 = 软重置分段 + §8.2 context 压缩;打标签与网页 task 区分)。**两渠道入站都落到这条 task**。
- **敏感凭据入库一律加密列**(`bot_token`/`latest_context_token`;企业微信 secret 走 env 不入库)——env `ZCBOT_WECHAT_SECRET_KEY` 派生密钥;绝不进沙箱/日志/API 响应(§3.4)。
**唯一现实卡点 = 微信灰度可用性**:仅**国内个人微信**、需 **8.0.70+** 且功能灰度推送中(设置→插件),**不支持企业微信**(`bot_type=3`)。目标用户没有插件入口就用不了——落地前要先核实目标用户在灰度内。腾讯另保留**限频 / 决定可连哪些 AI / 随时终止**的权力(政策风险)。
**注册门槛 ≈ 零**:`get_bot_qrcode` **无需任何预置 app_id/凭据/审核/费用**,任何后端直接调即可生成二维码;`bot_token` 纯靠用户扫码下发。**能完全脱离 OpenClaw 自实现**协议客户端(社区 `weixin-ClawBot-API` 已证)。
**绑定模型(沿用前版已对的 per-user 扫码骨架)**:
- 每个 zcbot 用户**扫一次码** → 后端拿到**该用户专属 `bot_token`**(Bot ID `xxx@im.bot` / User ID `xxx@im.wechat`)→ 存库 → 之后按用户收发。**1 个 bot_token 对应 1 个微信账号**(扫码者)。
- 这与"每个用户连自己的微信"天然吻合,且**零管理员**(对比企业微信省掉建应用 + 可信域名)。
- ⚠️ **待核实**:`bot_token` 是 1:1(每用户一条、各自一条长轮询)还是 1:N(单 token 多用户、靠消息内 `@im.wechat` 区分,Telegram 式)。设计**按更确定的 1:1** 落,若实测为 1:N 则简化为单循环。
**扫码绑定流程(iLink)**:
1. zcbot 网页"绑定微信" → 后端 `GET get_bot_qrcode?bot_type=3``{qrcode, qrcode_img_content}`,前端展示二维码。
2. 后端 `GET get_qrcode_status?qrcode=<id>`(长轮询,单连 hold ≤35s,循环续)→ 用户用**个人微信**扫码确认 → 返回 `{status:'confirmed', bot_token, baseurl}`
3. 把当前登录 zcbot user 与返回的 `bot_token/baseurl/user_im_id` upsert 进 `channel_bindings`(channel='clawbot')。前端轮询自己的绑定状态翻转。
**数据模型(统一表 `channel_bindings`,判别列 + JSONB 多态;0015 由旧 `wechat_bot_bindings`/`wecom_bindings` 合并而来)**:
`user_id, channel, status, config(JSONB), created_at, updated_at`,PK=(user_id, channel)。沿用本库 `usage_events`(kind+units)范式 —— 各渠道字段装 `config`,加渠道不动 schema。
- channel='clawbot' 的 config:`{bot_token*, bot_im_id, user_im_id, base_url, latest_context_token*, context_token_at(iso), chat_task_id}`(`*`=经 crypto 加密入 JSONB;`latest_context_token`+`context_token_at` 判 24h 推送窗口)。
- channel='wecom' 的 config:`{wecom_userid}`(企业成员 id,非密钥、明文)。
- 敏感字段加密 + **绝不进沙箱 / 不落日志 / API**(§3.4);`chat_task_id` FK 与 per-字段 NOT NULL 退应用层校验(与 usage_events JSONB 同向取舍)。
> **为何统一表(2026-06-24 重构,§设计取舍)**:渠道绑定 = "用户在某渠道的一份配置",各渠道字段形态不同 → 用判别列 + JSONB(同 usage_events)最契合本库,且渠道增长(飞书/TG…)零 migration。分表(每渠道一表)对 2 渠道够用但不扛增长、与库内多态范式不一致;单宽表(NULL 列并列)2 列 vs 8 列硬并、稀疏 + 破坏 NOT NULL,最差。趁绑定数据极少时合表(migration 0015 搬数据,DDL 同事务失败回滚不丢)。
**协议要点(自实现客户端,2026-06-23 实测验证)**:base = 绑定返回的 `base_url`(实测 `https://ilinkai.weixin.qq.com`)。所有请求 header:`Content-Type: application/json` + `AuthorizationType: ilink_bot_token` + **`X-WECHAT-UIN` 每请求变**(`base64(随机uint32)`,反重放);除取码/查状态外加 `Authorization: Bearer <bot_token>`
- **取码/绑定**:`GET /ilink/bot/get_bot_qrcode?bot_type=3`(无需任何预置凭据)→ `{qrcode, qrcode_img_content}`,`qrcode_img_content` 是**微信深链**(`liteapp.weixin.qq.com/q/...`),需**自渲成二维码**(非图片直链);`GET /ilink/bot/get_qrcode_status?qrcode=`(长轮询)→ `{status: wait|confirmed|expired, bot_token, baseurl}`。二维码 TTL 短(~1min),实现要**过期自动换码**。
- **收**:`POST /ilink/bot/getupdates`,body `{get_updates_buf:<游标,首次空>, base_info:{channel_version:"1.0.2"}}`(长轮询 hold ≤35s)→ `{msgs:[{from_user_id, context_token, item_list:[{type:1,text_item:{text}}]}], get_updates_buf}`
- **收图片/文件(2026-06-24)**:`item_list` 项除 `text_item` 外还有 `image_item`(type=2,带 `media{encrypt_query_param, aes_key, encrypt_type}` + 优先 `aeskey` 32-hex)、`file_item`(type=4,带 `media` + `file_name` + `len`);**下载是文件发送(下条)的逆操作**——`GET {cdn_base}/download?encrypted_query_param=<media.encrypt_query_param>` 取密文 → **AES-128-ECB+PKCS7 解密**(key 优先图片 `aeskey`,否则 `media.aes_key` 两种编码兜底:base64(raw16) / base64(hex32))。落盘 `<wd>/inbound/`,图片拼 `[用户上传的参考图]`(走 `look_at_image`)、文件拼 `[用户上传的文件]`(走 Read/Shell)注入 user 消息,**复用 web 端粘贴图约定,不碰模型链路**。⚠️ 下载 GET/POST 与 aes_key 取支待真机端到端校(crypto 单测已过)。
- **发**:`POST /ilink/bot/sendmessage`,body `{msg:{to_user_id, client_id:<每条唯一>, message_type:2, message_state:1|2, context_token, item_list:[...]}, base_info:{channel_version:"1.0.2"}}`。**`client_id` 必带且每条唯一**(否则同 token 后续消息被丢);多条/长文 → 中间块 `message_state=1`、末块 `=2`,~1000 字/块、间隔 ~300ms。成功返回 HTTP 200 + 空 body `{}`(无 ret,不能据 body 判成败,以实投为准)。
- **token 生命周期**:`context_token` 有效期 ~24h、可复用(发完 FINISH 仍可再发)→ 主动推送靠它;**每条入站消息刷新**该用户 token(存最新值 + 时间戳)。`bot_token` 长期 per-user 凭据(扫码下发)。
- **文件发送(2026-06-23 实测通,`scripts/probe_clawbot_file.py`)**:①`POST /ilink/bot/getuploadurl`(body `{filekey:随机16B的hex, media_type:3(FILE)/1(IMAGE), to_user_id, rawsize, rawfilemd5, filesize:PKCS7填充后大小, aeskey:随机16B的hex, no_need_thumb:true, base_info}`)→ 返回 `{upload_param}`;② 本地用该 aeskey 做 **AES-128-ECB + PKCS7** 加密文件;③ `POST {cdn_base}/upload?encrypted_query_param=<urlenc(upload_param)>&filekey=<urlenc(filekey)>`(`cdn_base=https://novac2c.cdn.weixin.qq.com/c2c`,body=密文、`application/octet-stream`)→ **响应头 `x-encrypted-param`** = 下载引用(漏 `&filekey=` 会 400 `filekey mismatch`);④ `sendmessage``item_list:[{type:4, file_item:{media:{encrypt_query_param:<上一步 x-encrypted-param>, aes_key:base64(aeskey.hex()的ascii字节), encrypt_type:1}, file_name, len:str(rawsize)}}]`。**docx/pdf 简报可原生直推为可打开附件**,无须退下载链接。
- ⚠️ **仍待核实**:富文本(markdown)渲染支持度(源码有 `markdown-filter.ts`,暂按纯文本正文 + 文件直推设计);限频数值(腾讯保留限速);媒体大小上限(暂沿用 20MB)。
**架构:入站与出站一体(第一期一起做)** —— **主动推送依赖 `context_token`,而 token 只能从入站消息拿**,故"只出站不入站"不成立;getupdates 长轮询既收对话、又负责刷新 token。
- **入站长轮询管理器**(lifespan 起,仿 §8.4 `_disk_scanner` plain-asyncio):每个 active binding 一条 `getupdates`(hold ≤35s 循环续)。收到消息 → 按 `bot_token`→binding→zcbot `user_id` 定位是谁 → **刷新该 binding 的 `latest_context_token` + 时间戳** → 映射到该用户的微信对话 task(默认一条 persistent「微信」task 保连续性,§8.5 会话模式)→ 复用 `_run_agent_bg` 跑 → 结果按 ~1000 字分块 `sendmessage`(每块新 `client_id`、中间 `state=1``state=2`)带 `context_token` 回。**无 5s ACK 约束**,长 run 天然 OK——相对企业微信回调的根本简化。
- **出站主动推送**(scheduler 简报 / 任务结果 / `WechatPushTool`):用库里该用户 `latest_context_token`,**距上次入站 <~24h** 则直接 `sendmessage`(文本 + docx/pdf 文件直推);**超期 / 从未开口** → 推不出,退邮件兜底(§8.5)或挂起待用户下次开口刷新 token。即"用户开口过、且近 24h 活跃 → 可主动推"。
- **scale**:N 个 active binding = N 条长轮询;公测期 N 小可接受;放大时视 1:1/1:N 实测结果改为单循环轮询多 token。
- **web↔微信同步不对称 → web 端只读镜像(2026-06-24 取舍)**:这条 persistent「微信」task 是 web 与微信共享的同一条 DB 消息流,但写入方向不对称——**微信→web 同步**(入站经 `_poll_binding` 落库,web 打开即见),**web→微信不同步**(web 端发消息走通用 `/v1/tasks/{id}/messages`→`_run_agent_bg`,不经过 inbound loop 里 `send_text` 回微信那段,微信侧零感知)。**不做双向打通**:回微信需 `context_token`、只能从入站拿且 24h 过期,双向同步会被该窗口拖成"有时同步"(不可预测)+ 两入口并发写同一上下文歧义。改为 web 端对 channel=wechat 的 task **只读镜像**(`applyChannelComposerLock` 置 readOnly + 引导去微信),交互权威单一锚定微信;主控台想主动往微信推 → 走 `WechatPushTool`/定时简报(出站语义,非对话)。
**接入面(复用现有范式)**:
1. `tools/wechat_bot.py`:ClawBot 客户端(`get_bot_qrcode/get_qrcode_status/getupdates/sendmessage` + AES 媒体)+ `wechat_bot_enabled()`(开关在才挂工具,沿用 §3.4)+ `resolve_wechat_target(user_id)`→`bot_token` + `WechatPushTool`(agent 可调,按当前 run 的 user_id 解析)。HTTP 走已有 httpx。
2. `core/scheduler.py` `deliver_notify``channel=="wechat"` 分支,与 email 并列 → 定时简报**把最新产物文件直推**本人微信(取 `_newest_artifact`,≤上限 `sendmessage` 文件、超限退"点此下载"链接;**不改 job schema**——通道是 notify 字段的值)。
3. `web/app.py`:`POST /v1/wechat/bind/qrcode`(起二维码)、`GET /v1/wechat/bind/status`(轮询绑定结果)、`DELETE /v1/wechat/bind`(解绑)、`POST /v1/wechat/test`(自检发一条);**lifespan 起入站长轮询管理器**(见上"架构");前端设置加"绑定微信"扫码 UI。
**渠道 B:企业微信自建应用(✅ 2026-06-24 推送;✅ 2026-06-25 入站对话,共用渠道抽象)**
- **决策演进:出站推送先行,入站对话后补(2026-06-25)**。最初(2026-06-24)刻意只做推送以简化("和邮件一个量级"),其无条件主动推正补 ClawBot 24h 窗口短板;公测中需求明确企业微信也要能直接对话 → 补入站。**入站方式与 ClawBot 本质不同**:ClawBot 走长轮询(`getupdates` + 常驻 `run_inbound_manager`),企业微信走**回调 webhook**(企微服务器主动 POST 加密 XML)→ **无需后台轮询 task**,只加 HTTP 端点。agent 跑 >5s 超被动同步(5s 返回密文 XML)窗口 → 回复走 `message/send` 主动推回(复用 `push_wecom`),被动回复回 `success` 防重试。**对话核心与个人微信共用** `_run_channel_conversation(channel)`(建/复用会话 task → run 锁 → `_run_agent_bg` → 取回复),两渠道**各一张会话 task**(企微 binding 也存 `chat_task_id`)。
- 入站组件:`core/wechat/wecom_crypto.py`(WXBizMsgCrypt 等价:SHA1 验签 + AES-256-CBC 解密 + receiveid/corpid 校验;与 `crypto.py` Fernet 列加密、`wecom.py` 出站 API 全无关);`service.get_user_by_wecom_userid`(回调反查身份)+ `get/set_wecom_chat_task`;`GET/POST /v1/wecom/callback`(无 JWT,身份从加密 XML `FromUserName` 反查)。env:`WECOM_CALLBACK_TOKEN` / `WECOM_CALLBACK_AESKEY`。**暂只收文本**(图片/语音/文件回 success,后续走 `media/get` 补);未绑定/空消息静默。
- **应用凭据(全局 env,需管理员建应用)**:`WECOM_CORPID / WECOM_AGENTID / WECOM_SECRET`;secret 仅 host 进程读、不进沙箱(同 ClawBot / `send_email`)。host 直连 `qyapi.weixin.qq.com`(`core/wechat/wecom.py`)。
- **绑定两路(touser=wecom_userid)**:
- **手填 userid(无 HTTPS 域名时,默认)**:`PUT /v1/wecom/bind/userid` 直接写绑定;userid 见管理后台→通讯录→成员→「账号」。**推送是出站调用、不需域名**,故没域名也能用企业微信推送 —— 仅 OAuth 那路要域名。
- **扫码绑定(OAuth,需 HTTPS 可信域名)**:rail modal「扫码绑定」→ `oauth2/authorize?...scope=snsapi_base&state=<HMAC签+短TTL>` → 扫码/静默 → 回调 `GET /v1/wecom/oauth/callback`(公开端点,身份从 state 验,非 JWT)→ `cgi-bin/auth/getuserinfo?code=``wecom_userid`。**需管理员配「网页授权可信域名」** + `ZCBOT_PUBLIC_BASE_URL`
- **推送**:`gettoken` → `access_token`(2h 缓存 + 提前刷新 + 线程安全锁 + 40014/42001 失效重取)→ `message/send` text/file(file 先 `media/upload?type=file``media_id`,≤20MB)。
- **数据**:统一进 `channel_bindings`(channel='wecom',config=`{wecom_userid}`,明文非密钥);最初 0014 单建 `wecom_bindings`,0015 合进统一表(见上数据模型)。多企业留 `corpid/permanent_code` 进同一 config(additive,YAGNI)。
- **接入**:`service.push_wecom` + `send_to_user` 加 wecom 一路(已绑则推);scheduler `deliver_notify``wechat` 通道经 `send_to_user` 自动带上企业微信。端点 `/v1/wecom/oauth/url|callback`、`/v1/wecom/bind` GET/DELETE、`/v1/wecom/bind/userid` PUT(手填)、`/v1/wecom/test`;前端 rail modal 企业微信段(扫码 + 手填两路)。
- **触达**:仅企业成员;**品牌可自定义**(应用名/头像,区别于 ClawBot 统一名)。
**取舍(不选)**:
- **不用 wechaty/hook**:违规 + 高封号 + 养号运维,机构产品不可接受。
- **第一期不锁企业微信**:企业微信触达面窄(仅成员)、要管理员、双向重;ClawBot 触达个人微信 + 零管理员 + 双向轻。企业微信留作"机构身份 / 不依赖灰度"的后续备选,与本通道正交、绑定表/推送抽象可平行扩。
- **bot_token 落库但隔离**:它是长期 per-user 凭据,必须持久化(不同于企业微信 2h `access_token` 可纯内存);安全靠加密列 + 不进沙箱,不靠不落库。
- **富排版不强求卡片**:个微富文本能力存疑,统一走"正文纯文本 + 产物文件直推",规避平台差异。
**改动面(第一期,含入站+出站)**:1 张新表 + migration `0012_wechat_bot_bindings`;`tools/wechat_bot.py`(iLink 客户端 + `WechatPushTool` + 绑定/token 服务);**1 个 lifespan 入站长轮询管理器 + 消息→user/task 映射**(复用 `_run_agent_bg`);`core/scheduler.py` `deliver_notify``wechat` 分支;`web/app.py` 4 端点 + 前端扫码 UI;agent_builder 注册(开关在才挂)。env:`ZCBOT_WECHAT_BOT_ENABLED`(+ 可选 `ZCBOT_WECHAT_BASE_URL` 覆盖)+ `ZCBOT_WECHAT_SECRET_KEY`(凭据加密)——**无全局 app secret**(凭据是 per-user `bot_token`,扫码下发)。**不动** loop/llm/capabilities/现有 schema。
**渠道 B(企业微信,紧随)改动面**:env `WECOM_CORPID/AGENTID/SECRET`;`tools/wecom_push.py`(access_token 缓存 + `message/send` + `media/upload` + 渠道实现);`send_to_user` / `deliver_notify` 接 wecom 渠道;绑定抽象加 wecom 侧 + migration `0013`;OAuth 起始/回调 2 端点 + 前端"绑定企业微信"。**两渠道共用 `send_to_user` 抽象与绑定层**,故渠道 B 主要是"多一个渠道实现 + 一种绑定方式",不重写主体。
### 8.8 channel 长会话上下文治理(2026-06-29,Phase 1 ✅ 落地 / Phase 2-3 design)
**根因**:微信/企业微信入站对话复用**同一条常驻 chat task**(§8.7,per-user-per-channel 一条,要连续性),`Session.load()` 全量装回每轮 LLM 调用。web 任务"做完即止"故有天然边界,IM 是"用户当常驻助手永远在聊"→ 这条 task 只增不减,越用越贵/慢,终撞 context window。§8.2 的压缩只摘旧 tool 正文、门槛高(可靠上下文 50%)、从不删消息,挡不住 IM 这种无限累积。
**业界对照(2026-06-29 调研:OpenClaw / Hermes(NousResearch)/ Claude Code)**:三家都是"阈值触发摘要 + 头尾保护 + 旧 tool 输出先剪枝"。Hermes 最清晰:双阈值(agent 内 50% + gateway 85% 兜底)+ 四阶段(剪枝→边界检测 protect 头3+尾N→结构化摘要中段→重组保 tool 配对),摘要**增量更新**且保留 file path/ID/数值原文(mem0 实测:摘要会静默丢精确值/硬约束/决策理由)。OpenClaw/Hermes 另配持久记忆层(sqlite-vec / FTS5 + 跨会话)。**但三家都是单次 coding session,不解"IM 用三个月"的跨时段累积** —— 那是 IM 独有、最高杠杆且零信息损失的「会话分段」,本库自补(Phase 1)。
**心智:边界而非删除**。沿用 §8.2「禁止把『只保留最近 N 条』当主策略」「保留可追溯原文」——本设计**一条消息都不删**,只移动"喂给模型的窗口起点",全历史留 DB、web `/messages` 不 gate 照旧翻完整记录。
**Phase 1(✅ 2026-06-29):context_base_idx 软重置**
- `tasks.context_base_idx`(migration 0019,NOT NULL DEFAULT 0,additive)= 喂给模型的窗口起点。`Session.load()` 只装 `idx >= base` 的消息进 LLM 上下文。
- **关键不变量**:`_db_idx`(append 续号锚点)取 messages **真实总条数**而非加载条数 —— 否则下次 append 复用已存在 idx,撞 `uq_messages_task_idx`/覆盖历史。
- 两个触发口(`core/wechat/service.py`,仅入站走、push 不触发):
- **自动 gap 分段**(`maybe_gap_reset`):入站时距上次消息超 `config.json` `channel.session_gap_hours`(默 6h,`<=0` 关闭)→ 软重置,`base = 最后一条 user 消息 idx`。**不是失忆墙**:新窗口仍带"上一轮"原文做续聊锚点(用户"接着刚才说"接得上),零额外 LLM 调用、零延迟。
- **手动新话题**(`reset_channel_context(hard=True)`):用户发「新话题/新会话/`/new`/清空上下文」→ `base = 总数`,彻底从零(回执提示已归档)。
- 二者本质同一操作(推进 base)的被动/主动两口:被动断开要续上(软)、主动换题要干净(硬)。
- `clear_messages`(web 端清空)全删消息后 `base` 归 0(idx 从 0 重起,否则窗口起点悬空)。存量 task / web 普通任务 base 恒 0 = 喂全量,行为不变(对外契约友好)。
- **不选「每次 gap 开新 chat_task_id」**:会堆 `wechat-xxx-2/-3…` 文件夹(`working_dir_from_name` slug 写死)+ web 一堆 task 卡片;软重置零新文件夹/零新 task。**不选「kind='boundary' 标记消息」**:要混进消息流处理 tool 配对 + "别喂模型",列是纯元数据零侵入。
**Phase 2(design):阈值结构化摘要(补全 Hermes 阶段③)**。现 `core/context.py` 只做剪枝(旧 tool 截 2000 字)+ 尾部保护,缺"中段轮做 LLM 结构化摘要"。补:到门槛时把「base 之后、头 N 条之后、最近 keep_recent 之前」压成固定模板(目标/约束偏好/进展/待办),增量更新而非重写,保留 path/ID/数值原文。门槛接 Hermes 双层(50% + 85% 兜底,`_COMPACT_CONTEXT_RATIO`)。工程坑(mem0 列):辅助模型返非 JSON 降级回原文、tool 配对别被切断(复用 `_repair_dangling_tool_calls`)。**A(分段)砍跨话题累积,B(摘要)兜单段超长,两者正交**。
**Phase 3(design):持久检索(解"问很久以前的精确内容")**。软边界拿"跨边界精确回忆"换成本——梗概不够时(问上个月让查的具体数据),上 OpenClaw sqlite-vec / Hermes FTS5:新消息进来先语义/全文检索本 task 历史,命中原文注入当前窗口。工程最重,待 Phase 1/2 跑稳、确认确有此类需求再做(数据没删,随时能补)。
**落地次序**:Phase 1 上线观察 token 曲线 → 再定 Phase 2 门槛/是否做 → Phase 3 视真实"长期精确回忆"需求。
--- ---
## 附录:DeepSeek V4 关键事实(2026-04-24) ## 附录:DeepSeek V4 关键事实(2026-04-24)

View File

@ -2,7 +2,7 @@
> 配合 `DESIGN.md`。本文件只记 phase 状态、决策偏差、文件量、下一步。每条 1-2 句:做了啥 + 关键判断;细节查 `git log` / `git diff` / `DESIGN §7.9` > 配合 `DESIGN.md`。本文件只记 phase 状态、决策偏差、文件量、下一步。每条 1-2 句:做了啥 + 关键判断;细节查 `git log` / `git diff` / `DESIGN §7.9`
最后更新:2026-06-18(brief 简报重定位为「重要文献速览」+ 精简到三文件 + bump 0.20.0) 最后更新:2026-07-03(web 进度 dock 展开遮挡最新内容:贴底时补触底,bump 0.38.1)
--- ---
@ -21,6 +21,386 @@
## 已完成关键能力 ## 已完成关键能力
### 2026-07-03 / web 列表状态灯挪到文件夹行左侧,数据行均匀分布(bump 0.38.8)
用户建议:状态放文件夹名左侧、时间那行正常分布。落地:终态徽章 + 运行圆点挪进文件夹行行首(`● 📁 ppt4`,行首左上区最先被扫到;无文件夹行的 task 回落到数据行行首,`syncTaskRowRunIndicator` 按同规则找 host:`.wd-line` 优先、`.meta.stats` 兜底);底部数据行只剩纯数据(skill/条/tok/时间),改 `justify-content:space-between` 均匀铺开,时间自然落行尾。改 `web/static/js/chat.js` + `web/static/dev.html`
### 2026-07-03 / web 列表 meta 行数字组改靠左跟排——修 active 静默后的左侧"缺口"(bump 0.38.7)
用户发截图:0.38.6 active 徽章静默后,无 skill 的行(列表主体)meta 行左槽空了,数字组(条/tok)又被 `.num.right-group{margin-left:auto}` 整组挤右,中间留出一块像缺了东西。修:数字组改靠左跟排填上左槽,只有 time-ago 锚行尾(`margin-left:auto` 移到 time-ago);模板删掉已无意义的 `right-group` class。"条/tok"跨行对齐由原有 min-width+右对齐槽位保持。改 `web/static/dev.html` + `web/static/js/chat.js`
### 2026-07-03 / web status 徽章改"默认态静默"——active 不挂徽章,终态行淡化(bump 0.38.6)
运行圆点落地后暴露 status 徽章两问题:「进行中」(生命周期 active)与「运行中」(run_status)语义撞车;列表主体都是 active,每行重复挂蓝徽章是零信息噪音、还占 meta 行首槽。设计原则定为**默认态静默、例外态着色、瞬时态用动效**:active 不再渲染徽章(列表行 + 中栏 chat-meta 同规则,chat-meta 终态徽章保留兼解释"输入框为什么消失");completed/abandoned 徽章保留且整行淡化(`st-*` class,opacity .68,hover 恢复——st- 前缀防撞选中态 .task-row.active);绿脉冲点成为唯一动效信号,与生命周期解耦。筛选下拉「进行中」文案不动(筛选语境无歧义)。顺手删掉不再被渲染的 `.badge.active` CSS。改 `web/static/js/chat.js` + `web/static/dev.html`
### 2026-07-03 / web 运行态标识精简为纯脉冲圆点(bump 0.38.5)
用户反馈「运行中」等文字让列表 meta 行太拥挤。标识收成一个 7px 带色脉冲圆点(绿=运行中/橙=停止中/红=出错),文案全部移进 hover title(error 仍带 run_error 详情);圆点在 baseline 对齐的 meta 行里补 `align-self:center`。改 `web/static/js/chat.js` + `web/static/dev.html`
### 2026-07-03 / web 后台 running task 自动挂 SSE——运行态标识刷新页面后也实时(bump 0.38.4)
0.38.3 留的边界:刷新页面(liveRuns 清空)或 run 由别的标签页/渠道启动时,列表标识只是服务端快照,run 跑完没人通知前端,会一直挂「运行中」。用户点出方向:别轮询,直接复用 SSE。改法:`loadTaskList` 收尾新增 `subscribeRunningRows`——列表带出的 running/cancelling 行,本地未订阅的自动 `ensureRunningTaskSubscribed` 挂上事件流(上限 4 条后台流,防 HTTP/1.1 同源连接数被占满;超限行标识仍显示只是不自动清),done/error 走 fetchSse 现有收尾(清 liveRuns + 就地清标识 + 重拉列表),全程实时零轮询。配套两处:`ensureRunningTaskSubscribed` 的 cancelling/workingDir 从"读全局 state.taskMeta"改为调用方传 seed(taskMeta 或列表行)——后台 task 的媒体产物 rel 解析必须用各自 working_dir;`renderLiveRunIfVisible` 只在订阅的是选中 task 时才调(后台订阅不碰对话区,否则重挂卡 + 强制滚底误伤正看着的对话)。附带收益:刷新后切进 running task,直播卡带着后台累计的文字直接可见(renderMessages 收尾 renderLiveRunIfVisible 挂卡)。只改 `web/static/js/chat.js`
### 2026-07-03 / web 任务列表加运行态标识(bump 0.38.3)
用户报:多个 task 并发执行(调用工具/回复中)时,左栏任务列表看不出哪些在跑。后端 `/v1/tasks` 每行其实早已带 `run_status`(`_task_dict` 统一出),只是前端 `renderTaskList` 没用——`chat.js` 里"列表行摘要无此字段"的注释已过时。修:列表行状态徽章旁新增运行态标识,`running` 绿脉冲点「运行中」、`cancelling` 橙「停止中」、`error` 红点「出错」(hover 出 run_error),`idle` 不显示;取值 = 服务端 run_status 快照 + 本地 `state.liveRuns` 叠加(本会话刚发出的 run 比列表快照新,cancelling 本地标志优先)。实时性三时机:run 开始(sendMessage / ensureRunningTaskSubscribed)与点停止时 `syncTaskRowRunIndicator` 就地 patch 对应行 DOM(不重拉列表,保住滚动加载的分页);run 结束沿用 fetchSse 收尾已有的 `loadTaskList()` 重拉。别处启动的 run(其他标签页/渠道)靠列表任意一次重拉带出,首版不加轮询。顺手把 ⋯ 菜单「清空对话」的 running 判断改走同一 `taskRunState`(列表行此前恒 false)。改 `web/static/js/chat.js` + `web/static/dev.html`(CSS)。
### 2026-07-03 / ppt 模板 zongyuan_red 逆向重建为真实 中国建材总院 身份(bump 0.38.2)
用户给官方 `总院模板.pptx`(中国建筑材料科学研究总院有限公司)要求"统一按这个来,zongyuan_red"。原 `layouts/zongyuan_red/` 是手搓的红条结构版(深蓝 #1F2A44 + 顶部红条 + 55/45 封面 + PART 章节),与真实文件 DNA 完全不符。PowerPoint COM 渲出 3 档真页(封面/内容/尾页)+ 解 pptx 抽实测:主红 `#D7000E`、目录红 `#D52C24`、近黑 `#181717`、辅灰 `#6F6F6F`/`#BCBDBD`;字体 微软雅黑 + Arial + 方正兰亭黑;八边形品牌 logo(EMF→PNG 透明底)+ 总部大楼灰度实景 + 材料马赛克实景(TIFF→压缩 JPG)。重写 5 页 SVG 忠实还原:封面(实景铺底+顶左 logo&机构全称+居中主红块+白标题)/目录(左上实景+右下大红斜三角+目录标题+白字方块序号,承集团规范斜向分割)/章节(八边形品牌水印+红 PART 胶囊+大标题,原件缺、按八边形 DNA 合成)/内容(左缘红方块+标题+灰分隔线+右上 logo+4 列灰底红顶条卡片+底部红条+页码)/尾页(材料马赛克+"材料创造美好世界"红+Thanks)。打包 logo.png/cover_bg.jpg/ending_bg.jpg 三资产,改写 design_spec.md 反映真实身份,补登记进 layouts_index.json(此前 dir 在但未注册)。质检 --template-mode 5 页零 error;finalize 内嵌 8 图 + svg_preview 全量渲图逐页过目确认与原件一致。**并加主动提示**:strategist.md §e + SKILL.md 默认主题段各补一条 —— 受众/素材/用户机构指向 中国建材总院·CNBM 系(汇报/立项/评审/职称评审/品牌宣讲)时,策略阶段**主动**把 `zongyuan_red` 整套模板作为候选点名给用户(区别于 business-red 仅配色预设),用户点头再按明确路径套入;这是唯一鼓励主动提模板的场景,其余仍等明确路径,不模糊匹配。
### 2026-07-03 / web 进度 dock 展开遮挡最新内容(贴底时补触底,bump 0.38.1)
用户报:对话「拉到底部但仍有内容被遮挡看不到」。根因:`#task-progress-dock` 是 `#chat-stream` 上方的 flex 兄弟(`flex-shrink:0`),dock 一展开/长高,`chat-stream` 可视高度就被从顶部挤掉那么多——`scrollTop` 据置不变,原本贴底的内容被推到视口折线以下看不见。而 `chat.js` 直播态 `task_progress` 事件在重渲 dock(=长高)后**早 return,跳过了末尾第 1684 行的贴底兜底**,所以底部不会自动回滚。修:在 `task_progress` 分支 `setTaskProgress` 后补一句 `if (nearBottom) stream.scrollTop = stream.scrollHeight`(与其余事件分支同款贴底逻辑),dock 涨高时把最新内容重新钉到底。只动 `web/static/js/chat.js` 直播路径一处,历史渲染/其他事件不受影响。
### 2026-07-03 / ppt 反纯文字页+图表落地硬门(7aa49195 二代陶瓷 deck 复盘,bump 0.38.0)
0.37 网格锁上线后同题重做(task 7aa49195),对齐/标题/节奏大幅好转,但用户复评两点成立:①**两栏裸文字页 ×4**(S8/S9/S16/S21 同为"图标小标题+下划线+文字堆 ×2 栏"零图形)——该形态无卡片、仅 2 图标,0.37 的 icon-grid/card-grid 指纹完全看不见,单调门盲区;②**全本零数据图表**(素材全是数字:100万→500万条/能耗降10-20%/碳排26%),"历程"类内容也退化成文字列表。另有两硬缺陷:S18 第 5 条描述被页脚裁掉(内容超出内容区)、S19 红色大字直接叠压灰色说明文字。修:**A 指纹加 text-columns 原型**(0 卡片+≤3 图标+≤2 图形基元+左对齐文本聚 ≥2 列)堵盲区,4 页同指纹→error;**B spec 指派图表落空检测**——spec_lock page_charts 指派了图表但该页 <3 图形基元且 <4 卡片error("图表被退化成文字"), executor 硬规则"不许把指派图表降级为文字/大字 KPI";**C CJK 叠压升级 error** run 70% CJK(表意字宽 1.0em 估宽近精确)且互叠 50%error(其余情形保持 warning+渲图过目);**D layout_grid 加可选 content_bottom**非页脚文本 baseline 越过它error(S18 ),executor "写页前垂直空间预算"纪律;**E 策略层数据图表下限**素材含 3 组可比数值全本至少 1-2 页真数据图表,零图表需在 spec 写理由;两栏裸文字列表计入"原型 2 "上限测试 +9(30 )全过,全量 162 ;71 charts 模板 + 中汽研 deck 模板回归零新增噪音已知边界:S19 类叠压若文字带 rotate/scale transform 仍不可测(子树跳过);数据图表下限是策略纪律,机器只能验"指派了没画",验不了"该指派没指派"
### 2026-07-03 / web 直播流式文字按轮次分段(修工具刷屏时文字被推出视口,bump 0.37.2)
用户报:web 端一次 run 里工具调用多时,助手文字流式输出「一直在上方」被工具卡越推越高滚出视口,看不到。根因:直播态把整次 run(含几十轮 LLM)全塞进**一张 assistant 卡**——文字全累进顶部单块 `.body`(`ctx.acc` 反复重渲),工具 `tool_call`/`tool_result` 全 `appendChild` 到其下方;而历史态(DB reload)是**每轮 LLM 一条独立 assistant 消息**、天然按轮次穿插。两态结构不一致就是病根。修(方案 A,只动 `chat.js` live-run 路径,历史渲染不动):文字按轮次分段——`ensureTextSeg`/`closeTextSeg` 维护「当前打开的文字段」,每个可见工具/选项卡(非隐形 `task_progress`)先 `closeTextSeg` 关掉当前段(空占位段直接移除避免留「思考中」孤块、有内容段定稿去光标+高亮),之后的新文字在卡片底部另起新段。效果=`文字(轮1)→工具→结果→文字(轮2)→…`,流式文字始终在底部可见,且与历史结构一致(run 结束 reload 无跳变)。rAF 节流改为闭包捕获 seg,防工具关段后错渲。删掉 `ctx.body`/`ctx.pending` 单块模型,改 `ctx.curSeg={el,acc,pending}`;`createLiveAssistantCard`/`renderLiveRunIfVisible`/`sendMessage`/`fetchSse` 收尾同步改。
### 2026-07-03 / seedream size 面积钳制(修 1920x1080 被 ARK 400 打回,bump 0.37.1)
模型自选 16:9 出图(如 `1920x1080`=2,073,600px)触发 ARK 硬门 `image size must be at least 3686400 pixels`(=1920²),整次文生图直接 400 失败。根因:`tools/seedream.py` 把 `size` 原样透传,不校验 ARK 的**面积**约束(卡的是总像素不是单边,故 16:9 最小合规是 2560x1440)。修:tool 内新增 `_normalize_size()`,拿到 `chosen_size` 前先钳进 `[min_pixels, max_pixels]`——面积 `<min``sqrt(min/area)` 等比放大、两边向上取整到 8 的倍数并复核达标(1920x1080→2560x1440);`>max`(3072²=9,437,184)等比缩小;已合规原样透传(向后兼容)。约束值加到 `config/media/doubao.yaml` seedream_5 档(`min_pixels`/`max_pixels`,旧 yaml 缺键则视为不设该侧、行为不变)。归一化时返回串附 `[note]` 提示 + meta 记 `requested_size`,usage 记账按**真实出图尺寸**。选自动钳而非返错让模型重试:省一轮往返、避免二次错。新增 tests 手验 9 例全落合法区间。
### 2026-07-03 / ppt 对齐网格锁 + 错位/单调质检(d1285247 陶瓷 deck 复盘,bump 0.37.0)
对 d1285247 产物(25 页陶瓷方案 PPTX)逐页几何量测 + PowerPoint COM 渲图目视复盘,三类缺陷:①跨页左基线漂移(0.6560.75in 七个值)+ 并排块顶差 212px 的"想对齐没对齐"(S8/S19/S23);②5 页同为"图标+标题+三行字"卡网格,零流程箭头/零分层图形,单调;③标题语义不兑现("五层架构"画成五条等宽横条、"矩阵"画成卡片格)。根因:executor 手写绝对坐标但 spec_lock 无网格常量可依;质检只查重叠/越界不查对齐;"节奏不雷同"只约束相邻页。修四层:**A spec_lock 新增 `layout_grid` 锁段**(margin_x/content_top/footer_y/gutter,strategist 派生、executor 每页吸附、checker 强制;design_spec_reference §V 同步);**B executor-base §3 网格对齐纪律**(并排卡片同 top 同高等 gutter、打破网格 ≥16px 干净打破、同行文字 ≥0.3em 禁贴字);**C svg_quality_checker 新增 check 14**——兄弟卡片近失对齐(精确几何,212px error;底对齐/中心对齐/绘图区内数据柱三类豁免,71 charts 模板回归误报清零)、layout_grid 偏离 215px error、行内 gap 不等 warning、无锁存量项目跨页左缘聚类漂移 warning、版式指纹单调门(≥3 页同指纹 warn、≥4 或过半 error;仅对 NN_ 编号 deck 页聚合,模板库静默);**D 策略纪律升级**——同一版式原型整本 ≤2 次 + 标题语义必须被图形兑现(SKILL.md 大纲纪律 + strategist visual-floor GATE)。顺手修 comparison_columns 模板胶囊 5px 错位。新增 tests/test_svg_alignment_check.py 21 项,全量 153 过。已知边界:页面平衡类(底部大空白/重心偏移,S18/S22)误报风险高未进 checker,只进阶段五验收 checklist 眼看;错位 error 会被导出边界自动质检门连带拦截,存量项目重导出若报新 error 属预期(真缺陷)。
### 2026-07-03 / 进度条自愈:回放层强制单调完成(d1285247 复盘,bump 0.36.2)
用户报 task d1285247(ppt生成3)进度条反常:后面步(质检/导出)打绿勾、前面步(摄取素材/配图)却卡红圈"…",顶部"4/6"。诊断脚本 `scripts/diag_progress_d1285247.py` 拉出 `task_progress` 调用序列定位**非渲染 bug**——`progress.js` 忠实回放了模型发的调用:模型每次推进是"标下一步 completed + 再下一步 in_progress"的跳步,**每次都漏给上一次留在 in_progress 的那步补 completed**(s1、s3 被漏),回放到最后就是 `s1=in_progress,s2=completed,s3=in_progress,s4/s5/s6=completed`。根因是模型用工具收尾不稳,纯提示拦不住(与门体系教训同构)。修在**回放层加确定性单调不变量**:`enforceMonotonicProgress`——checklist 线性推进,只要某步 completed,其之前所有步自动视为 completed;`applyProgressAction` 的 set_plan / update_step 两条出口都过一遍,漏发自愈。前端单测加 3 条(含复刻 d1285247 跳步序列 → 6/6)。已知边界:假设步骤线性顺序(现有所有 skill 成立);若将来出现真·并行/乱序 checklist 会被抹平。
### 2026-07-03 / ppt 门体系二轮硬化:逃生口收紧 + 导出自动质检 + svg_final 嵌图修复(139a59c5 重跑复盘,bump 0.36.1)
0.36.0 上线后同 task 重跑(仍 deepseek-v4-flash):产物整体大幅好转,但仍有 4/25 页错位(P12 色带裁两行标题+正文跑出卡外 / P14·P18 文字骑卡片边框 / P21 手画饼图弧线劈叉)。轨迹显示**两道新门都触发了、都被模型 8 秒内用逃生口按过去**:质检+渲图验收 0 调用,`--allow-iconless` + `--allow-unreviewed` 连按直接导出——门有了,逃生口对弱模型等于"报错时该加的参数"。且 `--allow-iconless` 的"正当理由"是我们自己给的:wrapper docstring 老示例教它 `-s final`,而图标门检查的是 svg_final(data-icon 已展开)→ 误报零图标;`-s final` 还连锁出图片路径连环坑(见 F)。二轮修五处:**A 验收门分层**——"从没渲过/渲后又改/finalize 前渲的"为硬问题,**任何 CLI flag 不豁免**(渲图便宜且机器可验,没理由交付没人能看过的页);`--allow-unreviewed` 只豁免"渲过但没标 pass";运维兜底走 `ZCBOT_PPT_FORCE_EXPORT=1` 环境变量(不进 --help/SKILL)。**B 拔 `-s final` 雷**——图标门永远对 svg_output 源检测(误报根除);wrapper docstring 示例去掉 `-s final` 并注明勿用。**C 导出自动质检门**——svg_to_pptx 导出前内嵌复跑 quality checker 逐页硬错误(坏 XML/禁用特性/图片缺失/几何 error),error 拒绝导出、无豁免参数(fail-open 于 import 失败)——"忘跑/不跑质检"从此无效。**D** 验收门报错计数措辞修正。**E 几何质检加"文字骑卡片边缘"检测**(warning 带坐标:文字与可见矩形交叠面积占比 0.20.85 即骑边,P12/P14/P18 三类当场可命中;P21 饼图弧线错误静态无解,只能渲图过目)。**F 修 svg_final 嵌图失效 bug**——finalize 先 copytree 到 `.build/svg_final` 再就地嵌图,`../images/` 从 svg_final 解析必落空 → **所有 deck 的 svg_final 一直嵌不进外链图**(渲图验收 PNG 里图片也是空的);`_resolve_image_path` 加"rebase 回 svg_output 同相对路径"兜底,实测 data:URI 落位。本机全链路回归:未渲→硬拒(带 flag 也拒)/ pending→拒、flag 放 / pass→放行 / 质检 error→拒 / env 强制→放;71 charts 模板几何 0 error。已知边界:P21 类"图形画错但不重叠不越界"仍只有渲图过目能拦——"看没看"无法机器验证,治本要平台层 vision 验收(待做,同 0.35.1 备注)。
### 2026-07-02 / ppt 渲图验收闭环 + 导出验收硬门 + 几何质检(139a59c5 复盘,bump 0.36.0)
复盘 task 139a59c5(deepseek-v4-flash,25 页陶瓷节点方案):用户实报"很多地方错位"。本机 PowerPoint COM 渲全部 25 页定位三类错位:①图标压字/游离(P4/P5/P8/P10/P16/P24——质检报"缺图标"后模型写 `add_icons.py` **regex 批量盲插坐标**,插完没看);②大字号数字压说明文字(P5 万亿/26%);③目录溢出页底(P2)。**根因:SKILL 阶段六"全量渲图验收"被整个跳过**——进度步骤标 completed 但唯一动作是 `echo 交付清单`,`svg_preview` 全程 0 调用;文档要求了但无机制强制(与 0.35.1 教训同构:纯文档约束拦不住弱模型)。改动三层:**A 验收闭环+导出硬门(机制)**——`svg_preview.py` 渲 project 时登记 `.build/acceptance.json`(每页 svg_output 源 sha1 + rendered_from + verdict;svg_output 比 svg_final 新的页拒登记);新增 `accept_pages.py`(`--pass/--pass-all/--fail --reason/--status`,标 pass 前校验"渲过 + PNG 在 + 渲后源没改");`svg_to_pptx` 导出边界加验收门(spec_lock 存在时每页须 verdict=pass 且源 sha1 未变,finalize 前渲的也拒;`--allow-unreviewed` 逃生口)——"从没渲过就交付"和"改页不复看"在导出边界被确定性挡下,单页返工回路(`--pages N` 重渲 merge 记录)已本机全链路验证。**B 几何质检(提前拦截)**——`svg_quality_checker` 新增 check 13:按字符估宽(CJK≈1em/Latin≈0.5-0.7em)+ translate 累加构包围盒;**图标压字、基线出画布=ERROR**(几何精确),**文字-文字重叠一律 WARN 带精确坐标**(估宽分不清擦边与压字,词云/象限图等密排设计会误伤,判断权交渲图验收;SKILL 阶段四明确 Geometry warn 渲图时必须对着坐标看);tspan 按"视觉行"归组续排(`$4.2B <tspan>(35%)</tspan>` 是一行不是两段),71 个 charts 模板 0 error 误报、复刻事故的 fixture 全命中。**C 管线顺序+反模式(文档)**——SKILL.md 管线改"后处理→渲图验收→导出"(验收在导出前),阶段五=finalize+全量渲图+逐页过目+标记,阶段六=拆备注+导出(验收门+图标门双硬门);反模式加"没看 PNG 就 --pass-all"和"为消警告脚本批量盲插元素不复看"。SKILL_LIST 同步。已知边界:gate 只能强制"渲过、源没改",看没看 PNG 无法机器验证(--pass-all 仍可被糊弄,但本次事故"从不渲图"的直接通路已封死)。
### 2026-07-02 / ppt skill 补「禁自搓导出器」硬约束(966041e5 复盘,bump 0.35.1)
复盘同一 task 后续产物 `陶瓷资源节点建设方案 (3).pptx`(deepseek-v4-flash 跑):python-pptx 拆开验证 **25 页每页只有 1 张 1280×720 整页 PNG 贴图、零原生文本/形状**——skill「原生可编辑 DrawingML」的核心卖点全废。根因:模型**整条绕开官方管线**——DB 轨迹里 `svg_quality_checker / finalize_svg / svg_to_pptx / svg_preview / total_md_split` 官方脚本**调用次数全是 0**,取而代之自己 `pip install cairosvg` + 手搓 `export_pptx.py` 调 16 次,把每页 SVG 渲成 PNG 整页贴进幻灯片。连锁三个用户实报缺陷:①「很多方格子」= 跳过 finalize_svg,图标占位空心 rect 没内嵌;②「生成的图没放进去」= cairosvg 加载不了 `href="../images/*"` 外链(实测 file://+xlink 都渲空白),AI 配图全丢、事后靠 base64 补;③文字溢出出血被裁(P04/P05/P09)+ 标题 font-weight 因属性写坏(`serif" font-weight="bold"` 引号错位)丢加粗。**关键教训**:上一条(0.34.7)硬化的是官方工具**内部**的门(退出码/图标门/验收全量),但只在模型**用了**官方工具时才生效;本次证明模型可完全另起平行管线,内部门无从触发。改动(经用户拍板**只走文档层**、平台层自动检测暂缓):SKILL.md 阶段五加「🛑 导出唯一入口=官方 `svg_to_pptx.py`,默认原生可编辑、纯 Python 无需任何外部渲染器,'渲染器没装'永不是自搓借口」;反模式加「绕开官方管线自搓 SVG→PPTX 导出器 → 一叠不可编辑贴图、价值作废」。**注:仅改 skill 文档,不改线上跑法/官方脚本行为。** 已知残留风险:纯文档约束对'完全无视 skill'的弱模型拦截力有限,真正治本需平台层在 pptx 交付/预览路径自动检测整页贴图(本次未做)。
### 2026-07-01 / 加快捷指令(触发词 → 完整指令,渠道无关)(bump 0.35.0)
用户需求:预先定义"简报 → 给我输出一份昨日的 AI 新闻简报",之后任意入口整条打"简报"就展开执行。关键设计判断:**快捷指令不是 memory**——memory 是注上下文给模型概率召回的软上下文,快捷词必须是入口层、模型跑之前的**确定性替换**(命中即换、零歧义、0 额外 token;存再多条平时上下文也是 0)。落地(方案 A:蹭 memory 的 per-user 存储壳、但触发逻辑独立):①新模块 `core/shortcuts.py`——`shortcuts.md`(`| 触发词 | 指令 |` 两列 md 表)解析 + `expand(ws, uid, text)` 整条 `strip()+casefold()` 精确匹配展开(与「新话题」魔法命令同风格,"帮我出个简报"不误伤);②入口接线两处共用同一 `expand`:渠道核心 `_run_channel_conversation`(微信/企业微信自动都覆盖)+ 网页 `post_message`,起 run 前展开;③`core/memory.py memory_block` 加一行契约告诉模型可维护 `shortcuts.md`(用户说"记个快捷词 X→Y"时写),但**内容不注上下文**、触发不问模型。维护沿用 memory 心智(对话里让模型写,无新增管理 UI)。`tests/test_shortcuts.py` 覆盖解析(跳表头/分隔行、首行赢、大小写归一)+ 展开(精确命中、不部分匹配、缺文件、空文本)全过。
### 2026-07-01 / ppt skill 修复 ppt生成2(966041e5):图标门升硬 + CLI 退出码传播 + 验收改全量(bump 0.34.7)
诊断真实产出 `陶瓷资源节点建设方案.pptx`(deepseek-v4-flash 跑)两个缺陷:①23 页零图标(spec_lock 锁了 chunk-filled+inventory 却全 deck 0 个 `<use data-icon>`);②不少错位。根因不是缺 gate 而是 gate 被打穿:(a) `svg_to_pptx.py:22``main()``sys.exit(main())`——**main() 里所有 `return 1`(图标门/无 SVG/坏路径)全被吞成退出 0**,这是最致命的一处;(b) 导出侧图标检查 `_warn_if_icons_unused` 按设计只软 WARN、照常产出;(c) 模型质检时 `svg_quality_checker.py ... | head -30`,管道吞非零退出码 + `head` 截掉打在最后的零图标 `[ERROR]` 结论;(d) 验收阶段 SKILL.md 本就只要求抽查 3 页,23 页里只肉眼看了 2 页,且封面 vision 已报"半成品/错位"仍未返工直接交付。改动:①`svg_to_pptx.py` → `sys.exit(main())`;②`pptx_cli.py` 把导出侧检查从软 WARN 升为**硬门**(锁图标却全 deck 零 `<use data-icon>``[ERROR]` 退非零、不产出 pptx),加显式逃生口 `--allow-iconless`(应对 lock 过期/有意无图标);③SKILL.md 阶段六验收改「默认渲整本、逐页过目、差评即阻断返工」(废掉抽查 3 页),阶段四/五/反模式补「别用 `| head` 截断质检/导出输出」「别只看几页」「看到差评必返工」。合成测试三例(默认拒/`--allow-iconless` 放行/有图标正常)全过。**注:此修仅改 skill 侧,不改动线上跑法**;导出门只兜"锁了图标却零引用",正常有图标 deck 不受影响。
### 2026-07-01 / 修 look_at_image/seedream 拒收容器绝对路径(bump 0.34.6)
现象:docker backend 下主模型被系统提示告知一切都在 `/workspace` 下,自然产出容器绝对路径(如 `/workspace/ppt生成2/ceramic-node/images/cover_bg.png`)喂给 `look_at_image`,却报「图片找不到或越界」,只有改成 working_dir 相对路径才成功。根因:`tools/image_ref.py resolve_in_root`(look_at_image + seedream 共用)只吃「working_dir 相对 / user_root 相对 / 宿主绝对」三形态,唯独不把 `/workspace/<rest>` 翻回宿主 `user_root/<rest>`——而 host-side 的 send_email 早在 `Tool._resolve_user_file` 做了这翻译。改动:`resolve_in_root` 加容器根(`/workspace`)前缀翻译,**按字符串前缀判断而非 `is_absolute()`**(Windows 上 `/workspace/...` 缺盘符不算绝对);越界仍靠原 `relative_to(root)` 兜住(`/workspace/../secret`、`/workspace/../../etc/passwd` 实测仍拒)。这样 look_at_image/seedream 接受的路径形态与 send_email/wechat_push 及系统提示告诉 agent 的口径一致。
### 2026-07-01 / admin 各用户用量加「最近使用」列(bump 0.34.3)
用户需求:admin 页面「各用户用量」表加一列展示每个用户的最近使用时间。改动:`web/admin.py _user_usage_page` 加一个**全量**(不随 range 筛选)的相关子查询 `max(usage_events.created_at)`,新字段 `last_used_at`(ISO 或 null);语义上刻意用全量而非跟着 range 走的 join——否则选 7d/30d 会把更早的真实 last-used 藏掉,列就失去意义。前端 `admin.js renderUserUsage` 加「最近使用」表头 + 单元格,用 `fmtTimeAgo`(相对时间)展示、`fmtTime` 全时间戳作 title 悬浮,无用量用户显示「—」;colspan 7→8。
### 2026-07-01 / ppt 页数必须用户显式拍板(bump 0.34.2)
用户反馈:ppt skill 生成时页数总默认到 ~12 张,页数从没被真正确认过。根因是行为层:ah 八条对齐里 b 项(页数)只给「常 815 页」区间,又被打包进整批 BLOCKING 确认,用户一句笼统「OK」就整批过、模型自取区间中位数(~12)。修(纯文档):`SKILL.md` b 项改为推**一个具体数字**+ 标为「独立拍板项」;ah 表后新增「🔒 页数 gate(不可默认放行)」——用户没给/没显式认可具体张数时必须单独追问「就定 N 页?」拿到明确整数才写逐页大纲,禁止用区间中位数当默认(唯一例外:用户明说「页数你随意」时按推荐数走、仍在预览写出数字供否掉);`strategist.md §b` 同步补 Non-defaultable gate 硬约束。
### 2026-07-01 / web 清空对话同步清空右侧导航条(bump 0.34.1)
用户反馈:web 端「清空对话」后右侧的导航条(msg-outline-rail 目录圆点)没跟着清空,还留着旧轮次锚点。根因:`chat.js` `clearMessages()` 清空后只 `renderMessages([])`,没重置 outline 状态(切 task 路径 line 344 有 `state.outline=[]; renderOutlineRail()`,清空路径漏了)。修:clearMessages 成功分支补一行 `state.outline = []; renderOutlineRail();`,与切 task 同款。
### 2026-07-01 / ppt skill 工作目录重构:中间物收进隐藏 .build/(bump 0.34.0)
用户反馈"中间产物/文件夹过多"。架构判断:`<project_dir>` 根把三类混摊了——持久源(sources/images/svg_output/notes/两个 spec)、交付物(exports)、**可再生构建产物(svg_final/preview/backup)**;第三类是 build artifact,不该和源平级。修:新增 `project_utils.build_dir/svg_final_dir/preview_dir/backup_dir` 单一事实源,把 svg_final→`.build/svg_final`、preview→`.build/preview`、backup→`.build/backup/latest`(**只留最新**,不再堆时间戳)。`.build` 是 dotfile → `/v1/files` 自动隐藏 → 用户可见面从 ~11 降到"源+交付物"。改动:finalize_svg / svg_preview(_collect)/ pptx_discovery(`final`→`.build/svg_final`)/ pptx_cli(backup 路径 + rmtree 清旧)+ SKILL 工作目录约定/命令。端到端实测:根目录只剩 exports/+svg_output/,`.build/` 三子目录就位,导出/预览/backup 全正常。
> 关于"svg现在能 web 预览、要不要收敛成一个 svg 目录":架构上 svg_output(可编辑源:占位符+相对引用)与 svg_final(自包含编译产物:图标展开+图片 base64)是**两态**、不能合并成一个文件(可编辑 vs 浏览器忠实渲染冲突);但只该暴露一个——svg_output 可见、svg_final 进 .build。终态(下一议题):干掉持久化 svg_final,finalize 纯内存化 + web 忠实预览走"按需 finalize 再 serve",磁盘就一个 svg 目录。本次先做隐藏,未做内存化(牵涉 web 层)。
### 2026-07-01 / ppt skill 验证 ppt生成2 后修复:svg_preview cairosvg 兜底 + gate 计入 circle + 反卡片映射(bump 0.33.x→并入 0.34.0)
DB 取证验证「ppt生成2」(用户重跑,商务红+图标):图标 31 个(前 0)、商务红 #C00000、封面 imagegen 配图、扁平 gate 在跑 —— **代码类修复随 bind-mount 全部生效**。但视觉验收卡住:轨迹显示沙箱 `which chromium/cairosvg/rsvg` 全空、`svg_preview.py` 没被调用、模型自己 `pip install cairosvg` 渲 raw svg_output → **6/13 图标页 INVALID_MATRIX 失败**(cairosvg 不认 href-less `<use data-icon>`)。根因:**服务器沙箱镜像旧、没带 chromium 层**(镜像非 bind-mount,`deploy/update.sh` 第 4 步 rebuild 才更新;需服务器执行)。据此两处代码修复(用户选定):
- **svg_preview.py 加 cairosvg 兜底**:`find_browser()` 改返回 None 不抛错;无 chromium 时回退 cairosvg,且渲前**用 finalize 的 embed_icons 把 `<use data-icon>` 预展开成真 `<path>`**(避开 INVALID_MATRIX);顺带修上一版遗留的 `--screenshot` 绝对路径 + 保留 chromium 优先(保真更高)。browser happy-path 实测完好。
- **扁平 gate 计入 circle/polyline**:`svg_quality_checker` 图形图元加 `<circle>`(node/venn/bubble/timeline 是真图,之前把 21-circle roadmap 误判"无图形");并收紧——文字密集 deck **≥60% 页无图形 → ERROR**(不止"全 deck 0 图形"),4060% → INFO。实测:ceramic 式(46%)→INFO exit0、多数扁平(75%)→ERROR、极端→ERROR、全 circle→clean。
> 部署:视觉验收/PDF/mermaid 的根仍是镜像 —— 服务器跑 `sudo deploy/update.sh`(不加 --skip-build)rebuild `zcbot-sandbox`(Dockerfile 已含 chromium),存量 per-user 容器待 ensure() 用新镜像重建(必要时手动 docker rm 该用户旧容器)。
同批加 **执行层反卡片映射**(治"大段大段卡片阵"):验证 ppt生成2 发现 SVG 注释自写 "3x2 Card Grid"/"3x3 Grid"——执行模型对"N 个并列项"默认摊成卡片网格。executor-base §page_rhythm:`dense` 行去掉"card grid 是 baseline"的背书;加一段硬映射「先看内容**关系**再选图形」(系统→hub_spoke/分层、流程→flow、层级→树/金字塔、循环→环、互依→mind_map、对比→象限、≥3数据→图表),**卡片阵封顶 ~1/3 页**、连画两页网格下一关系页必须上示意图,并指回 page_charts(strategist 分配了模板就画那个别塌回卡片)。诚实边界:这是执行模型设计本能天花板,prompt 抬下限但不保证每张示意图都漂亮。
### 2026-06-30 / ppt skill 加商务红品牌预设 + 配图默认主动提议(bump 0.33.5)
用户两个需求:(1) 加一款红色主题;(2) 用户没给图时在需要处主动配图。
- **商务红品牌预设**:新增 `templates/brands/business-red/design_spec.md`(同 anthropic 格式:#C00000 全色表 + primary-deep/gold/info/positive/alert/surface/border/muted 派生色 + 宋体标题/黑体正文字体栈 + 实心图标偏好 + 政企口吻;无 logo,注明用文字 wordmark / 可后补)+ `brands_index.json` 加条目。**红色承载在 brand 而非 visual-style**(visual-style 不带色)。同时把**商务红设为 strategist §e 默认配色候选**:中文政企/集团/科研商务汇报默认列入 ≥3 候选(红金 #BF9B5F / 红蓝 #2B4C7E 二选一点缀,纯红只压标题/关键数据)。SKILL §默认主题 + 八条对齐 h 行同步指向。
- **配图默认主动提议**:strategist §h + SKILL h 行改——用户没给图时**不再默认整本 A(no images)**;封面/分节/概念/breathing/氛围页主动把 ai 配图作为候选提给用户(数据/列表/流程页仍走图表→§VII,不配装饰图)。仍全程 gated:用户在 h 确认 + imagegen 自带成本门(提议免费,确认才花钱)。
> 附:`scripts/config.py` 的 INDUSTRY_COLORS 未移植(又一处 ppt-master 残留引用),strategist 文档表是实际依据,已直接在表里加商务红行。
### 2026-06-30 / ppt skill 修「生成的 PPT 缺图形」:扁平 deck 质检 gate + 策略层视觉下限(bump 0.33.4)
延续缺图标排查,统计最近 ppt生成 任务 24 页 SVG 的元素构成:**`<path>`=0、`<image>`=0**,整本是 `<text>``<rect>`(文字方块),零示意图/图表/配图。根因同图标——71 个 `charts/` 模板没用、content→版式映射形同虚设,且策略层把"Not every page needs a chart"当跳过口子(spec_lock 实际 `page_layouts: free design`、无 page_charts 段),输出层无 gate 拦扁平 deck。两层修(用户选定):
- **A' 输出 gate(svg_quality_checker)**:统计每页图形图元 `<path>/<polyline>/<polygon>/<image>`(`rect`/`line` 是版面脚手架不算);**≥6 页且文字密集(avg `<text>`≥10/页)却全 deck 0 图元 → deck 级 error 退非零**;多数页无图元 → INFO;<6 页豁免(不误伤极简/teaser)。实测:8 页文字方块exit 1;任一页带 path放行;4 豁免
- **B' 策略层视觉下限(strategist.md GATE)**:把 §633「Template Match」从纯建议升为硬下限——内容 deck(≥6 页)每个能结构化的内容页必须分配视觉处理(page_charts 模板 / page_layouts 结构模板 / §VII 自绘示意图),**spec_lock 不许 page_charts + page_layouts 同时空着**;给出 content→图形映射速查;明示下游 A' 会硬卡。同步改 SKILL §大纲映射纪律 + §阶段四质检清单 + spec_lock_reference page_charts 段。
> 诚实边界:prompt+gate 抬下限(逼别交全文字 deck),执行模型设计功力是上限;gate 守"零图形"底线而非"每页必图表",避免误伤极简风。
### 2026-06-30 / ppt skill 修「生成的 PPT 缺图标」四层断点(bump 0.33.3)
查真实用户(caoqianming@foxmail.com)两个「ppt生成」任务的 DB 执行轨迹:24 页 SVG 共 0 个 `<use data-icon>`。根因是图标管线四个环节没有一个强制图标落地——**策略层(有时)锁图标,执行层不放、质检层不拦、工具层还断着**。四层一起修:
- **B 工具断点**:references/SKILL 里 23 处路径仍指向已不存在的 `skills/ppt-master/`(zcbot 是 `skills/ppt/`)→ 模型按文档 `ls .../icons/<lib>/|grep` 验名得空集 → 放弃图标;且 strategist 强制用的 `icon_sync.py` 在 zcbot 根本没有(GATE 空转,正是某任务连图标都没锁的原因)。修:全量改路径 + 新建 `skills/ppt/scripts/icon_sync.py`(复用 embed_icons 解析,验名+拷进 project/icons,缺名非零退出)。
- **A 质检兜底(硬门)**:`svg_quality_checker.py` 加图标校验——spec_lock 锁了 `icons.library`+非空 `inventory` 但全 deck 0 图标 → **deck 级 error 退非零**(逼回执行重写);单页 0 图标 → warning(封面/分节/breathing/尾页豁免)。
- **C 执行强制**:executor-base §4 + SKILL 执行纪律第 4 条从"怎么写图标"改为"**内容页必须放 13 个 inventory 图标**"(自由设计无模板可继承图标,只能逐页手写)。
- **D 导出兜底(纵深)**:`svg_to_pptx` 导出前预扫,锁了 inventory 却 0 图标 → stderr 大声 [WARN](非致命,防跳过质检直接导出)。
> 附:核实 native 转换器(`drawingml_converter` 调 `use_expander`)本就自己从图标库展开 `<use data-icon>`,故 svg_output 保留原始占位符是正确的——原设想的"finalize 硬前置防丢图标"前提不成立,D 改成 A 同源的导出层警告。
同版附带修 **svg_preview.py 在沙箱里渲不出 SVG**(报"未找到 Chrome / Edge"):移植自 ppt-master 的 `find_browser()` 只认 Windows `chrome/msedge`,不认沙箱镜像自带的 `/usr/bin/chromium`(给 mermaid 装的)→ 视觉验收这关在容器里全程失效。对齐 `rendering/pdf.py` 的发现逻辑(认 `chromium`/`chromium-browser`/`google-chrome` + `$CHROMIUM` 覆盖);`render()` 补容器必需的 `--disable-dev-shm-usage` + 临时 `--user-data-dir`(cap-dropped 容器 /dev/shm 仅 64MB,否则 chromium 渲染中途崩);顺带挖出并修一个静默已久的 bug——`--screenshot` 传相对路径 chromium 写不出文件(原代码吞 stderr 看着和"没浏览器"一样),改传**绝对路径**并把 chromium stderr 暴露出来。skills 是 `/sandbox/skills:ro` bind 挂载,改动下次 exec 即生效,无需重建镜像。
### 2026-06-30 / look_at_image 偶发超时:tool 内透明重试 + 超时上限提到 120s(bump 0.33.2)
Seed 2.0 Lite 非流式,长 OCR 首字节可能逼近 60s read timeout → 偶发超时,且返 `[Error]` 会触发主模型重发整个 tool call(图 base64 重传、输入 token 再付一次,正中"报错重试烧 token"根因)。修法:`ark_client` 新增 `ArkTimeoutError(ArkError)` 子类(仅超时/网络抖动抛它,HTTP 4xx/5xx 业务错误仍抛普通 `ArkError` 不重试);`look_at_image` 对该子类退避重试(`timeout_retries` 默认 1 次,退避 2^n s),在 tool 内消化掉不抛给主模型;`doubao.yaml` vision `request_timeout_s` 60→120。子类仍是 `ArkError`,seedream 等现有 `except ArkError` 不受影响。
### 2026-06-30 / 修复 web 端 SVG 无法预览(bump 0.33.1)
SVG 在 `<img>` 里必须 Content-Type=`image/svg+xml` 才渲染。前端 `preview.js``_showImage` / mini 图片分支据扩展名强制 blob mime(与服务端响应头无关);后端 `download` 接口对 `.svg` 显式回 `image/svg+xml`(部分部署环境 mimetypes 未注册 svg → 会被 FileResponse 猜成 octet-stream)。双保险。
### 2026-06-29 / ppt skill 清空重构为 SVG-first(移植 ppt-master,bump 0.33.0)
- 背景:旧 ppt skill 用 python-pptx + 固定组合版式件(`add_card_grid` 等),版面被 helper 框死 → 单调、AI 味重,是架构天花板,调参救不了。用户要求"清空重做,参考 github ppt-master"。
- 路线(范围 B:搬引擎+知识、弃 GUI、适配 zcbot):核心改为 **SVG-first** —— AI 逐页手写 SVG 设计稿,再由纯 Python 转换器(`svg_to_pptx/`,只依赖 python-pptx)逐元素译成原生可编辑 DrawingML。依赖闭包干净:转换器/质检/finalize 三套自包含,不碰 ppt-master 的 config/project_manager 重型层。
- 搬入:引擎(`svg_to_pptx.py`+包 / `finalize_svg.py`+`svg_finalize/` / `svg_quality_checker.py` / `total_md_split.py` / `update_spec.py` / 辅助 `project_utils`+`error_helper`);设计知识 references(`shared-standards`/`executor-base`/`strategist`/`image-layout-*`/`canvas-formats` + `modes/`5 + `visual-styles/`19);templates 全量(layouts/decks/brands/charts + **icons 30MB/1.1w+ 图标,用户要求一并入仓**)。
- 弃用/替换:浏览器 Confirm UI → 聊天 BLOCKING 八条确认;live preview server → 新写 `svg_preview.py`(无头 Chrome 渲 SVG→PNG,优先渲 svg_final 显图标);TTS/复杂动画(动画留 opt-in);ppt-master 配图子系统 → 走 zcbot 现有 imagegen skill。默认主题改"自由设计"(商务红降为候选)。
- 踩坑修复:vendored 脚本 print 含 ©/NBSP/emoji,在 zcbot Windows GBK stdout 上 `UnicodeEncodeError` 崩([[feedback_windows_console_emoji]])→ 给 6 个入口脚本顶部加 `sys.stdout.reconfigure(utf-8)` shim。
- 端到端验证通过:造材料领域 4 页 deck(低碳水泥),质检 0 error → 拆备注 → finalize 嵌图标 → 导出 4 页原生 pptx(13.33×7.5in、每页带备注)→ svg_preview 渲 PNG 肉眼确认设计级观感(swiss-minimal,非 AI 味)。
- 文件:`skills/ppt/`(SKILL.md 重写 + scripts/ + references/ + templates/);依赖加 Pillow(svglib/reportlab 注释为可选老 Office 兜底)。
### 2026-06-29 / system prompt 加通用 context 纪律铁律(bump 0.32.5)
- 承上:反复 dump 全文 abstract 烧 2.5M token 不是 brief 专属,任何 skill 让弱模型处理一批长文本都可能踩。故在 system prompt 单一事实源 `prompts/system/general_v1.md` 的「工作原则」段、紧挨「少来回」加一条全局铁律:大段 `run_python`/`shell` 输出会进对话历史每轮重发,中间数据落文件、只 read 用得上的片段、别整批重复打印。
- 与既有规则互补:行 7(源码落 .py 文件)管代码、行 42(少来回)管轮数、这条管「大块数据输出」。brief skill 里的场景化版本(0.32.3)保留做细化。
### 2026-06-29 / 定时任务默认单次超时 0→1800s(bump 0.32.4)
- 承上:超时此前默认 0(不限),配合"超时被吞成 ok"的旧 bug,一个跑飞的 job 能无限拖。改默认有限值 1800s(30min):新建 job 不指定 `timeout_seconds` 时给 1800,`0` 仍保留为"不限"逃生口。
- 单一事实源 `core/scheduler.DEFAULT_TIMEOUT_SECONDS=1800`,`create_job` 与 `tools/schedule.py`(agent 建 job 的工具)默认都引它;tool JSON schema 描述同步注明"default 1800 / 0=no limit / 重活可调大"。`create_job` 里 `int(timeout_seconds or 0)` 保留显式 0=不限语义。
- 存量:把线上 job `e621c8a6`「每日水泥科研简报」的 `timeout_seconds` 由 600 手动改为 1800(直接 SQL UPDATE,未动其它 job)。
### 2026-06-29 / brief skill 加 context 纪律,堵反复 dump abstract 烧 token(bump 0.32.3)
- 承上条同一 job 复盘:agent 把同一批 38 篇全文英文 abstract 用 `run_python`/`print` **反复灌进上下文**(实测 dump ≥3 次),工具输出每轮重发 → 48 次 LLM 调用累计输入 **2.5M tokens**(输出仅 28K),既慢又贵,还顶满 600s 超时。根因:brief skill 虽已要求把证据落 `evidence.md` 文件,但没明令"别反复 print 进上下文",弱模型(deepseek-v4-flash)规律不足就放飞。
- 修:`skills/brief/SKILL.md` 三处加指示文——阶段二「context 纪律」(落文件、按需 read、别整批重打)、阶段三「一次成稿别重复 dump + 按期刊分批写」、反模式加一条。纯指示文,frontmatter/description 不变 → SKILL_LIST 无需更新。
- 仍存的更大杠杆(未做):框架层对超大 `run_python` stdout 在上下文里做截断/省略,根治"工具输出滚雪球",但改动面大、有风险,留待单议。
### 2026-06-29 / 修定时任务超时被误记成 ok(bump 0.32.2)
- 实测 bug:定时 job(isolated)跑满 `timeout_seconds` 被调度器协作式 cancel 后,`_run_agent_bg` 对 ok/cancelled 都把 `run_status` 收回 `idle`(二者 DB 不可区分),而 `_execute_scheduled_job` 收尾只判 `run_status=="error"`,于是超时中断被落成 `last_status="ok"` —— 掩盖"跑到一半没写 sections / 没推送",且不计连续失败、不触发兜底。复盘 job `e621c8a6`「每日水泥科研简报」:`timeout_seconds=600`,task 创建→`last_run_at` 正好 600.0s,最后一条 agent 消息停在"按期刊分组打印 38 篇摘要"(还在取数阶段),`last_status` 却是 ok。
- 修:`web/app.py` `_execute_scheduled_job` 在超时分支置 `timed_out` 标志,run 收尾后若 `timed_out``record_result(status="error", ...)` 并直接返回(不投递半成品 notify)。复用既有 error 语义:计入 `consecutive_failures`、到阈值自动停用、前端 crons.js 显示「上次失败」。不动 `_run_agent_bg` 的 idle-on-cancel 共享语义(HTTP cancel/drain 也用)。
- 配套:该 job 真正的诱因是 600s 超时对"7 刊 38 篇带中文摘要重写 + 渲 docx"太短,需用户把 `timeout_seconds` 调大(或 0=不限)。诊断脚本 `scripts/diag_sched_e621.py`
### 2026-06-29 / channel 长会话上下文软重置(Phase 1,bump 0.32.0)
- 问题:微信/企业微信复用同一常驻 chat_task,`Session.load` 全量喂模型 → 越用越贵/慢,终撞 context window。业界(OpenClaw/Hermes)做法:阈值摘要 + 会话分段 + 持久记忆;IM 场景独有的「会话分段」最高杠杆且零信息损失。
- 方案(对外契约友好,无删用户数据):`tasks` 加 `context_base_idx`(0019,additive),`Session.load` 只把 `idx >= base` 的消息装进 LLM 上下文,base 之前的历史仍全量留 messages 表(web `/messages` 不 gate,照旧翻完整历史)。**关键雷点**:`_db_idx` 取 DB 真实总数而非 `len(rows)`,否则 append 续号撞 `uq_messages_task_idx`
- 两个触发口(`core/wechat/service.py`):① 自动 gap——入站时距上次消息超 `channel.session_gap_hours`(默 6h)→ 软重置,base=最后一条 user 消息 idx(保留上一轮原文做续聊锚点,不是失忆墙);② 手动「新话题/新会话/`/new`/清空上下文」→ 硬重置 base=总数,彻底从零。`_run_channel_conversation`(`web/app.py`)接入两口;`clear_messages` 全删后顺手 base 归 0。
- Phase 2(阈值结构化摘要,对齐 Hermes 四阶段③)、Phase 3(sqlite-vec/FTS5 持久检索,解「问很久前的精确内容」)延后,待观察 token 曲线再定。
### 2026-06-26 / 消息框支持拖拽文件 + 修多次粘贴互相顶掉(bump 0.31.3)
- 现象:① 消息框只能粘贴文件不能拖拽;② 连粘多个文件,后一个把前一个的 chip 顶掉,只剩一个。
- 根因:粘贴附件 chip 和状态文字共用 `#chat-hint`,每次粘贴用 `innerHTML =` 整体重建只塞最新一批,且上传进度回调写 `hint.textContent` 也会清掉已有 chip——附件与状态文字抢同一个容器。
- 修复(`web/static/dev.html` + `web/static/js/chat.js`):① 新增独立 chip 托盘 `#chat-attach`(textarea 与按钮行之间),chip 累积靠 append + 按 `rel` 去重,状态进度只写 `#chat-hint`,从根上解耦;② 给整个 `#chat-form``dragenter/over/leave/drop`(enter/leave 计数防闪烁,`_dragHasFiles` 只认文件拖拽,微信镜像只读时不接收),复用 `uploadFiles` + 同一托盘;`takePastedRels` / 删除 / 预览三处改查托盘。
### 2026-06-26 / 消息目录圆点错位再修(点击竞态 + 触底兜底)(bump 0.31.2)
- 现象(0.20.4 后仍残留):① 点圆点,被点的圆点不变红、活跃态跑到途经轮次(尤其点 #1 跳到 #2);② 点最后一个 / 滚到底,倒数第二个变红。
- 根因:① `jumpToMessage``scrollIntoView({behavior:"smooth"})` 在动画途中连发 scroll 事件,`updateActiveOutlineDot` 按动画途中位置反复改写,抢走刚 `setActiveOutlineIdx` 的显式点选;② 「顶线以上最后一卡」判活跃,最后几轮永远顶不到顶线(容器先到底)→ 永远停在倒数第二个,这是 scroll-spy 经典「不可达末项」bug,普通滚动也复现。
- 修复(`web/static/js/chat.js`):① 加 `_outlineJumpLock`,点选后锁定活跃态,平滑滚动期间 `updateActiveOutlineDot` 直接返回,700ms 兜底解锁并按落点重算一次;② `updateActiveOutlineDot` 加触底分支——滚到容器底且无更新内容可加载(`!msgHasMoreNewer`)时,直接判最后一个已加载轮为当前。
### 2026-06-26 / admin 近7天用量表加合计行(bump 0.31.1)
- 纯前端展示:`renderByDay`(`web/static/js/admin.js`)在 `by_day_7d` 表底加 `<tfoot>` 合计行,对 7 天 cost_cny/tokens_in/tokens_out 求和;`tfoot .total-row` 样式(粗体 + 上分隔线)在 `admin.html`。无数据时不渲染合计行。后端数据已有(`_usage_section`),无改动。
### 2026-06-26 / per-account 模型访问控制(档位制,复用 plan 列)(bump 0.31.0)
- 需求:管理后台按账户控制可调用哪些模型。deepseek flash/pro + seedream/seedance + 内网 local 对所有人开放,doubao/glm 按账户分配。
- 架构决策(与用户对齐):**档位制**而非逐账户逐模型授予 —— 复用 `users.plan`(0001 起休眠列,无需 migration),「档位→模型集合」配在 `config/agent.yaml` `model_tiers`,用户只挂一个 plan。管理成本 O(档位) 而非 O(用户×模型)。`plan` 空/未知 → `default` 档;`role=admin` 始终全开。`"*"` 通配支持全开档(当前未用)。
- 起始两档:`default`(deepseek flash/pro + local r1/qwen3 + seedream + seedance)、`pro`(+ doubao turbo/pro/evolving + glm pro/pro52)。
- 后端 `core/model_access.py`:`allowed_set(plan,role)`(None=全开)/ `is_allowed`。三个 list 端点(`/v1/models` `/v1/image_models` `/v1/video_models`)按档过滤 → 用户只看到本档模型(chat 前端无改动,下拉自动收窄)。三个 resolve(文本/图/视频)加 `user_id` 门控:**显式选模型**(建 task / 切模型 / 发媒体)档外 → 403;**老 task 下次发消息**若存量模型已不在档位内 → 持久落回 `deepseek_v4.flash`(send 路径锁行内 UPDATE;optimize_prompt 同降级但不持久);定时任务执行(user_id=None)grandfather 不门控。
- 管理端 `web/admin.py`:`GET /v1/admin/tiers`(档位定义 + 全模型目录,给 UI 图例)、`PATCH /v1/admin/users/{uid}/plan`(校验档位名存在,写 `users.plan`);`/v1/admin/usage/users` 行补 `plan` 字段。
- 管理 UI `admin.js`:各用户用量表加「档位」列(内联下拉选档 → PATCH → 刷新)+ 档位图例(每档含哪些模型,id→显示名);加 `apiSend`(PATCH/POST)助手。
- 已知边界:媒体 **tool 注册**不按档(seedream/seedance tool 仍随 ARK key 注册,只门控 variant 选择),当前各档都含媒体基线故无实际影响;待有付费媒体 variant 再收口 tool 层。
- 文件:`core/model_access.py`(新)、`config/agent.yaml`(model_tiers)、`web/app.py`(门控+过滤+降级)、`web/admin.py`(tiers/set-plan 端点)、`web/static/js/admin.js`(档位列+图例)、`DESIGN.md`(plan 列语义)。
### 2026-06-26 / 新增豆包 Seed 2.1 + GLM 5.2 文本模型档案(bump 0.30.0)
- 背景:用户要接入火山方舟豆包 Seed 2.1(turbo/pro)、自进化版 doubao-seed-evolving,以及智谱 GLM 5.2。`/v1/models` 自动扫 `config/models/*.yaml`,加档案即在 UI 下拉出现,无需改代码。
- 新增 `config/models/doubao.yaml`(family=doubao):`turbo`/`pro`/`evolving` 三 variant。走 Ark OpenAI 兼容端点(`openai/` 前缀 + `api_base=ark.cn-beijing.volces.com/api/v3`,复用媒体侧 `ARK_API_KEY`),同 local.yaml 范式。单价按火山 2026-06 发布价:turbo 3/15(缓存 0.6)、pro 6/30(缓存 1.2);evolving 官方未公布单价,暂按 pro 估值兜底(宁高勿低)。context 均 256K。
- `config/models/glm.yaml` 新增 `pro52`(GLM 5.2,model_id `zai/glm-5.2`,1M 上下文,单价 8/28 缓存 2),**与 `glm.pro`(5.1)并存**,线上引 `glm.pro` 的 task 不受影响(公测期兼容)。
- thinking_mode 均设 false:Seed 2.1 / GLM 的深度思考开关走 body 协议(非 OpenAI `reasoning_effort` 等级),透传等级需 core/llm.py 加 family 分支,留 TODO;设 false 不发 reasoning_effort,模型默认仍深度思考,不影响调用。
- 文件:`config/models/doubao.yaml`(新增)、`config/models/glm.yaml`(加 pro52 variant)。
### 2026-06-26 / 定时任务执行历史列表(分页)(bump 0.29.0)
- 背景:isolated 模式每次触发新建一个 task,旧的带 `scheduled_job_id` 被普通列表过滤掉、UI 够不到,只有详情里单个「打开它跑的任务」按钮指向 `last_task_id`(最近一次)。历史 task 一直在库里(不删除),但访问不到。
- 改:把单按钮换成右栏 **Tab 布局(详情 / 执行记录)**,动作按钮(停用/删除)提到右栏顶部 head;执行记录 tab 是**带分页的列表**。决策(与用户对齐):**保留全部历史不剪枝**(以后再清),列表做好分页;布局选 Tab 而非三栏(固定宽 modal 三栏每栏太窄、长文本难读)。
- 后端:新增 `GET /v1/schedules/{job_id}/tasks?page=&page_size=` —— 查 `scheduled_job_id == job AND user_id == 自己 AND deleted_at IS NULL`,`created_at desc` 分页,复用 `_task_dict`(带消息数/用量),返回标准分页壳 `{page, page_size, count, results}`。user_id 过滤天然隔离他人 job;非法/非本人 job_id 返回空。
- 前端 `crons.js`:`selectJob` 渲染 head(名+状态+按钮)+ tab 条 + `#cr-tab-body`;`renderTab` 切详情/历史;`loadHistory(jobId, page)` 拉一页渲染进 `#cr-hist`(时间·名称·状态/消息数,点某条 → 关弹框 + `selectTask` 打开那次对话),底部「上一页/下一页」+ 页码;await 后**重查** `#cr-hist` 校验 `data-job`,防切 job/切 tab 的迟到响应串显。persistent 模式天然只显一条。
- 文件:`web/app.py`(新端点)、`web/static/js/crons.js`(tab+历史+分页)、`web/static/dev.html`(`.cr-tabs/.cr-tab/.cr-hist-*` 样式)。
### 2026-06-26 / 渠道卡片收拢绑定管理 + 删 rail 按钮(bump 0.28.1)
- 把渠道绑定/对话/管理全部收进「新建任务」下方的卡片,删掉左下角 rail「微信」按钮(精简页面)。
- 后端 `/v1/channel_tasks` 改为返回 `{ wechat: { bound, task }, wecom: { bound, task } }`:
* bound: 绑定状态(`wechat` 用 `get_binding` 判定,`wecom` 用 `get_wecom_userid`)
* task: 对话摘要(无对话为 null,复用 `_task_dict`)。
- 前端 `loadChannelCards` 渲染三种卡片:
* 未绑定: 虚线占位「绑定微信」(点打开弹框绑定)
* 已绑定无对话: 虚线占位「微信对话(发消息后可打开)」(点打开弹框管理)
* 已绑定有对话: 正常卡片(名称 + N条·时间 + ⚙,点打开对话,⚙ 打开弹框管理)
- 文件:`web/app.py`(/v1/channel_tasks 返回 bound+task)、`web/static/dev.html`(删 rail 按钮+占位样式)、`web/static/js/chat.js`(三态卡片渲染)、`web/static/js/wechat.js`(删 hd-wechat 绑定)。
### 2026-06-26 / 定时任务对话归属 + push 统一记录到渠道对话(bump 0.28.0)
- 问题1:定时任务产生的 task(isolated 每次新建)混进普通对话列表。解:`tasks` 加 `scheduled_job_id`(nullable FK→scheduled_jobs,0017 migration + backfill persistent/isolated);列表 `WHERE scheduled_job_id IS NULL`(+ `working_dir LIKE '%/scheduled-%'` 兜底漏网孤行);`ensure_local_task_row` 加参数,`_execute_scheduled_job` 建任务时填。mode 语义澄清:只管对话是否延续,文件夹两种模式都按 job 复用。
- 问题2:任何 push(定时 `deliver_notify` / agent `wechat_push` 工具)推到微信渠道,web 端渠道对话看不到、没法基于推送追问。解:**记录下沉到 `send_to_user`**(两调用方统一入口)——投递成功后对每个成功渠道 `ensure_channel_chat_task`(不存在自动建,与入站对话共用)+ 写一条 assistant 消息(摘要 + 文件下载链接 + `../rel` read 路径),Unified 进 agent 上下文;`source_task_id` 去重(chat task 内调 wechat_push 时不重复插摘要)。不塞正文(避免膨胀),agent 按需 `read` 产物文件(fs `_resolve` 无越界拦,`../rel` 相对 cwd 上一级;mount=user_root docker 也可读)。前端零改动(markdown 链接 + 文本 read 路径)。push 记录标 `messages.kind="push"`(0018,独立列不进 payload),`extract_last_assistant_text` 加 `WHERE kind IS NULL` 跳过,避免 wecom 入站取回复误取 push 摘要当回复。
- 文件:`core/storage/models.py`(Task.`scheduled_job_id`+Message.`kind`)、`db/migrations/versions/20260626_1000_0017_*.py`+`20260626_1100_0018_*.py`、`core/storage/utils.py`(`ensure_local_task_row`+`append_channel_message`)、`core/wechat/service.py`(`send_to_user` 记录+`ensure_channel_chat_task`)、`core/wechat/inbound.py`(`extract_last_assistant_text` 过滤 kind)、`tools/wechat_bot.py`、`core/agent_builder.py`、`web/app.py`(`_run_channel_conversation` 复用)、`DESIGN.md`(§8.5/§8.7)。
### 2026-06-25 / 渠道卡片改并排(bump 0.27.4)
- 接 0.27.3:两张渠道卡片从竖排改并排(`#channel-cards` flex row,各 `flex:1`),省左栏纵向空间;窄栏内图标左、名称 + 条数·时间堆两行(新增 `.cc-body` 列容器)。
- 确认渠道绑定弹框(左下角「微信」rail 按钮)**保留不动** —— 它是绑定/解绑/测试推送的唯一入口,与卡片(只读对话入口)职责互补不重复(方案②)。
- 文件:`web/static/dev.html`(CSS row + cc-body)、`web/static/js/chat.js`(卡片 markup 加 cc-body)。
### 2026-06-25 / 渠道镜像对话改成左栏固定卡片 + 企业微信也只读(bump 0.27.3)
- 把微信 / 企业微信常驻对话从「任务列表里置顶 + 绿徽章 + 绿边的行」改成「『新建任务』下方两张固定卡片」(`#channel-cards`):它们是每用户每渠道唯一的常驻只读镜像,从可滚动任务列表抽出更清爽、常驻可见。
- 后端:`/v1/tasks` 列表用 `func.coalesce(Task.channel,'web').notin_(CHANNEL_MIRROR_KINDS)` 排除渠道任务,并删掉原 `case(...)` 强制置顶;新增 `GET /v1/channel_tasks` 返回 `{wechat, wecom}` 两条摘要(复用 `_task_dict`,无则 null)。`CHANNEL_MIRROR_KINDS=("wechat","wecom")` 单一真相源。
- 前端:`dev.html` 加 `#channel-cards` 块 + `.channel-card` 绿调样式(`:empty` 自动隐藏);`chat.js` 加 `loadChannelCards()`(enterApp/刷新按钮调)+ `syncChannelCardActive`(selectTask 同步高亮);移除列表行已失效的绿徽章逻辑。
- 企业微信对话补只读锁:`applyChannelComposerLock` / `sendMessage` 守卫从硬编码 `channel==='wechat'` 改读 `CHANNEL_BADGE`(`channelCfg`),微信 + 企业微信都 readonly,提示文案按渠道动态。
- 文件:`web/app.py`(列表排除 + 新端点 + 常量,移除 `case` import)、`web/static/dev.html`(卡片容器 + CSS)、`web/static/js/chat.js`(卡片渲染 + 只读锁统一)、`web/static/js/main.js`(enterApp 调 loadChannelCards)。
### 2026-06-25 / 企业微信入站对话支持图片/文件附件(bump 0.27.2)
- 接续 0.27.0 企业微信入站(此前只收文本)。补图片/文件:`wecom.download_media(media_id)` 走 `media/get`(成功回二进制流 + Content-Disposition 文件名,出错回 JSON errcode、40014/42001 重取 token);回调按 `MsgType` 分支,image/file 下载后构造 `InboundAttachment(kind/file_name/data)`(与个人微信同结构,仅这三字段被用到)→ 喂同一 `_run_channel_conversation`,复用其落盘 + 拼 `[用户上传的...]` 行(图片 agent 自调 look_at_image,文件走 Read)。
- 语音/视频/位置/链接/事件暂回 success 不处理;附件下载失败则静默跳过(打日志)。纯图片/文件消息无文本 → 核心据附件行生成 text,不再被「空消息」挡掉。
- 文件:`core/wechat/wecom.py`(`download_media` + `_filename_from_disposition`)、`web/app.py`(回调 image/file 分支)、`web/static/dev.html`(「企业微信(仅推送)」→「推送 + 对话」文案纠正)。`_filename_from_disposition` + import 自测过。
### 2026-06-25 / wechat_push 按渠道定向投递(修「点名企微仍推到个微」,bump 0.27.1)
- bug:用户说"推送给我的企业微信",消息却同时进了个人微信。根因 —— `send_to_user` 是无差别广播(`for ch in active_channels()` 逐个推),且 `wechat_push` 工具压根没有"指定渠道"的参数,agent 想只发企微也做不到;部署同时开了 clawbot+wecom 两渠道 → 一条推送两边都到。早期只有 clawbot 一渠道时此语义无碍,加企微后暴露。
- 修:`send_to_user` 加 `channel=None` 入参 —— `None` 保持广播(定时任务/不点名沿用,向后兼容),指定 `wecom`/`clawbot` 时只投那一条(该渠道未开则返回单条 `no_binding`,**不静默回退到别的渠道**避免又推错);`WechatPushTool` 加可选 `channel`(enum wecom/clawbot)+ 描述教 agent「用户点名某微信就传对应 channel」。
- 文件:`core/wechat/service.py`、`tools/wechat_bot.py`。
- 需求:企业微信此前只做出站推送(渠道 B 定位"和邮箱似的");现补**入站对话**,企微也能像个人微信那样直接聊。
- 关键认知 —— 入站方式与 ClawBot 不同:ClawBot 走**长轮询**(`getupdates` + `run_inbound_manager` 常驻),企业微信走**回调 webhook**(企微服务器主动 POST 加密 XML),故**不需要后台轮询 task**,只加一个 HTTP 端点。回复因 agent 跑 >5s 超被动同步窗口 → 走 `message/send` 主动推回(复用 `push_wecom`),被动回复直接回 `success` 防重试。
- 抽象:把 `_run_wechat_message` 的"建/复用会话 task → 落盘附件 → 抢 run 锁 → `_run_agent_bg` → 取回复"抽成**模块级 `_run_channel_conversation(app, uid, text, atts, channel)`**,个人微信(`channel='wechat'`)与企业微信(`channel='wecom'`)同核心、**各一张会话 task**(企微 binding 也存 `chat_task_id`),互不串扰。run 锁挡企微回调的并发/重复投递。
- 新增:`core/wechat/wecom_crypto.py`(WXBizMsgCrypt 等价:SHA1 验签 + AES-256-CBC 解密 + receiveid/corpid 校验;**注意**与 `crypto.py` 的 Fernet 列加密、`wecom.py` 的出站 API 全无关);`service.get_user_by_wecom_userid` 回调反查身份 + `get/set_wecom_chat_task`;`upsert_wecom_binding` 改成合并 config(不再覆盖 chat_task_id);`web/app.py` `GET/POST /v1/wecom/callback`(无 JWT,身份从加密 XML `FromUserName` 反查)。
- env:`WECOM_CALLBACK_TOKEN` / `WECOM_CALLBACK_AESKEY`(企微后台「接收消息」页生成);回调 URL = `<公网 base>/v1/wecom/callback`。**暂只收文本**(图片/语音/文件回 success,后续走 `media/get` 补);未绑定/空消息静默。crypto round-trip 自测过(verify_url / decrypt_message / 坏签名 / 坏 corpid 均符合预期)。
### 2026-06-25 / 修复企业微信扫码绑定报「请在企业微信客户端打开链接」(bump 0.26.10)
- bug:`oauth_authorize_url()` 用的是 `open.weixin.qq.com/connect/oauth2/authorize`(网页授权),这条只能在企业微信客户端内置浏览器里打开;前端 `wecomBind()``window.open` 在**桌面浏览器**新标签打开它 → 企业微信返回「请在企业微信客户端打开链接」,扫不了码。注释里「桌面浏览器=出二维码扫」是误解(那是公众号行为,企微 oauth2/authorize 不出扫码页)。
- 修:换成**扫码授权登录**端点 `login.work.weixin.qq.com/wwlogin/sso/login?login_type=CorpApp&appid=CORPID&agentid=...&redirect_uri=...&state=...` —— 桌面浏览器会渲染二维码,用户用企业微信 App 扫码确认后回跳带 `code`,后续 `verify_state` / `get_user_id(code)` 换 userid 的逻辑完全不动。前置:redirect_uri 域名须在企业微信后台「应用 → 企业微信授权登录 → 可信域名」登记(与「网页授权可信域名」是两项不同设置)。
- 文件:`core/wechat/wecom.py`(`OAUTH_AUTHORIZE`→`WWLOGIN_SSO`、`oauth_authorize_url`)。
### 2026-06-25 / 修复 wechat_push 工具漏挂企业微信(只配企微也能推,bump 0.26.9)
- bug:`wechat_push_available()` 只返回 `service.clawbot_enabled()`,完全没算企业微信。线上若只开了企业微信渠道(ClawBot 开关没开)→ 工具压根没注册到 agent → zcbot 照实回"我没有直接发企业微信的工具"(用户已绑企微仍推不出)。底层 `send_to_user` 其实早支持 `push_wecom`,门槛漏判而已。
- 修:提取 `service.active_channels()` 作渠道清单**唯一真相源** —— `wechat_push_available()` 改成 `bool(active_channels())`、`send_to_user()` 改成 `for ch in active_channels(): _DISPATCH[ch](...)`,门槛与投递同源,加渠道只改一处,根除"两处各列各的"这类漏判。工具描述把「~24h 窗口」注明为 ClawBot-only(企业微信无窗口约束),避免 agent 在企微场景误判窗口限制。纯内部重构,对外契约不变;`test_secret_host_tools` 8/8 过。
- 文件:`tools/wechat_bot.py`、`core/wechat/service.py`。
### 2026-06-25 / 企业微信加「手填 userid」绑定(无域名也能推,bump 0.26.3)
- 痛点:企业微信只有 OAuth 扫码绑定那一路,而 OAuth 回调要落在 HTTPS 可信域名;用户暂无域名 → 卡住。关键认知:**企业微信推送是出站调用(gettoken/message_send 直连 qyapi),根本不需要域名**——只有"扫码拿 userid"那步要域名。
- 加第二条绑定路:`PUT /v1/wecom/bind/userid` 手填成员 userid(管理后台→通讯录→成员→「账号」)→ `upsert_wecom_binding`;前端 rail「微信」modal 企业微信段加输入框 + 保存(与「扫码绑定」并列,已绑回填 userid)。`service`/推送/`send_to_user` 全不动(userid 来源换了,绑定数据结构一样)。
- 文件:`web/app.py`(+1 端点)、`web/static/dev.html`(输入框)、`web/static/js/wechat.js`(保存处理 + 回填)。py 编译 + node --check 过。
### 2026-06-25 / 监控页近 7 天用量按日期倒序(bump 0.26.2)
- `admin.py` `_usage_section``by_day_7d` 排序由 `order_by(day)``order_by(day.desc())`,最新一天在最上(overview 趋势表 + PDF 报告共用此数据,两处都生效)。前端纯按行渲染、不依赖升序,无需改 JS。
### 2026-06-25 / 用户名展示:监控页 + dev 顶栏(bump 0.26.1)
- 统一一条兜底链 `name → user_name → email → uid8`,监控页与 dev 页共用。
- 监控页(`admin.js`):各用户用量 / 存储两表 + overview 迷你表的用户列改走 `userCellHTML`/`userLabelText`,name 与 user_name 都有时主显 name + 浅灰 user_name;`title` 悬浮给完整姓名/账号/邮箱/ID。后端 `admin.py` 两张表 SELECT 补 `User.name/user_name` 回带。
- dev 顶栏(`main.js` `renderWho`):默认显 name,hover(title)显账号/邮箱/ID。`state.js` 加 `userUserName/userEmail` + LS 持久化,抽 `setIdentity`/`userDisplayName`/`userDisplayTitle` 三个 helper,登录(`auth.js`)、embed 签发(`embed.js`)、`/v1/me` 校准(`loadRole`)共用;`login_password` 响应也回带 name/user_name 避免展示闪烁。
### 2026-06-25 / 平台登录注入用户档案 name/user_name(bump 0.26.0)
- 需求:平台作为可信中间层登录时,把用户 `name`(显示名)/ `user_name`(平台账号名)一并注入 zcbot 持久化,供前端展示。
- 实现:`users` 加两列(migration `0016`,纯加 nullable 列,平滑兼容存量行);`LoginRequest` 加可选 `name/user_name`,缺省即旧行为(向后兼容老调用方);`ensure_user_row` 升级为 upsert,`ON CONFLICT DO UPDATE SET x = COALESCE(EXCLUDED.x, users.x)` —— 平台传非空就刷新(同步平台侧改名),传 null/空不覆盖清空,空串归一到 None。
- 暴露:`/v1/auth/login` 响应 + `/v1/me` 回带 `{name, user_name, role}`(新增 `get_user_profile` 单次 SELECT)。机制选 platform 在 login body 推送(零额外往返,与未来 OIDC 的 name/preferred_username claim 注入同构),未选 zcbot 反向拉平台 API。
- 待办:migration `0016` 需在配好 `ZCBOT_DB_URL` 的环境跑 `.venv/Scripts/python.exe main.py db upgrade head` 应用;前端可消费 `/v1/me` 的 name 显示用户名。
### 2026-06-25 / 登录失败提示修正(bump 0.25.2)
- 问题:邮箱密码输错时前端弹「404」(后端 `login_password` 实际返 403「invalid email or password」,前置网关/旧构建把状态改写成 404 后,前端 `doLogin` 直接回显 `r.status + " login failed"` → 用户看到「404 login failed」,语义错误)。
- 修:`web/static/js/auth.js` `doLogin` 失败分支不再回显原始状态码 —— 表单已校验非空,非 2xx 绝大多数是凭据不对,统一给「账号或密码错误」(pw tab)/「user_id 或 PLATFORM_KEY 错误」(key tab);仅 5xx 暴露状态码提示服务端问题。后端 `web/app.py:1399` detail 同步改中文「账号或密码错误」保持契约自洽。
### 2026-06-24 / 微信 task 在 web 端只读镜像(bump 0.25.1)
- 问题:web 端打开 channel=wechat 的常驻 task 能正常发消息,但 web→微信**单向不同步**(web 发消息走 `/v1/tasks/{id}/messages`→`_run_agent_bg`,不经过 inbound loop 里 `send_text` 回微信那段,微信侧零感知);微信→web 则同步(同一条 task)。
- 取舍:不做"双向打通"(受微信 24h `context_token` 窗口约束 → 只能"有时同步",不可预测 + 两入口并发写歧义),改为 web 端**只读镜像**(单一交互权威锚定微信;想主动推走 `wechat_push`/定时简报)。
- `web/static/js/chat.js`:`applyChannelComposerLock(meta)`(selectTask 后调)对 wechat task 置 `chat-input` readOnly + 改 placeholder「请在微信里对话」+ 禁润色;`sendMessage` 入口加 channel 守卫(Enter 兜底)。`dev.html` 加 `.readonly-locked` 置灰样式。
### 2026-06-24 / 微信入站收图片/文件(bump 0.25.0)
- 缺口:`ILinkClient.get_updates` 只抽 `text_item`,图片/文件 item 被丢成空 text → `inbound._poll_binding` 又因空文本 `continue`,用户发的图/文件**静默丢弃、零落库**(DB 实证:caoqianming@foxmail.com 的微信 task 里发的图无任何记录)。
- `core/wechat/ilink.py`:新 `InboundAttachment`(kind/media/file_name/aeskey_hex/data);`get_updates` 解析 `image_item`(type=2)/`file_item`(type=4);新 `download_media()` = CDN `/c2c/download?encrypted_query_param=...` GET 密文 → `_aes_ecb_unpkcs7`(AES-128-ECB 解,发送侧 `_aes_ecb_pkcs7` 的逆);key 两种编码兜底 `_decode_media_aes_key`(base64(raw16) / base64(hex32),后者同发送侧);图片无名按 magic bytes 补扩展名 `_guess_image_ext` + `attachment_basename`(剥路径防穿越)。
- `core/wechat/inbound.py`:`HandleMessage` 契约加第三参 attachments;`_poll_binding` 先下载解密回填 `att.data`,文本/附件**都空才跳过**(单附件下载失败不拖垮整条)。
- `web/app.py:_run_wechat_message`:附件落盘 `<wd>/inbound/<ts>-<i>-<name>`,图片拼 `[用户上传的参考图] <rel>`(agent 自调 `look_at_image` 看图)、文件拼 `[用户上传的文件] <rel>`(agent 用 Read/Shell),**复用 web 端粘贴图同一约定**,不碰模型链路。
- 协议下载分支(GET vs POST、aes_key 取哪支)有真机实测风险:crypto roundtrip + 双编码 key decode 已单测通过;端到端待用户重发一张图验证(原图 cursor 已过)。
### 2026-06-24 / 微信绑定表重构:两表合一 channel_bindings(判别列+JSONB,bump 0.24.3)
- 起因:ClawBot(0012 `wechat_bot_bindings`,8 列)+ 企微(0014 `wecom_bindings`,1 列)各一表。从架构角度复盘:渠道绑定本质="用户在某渠道的一份配置",各渠道字段形态不同 → 最优是**判别列 + JSONB 多态**(与本库 `usage_events` kind+units / `scheduled_jobs.notify` 同范式),加渠道(飞书/TG…)零 migration。分表不扛增长、与库内范式不一致;单宽表(NULL 列并列)最差。
- 重构:`ChannelBinding(user_id, channel, status, config JSONB)` PK=(user_id,channel);clawbot config 装 `{bot_token*, user_im_id, base_url, latest_context_token*, context_token_at, chat_task_id}`(`*` crypto 加密入 JSONB),wecom 装 `{wecom_userid}`。migration `0015` 建表 + 把旧两表数据搬进 config(token 本就是密文串、原样搬)+ drop 旧表;DDL+DML 同事务,失败回滚不丢。
- **关键:只动 models + service 内部 + migration**,`service` 公共 API 与 `BindingSnapshot` 形状不变 → inbound/web/tool/scheduler **零改动**(纯内部数据层重构,对外行为不变)。趁绑定数据极少时合表最省。
- 文件:`core/storage/models.py`(`ChannelBinding` 替 `WeChatBotBinding`/`WeComBinding`)、`core/wechat/service.py`(存取改读写 config)、migration `0015_channel_bindings`(含 down 拆回)。import/编译 + `_snap` 反序列化单测过;DB 往返 + migration 待部署联调。
### 2026-06-24 / 修复微信绑定弹框标题样式错乱(bump 0.24.2)
- 根因:`#wechat-modal h3` 只设了 flex 布局,漏了其他弹框(crons/memory)都有的 `margin:0; padding:12px 16px; font-size:16px; border-bottom` → 标题吃浏览器默认 h3 样式(大字号 + ~21px 上下默认 margin + 无分隔线),看着比别的弹框又大又飘。
- 修复:`web/static/dev.html` 给 `#wechat-modal h3` 补齐标题样式,并加 `h3 svg{opacity:.85}``.sk-x` 关闭按钮样式,与 crons/memory 弹框对齐。
### 2026-06-24 / 修复 host-side 文件工具发不出附件(docker 容器路径未翻译,bump 0.24.1)
- 根因:生产 docker 模式下,fs 工具在容器里跑(文件落容器卷=宿主 `users/<uid>/<wd>/`),但 `send_email` / `wechat_push` 是**宿主进程**工具;它们 `base_dir=Path.cwd()`(部署根)且不识别容器↔宿主路径映射 → agent 给的相对路径拼到 cwd、容器绝对路径 `/workspace/...` 宿主上瞎解析,`relative_to(user_root)` 必越界 → 附件永远发不出(微信 DB 实锤 `#7` 相对 + `#15` 容器绝对两条都「文件路径越界」)。probe 脚本能发是因直接调 `send_file` 绕过解析。
- 修复:`tools/base.py` 加共享 `_resolve_user_file`(`/workspace` 前缀翻回 `user_root` + 相对拼 `base_dir` + 越界校验,抽 `FileOutOfBounds`);`agent_builder` 给两个 host 工具传 `base_dir=working_dir_path`(宿主 task 目录)而非 cwd;`send_email`/`wechat_bot` 改用 helper。host 模式同样受益(相对路径之前也错)。
- 测试:`tests/test_secret_host_tools.py` 加 3 例(helper 翻译+越界、send_email 容器路径附件、wechat_push 相对路径);诊断脚本 `scripts/diag_wechat_push.py`
### 2026-06-24 / 企业微信渠道 B:纯推送 + OAuth 扫码绑定(bump 0.24.0)
- 决策:**企业微信只做推送、不做对话**(用户拍板"和邮箱似的")——省掉入站回调 + AES + 5s ACK + agent 回推一整套;要对话走 ClawBot。企业微信的**无条件主动推**(不挑活跃度、无 24h 窗口)正补 ClawBot 短板,定时简报必达首选。
- 定位 touser:**OAuth 网页授权扫码**拿企业成员 `userid`(用户拍板,优于手填 opaque id)。前提:管理员建自建应用给 `WECOM_CORPID/AGENTID/SECRET` + 配「网页授权可信域名」。
- 文件(后端 import/编译 + 前端 node --check 自测过):`core/wechat/wecom.py`(access_token 2h 缓存+线程安全+失效重取、OAuth getuserinfo、message/send text/file、media/upload、state HMAC 签名);`WeComBinding` 模型 + migration `0014_wecom_bindings`(0013 被 task_channel 占);`service.py` 加 wecom CRUD + `push_wecom` + `send_to_user` 接 wecom 一路;`web/app.py` 5 端点(`/v1/wecom/oauth/url`、`/v1/wecom/oauth/callback` 公开-身份从 state 验、`/v1/wecom/bind` GET/DELETE、`/v1/wecom/test`);前端 rail「微信」modal 加企业微信段(`wechat.js` + dev.html)。
- env:`WECOM_CORPID/AGENTID/SECRET` + 可选 `ZCBOT_PUBLIC_BASE_URL`(OAuth redirect 主机,须在可信域名内)。**待办**:管理员就绪后端到端验(扫码绑 → test → 简报推);**回调端点须公开**(已不挂 require_user)且 redirect 主机匹配可信域名。
### 2026-06-24 / 配置 QQ/foxmail SMTP 发信 + 发件人显示名品牌化(bump 0.23.2)
- `.env` 填入 foxmail SMTP(smtp.qq.com:25 / STARTTLS / 授权码),`send_email` tool 与定时任务 notify 兜底投递就此生效;自检发信链路通过。
- `tools/send_email.py` 发件人显示名从硬编码 `zcbot` 改为读 `SMTP_FROM_NAME`,默认「总院科研辅助智能体」—— 对外不暴露内部代号。RUN.md env 段补 `SMTP_FROM_NAME`
### 2026-06-24 / 微信任务徽章改品牌绿 + 微信 logo + 整行绿边(bump 0.23.1)
- 上一版徽章复用 `.badge.active`(蓝灰),与旁边「进行中」状态徽章撞色、不显眼。
- 新增 `.badge.wx`(微信品牌绿 `#07C160` + 白字 + 内嵌微信 logo SVG)与 `.task-row.wx`(绿色左边框 + 极淡绿底 + hover 加深),让置顶的微信任务从普通任务里跳出来。文件:`web/static/dev.html`(CSS)、`web/static/js/chat.js`(`WECHAT_ICON` 常量 + badge/row class)。
### 2026-06-24 / 微信对话 task 渠道标记 + 置顶(bump 0.23.0)
- 痛点:微信常驻 task 与网页常规 task 结构相同,只能靠 description 魔法值反推;且 `created_at` 固定后随用户开新 task 越沉越深,这个「渠道收件箱」反而最难找。
- `tasks``channel` 列(`web`/`wechat`,migration 0013,`server_default='web'` 回填存量、并把 description=`(微信 ClawBot 对话)` 的存量 task backfill 成 `wechat`)。`ensure_local_task_row` 加 `channel` 参数,微信建 task 处传 `wechat`;`channel` 仅 INSERT 写定,后续 upsert/save 不传 → 不覆盖。
- `_task_dict` 透出 `channel`;列表查询排序前置 `case((channel=='wechat',0),else_=1)` pin 表达式 → 微信 task 后端强制置顶(跨分页稳定),用户选的排序对其余 task 照常生效。
- 前端 `chat.js` 任务名前打绿色「微信」徽章(`channel==='wechat'`)。文件:`core/storage/models.py`、`core/storage/utils.py`、`web/app.py`、`web/static/js/chat.js`、`db/migrations/versions/...0013_task_channel.py`。
### 2026-06-24 / 微信绑定 UI 并入主 SPA(bump 0.22.2)
- 上一版绑定页是独立 `/static/wechat_bind.html`,主界面没入口、用户找不到。
- 集成:左栏 rail 加「微信」按钮(`hd-wechat`)→ 扫码绑定 modal(`wechat-modal`),复用 `api()` 调已有 5 端点(起码/轮询/查/解绑/自检),仿 `crons.js` modal 范式;过期自动换码、绑定成功提示去微信开口。文件:`web/static/js/wechat.js`(新)、`web/static/dev.html`(rail 按钮 + modal + CSS)、`web/static/js/main.js`(import 触发绑定 + Esc 关闭)。
- 独立页 `web/static/wechat_bind.html` 保留作嵌入/兜底入口(同套端点)。
### 2026-06-24 / 修复顶栏 token 计量栏回复后不刷新(bump 0.22.1)
- 现象:提问→助手答完后,对话顶栏的「总 token · 缓存命中 · 花费」计量栏停在发问前旧值,要切到别的 task 再切回才更新。
- 根因:计量栏由 `renderChatMeta()``state.taskMeta` 渲染,而 `state.taskMeta` 只在 `selectTask``GET /v1/tasks/{id}` 时刷新。SSE 流结束后 `fetchSse` 的 finally 只 `loadTaskList()`(左栏列表)+ `loadMessages()`,从未重拉 meta 也没调 `renderChatMeta`——SSE 期间用量只累计进 hint,没落 taskMeta。
- 修:`fetchSse` finally 块里,当收尾的是当前可见 task 时补一次 `GET /v1/tasks/{id}` → 重置 `state.taskMeta``renderChatMeta()`;失败 try/catch 吞掉不打断收尾。`web/static/js/chat.js`。
### 2026-06-24 / 微信接入第一期:ClawBot 个人微信(后端完成,bump 0.22.0)
- 需求:把 zcbot 送进用户**个人微信**——能对话、能推简报/结果。调研三条路:wechaty/hook(违规高封号,排除)、企业微信自建应用(官方但要管理员+仅企业成员)、**微信 ClawBot**(腾讯 2026-03 官方个人号 Bot API,iLink 协议,零封号,后端接谁都行)。选 ClawBot 先行。详 DESIGN §8.7。
- **协议全程真机实测**(`scripts/probe_clawbot*.py`,本人微信号在灰度内):① 扫码绑定拿 `bot_token`;② `getupdates` 长轮询收消息;③ `sendmessage` **每条 `client_id` 必唯一**(漏则同 token 后续被丢——前几轮误判"纯被动"的真因),多条/长文中间块 `state=1` 末块 `state=2`;④ `context_token` 24h 可复用 → **主动推送成立**(需用户先开口一次);⑤ 文件:`getuploadurl`→AES-128-ECB(PKCS7)→CDN(URL 带 `filekey`,漏则 400 mismatch)→`file_item`,docx/pdf 原生直推。
- **关键设计决策**:入站对话→每用户一条 persistent「微信」task(连续性,token 靠 §8.2 压缩);凭据(bot_token/context_token)加密列(env `ZCBOT_WECHAT_SECRET_KEY`),绝不进沙箱/日志;**入站出站一体**——主动推送依赖入站给的 context_token,故 getupdates 长轮询常驻(既收对话又刷新 24h 窗口)。
- **文件**(后端全部 import/编译自测过):`core/wechat/{ilink.py 协议客户端, crypto.py 凭据加密, service.py 绑定CRUD+推送+send_to_user 渠道抽象, inbound.py 长轮询管理器+回复提取}`;`core/storage/models.py` 加 `WeChatBotBinding` + migration `0012_wechat_bot_bindings`;`tools/wechat_bot.py` `WechatPushTool` + `core/agent_builder.py` 注册(有开关才挂);`core/scheduler.py` `deliver_notify``wechat` 通道(未送达退邮件兜底);`web/app.py` lifespan 起入站管理器 + `_run_wechat_message` 回调 + 5 端点(`/v1/wechat/bind/qrcode|status`、`/v1/wechat/bind` GET/DELETE、`/v1/wechat/test`);`web/static/wechat_bind.html` 自包含绑定页;`requirements.txt` 加 segno+cryptography。
- **env**:`ZCBOT_WECHAT_BOT_ENABLED=1`(渠道开关)+ `ZCBOT_WECHAT_SECRET_KEY=<串>`(凭据加密,缺则退明文标记)+ 可选 `ZCBOT_WECHAT_BASE_URL`
- **待办(部署后联调)**:migration `0012` 上库;起 web 进程端到端验(扫码绑定→对话→主动推→定时简报推);**渠道 B 企业微信**(无条件推送,补 ClawBot 24h 窗口短板)按 §8.7「渠道 B」实现。SPA 集成已落(见下条)。
### 2026-06-23 / 平台渲染层 rendering/:三 skill docx 统一 + chromium md→pdf(bump 0.21.0)
- 背景:线上 `简报` task 用户要"输出为pdf",模型因 brief 无 PDF 路径而临场即兴——试 `apt install libreoffice`(只读 fs 失败)→ `pip install weasyprint markdown` 手搓 md→HTML→weasyprint;容器空闲回收后包不持久,二次导出又重装一遍。深挖发现两个问题:① skill 缺 PDF 路径、weasyprint 不在镜像;② `_CHEM_RE` 化学式白名单在 brief/paper/proposal **三份 render_docx.py 逐字重复**(改一处易漏改),patent/standard 还复用 proposal 那份。
- 架构判断:**渲染不是 skill 内容,是平台能力**(像 chromium/document_search)。Skills 走 Anthropic 自包含/可 fork bundle 标准,把共享渲染库塞 `skills/_shared` 让各 skill `import` 会破坏 fork。故新建**顶层 `rendering/` 平台包**,bind-mount 进 `/sandbox/rendering`(pool.py,与 skills 同款 ro),各 skill 调 `render.py` 不再自带 render 脚本。
- `rendering/`:`common.py`(叶子原语单一事实源:字体/`CHEM_RE`/块级正则/表格行/图片路径)+ `docx_manuscript.py`(paper/proposal 配置化双 profile)+ `docx_brief.py`(brief 富渲染,复用 common)+ `pdf.py`(md→HTML→chromium `--print-to-pdf`,复用 `common.CHEM_RE`)+ `render.py`(统一 CLI `--profile {brief,paper,proposal} --format {docx,pdf}`,sys.path bootstrap 让 `python /sandbox/rendering/render.py` 直调可解析)。
- **零回归证明**:重构前后对三 profile 各渲 docx、解包 diff `word/document.xml`,brief/paper/proposal **全部字节一致**(12962/10755/11401 bytes)。纯搬移+共享原语,输出不变。
- chromium md→pdf:不用 weasyprint(要 pango/cairo、不在仓库 Dockerfile);chromium 镜像已装(给 mermaid)+ fonts-noto-cjk 已装,完整内核 CSS 保真度更高。固定 `--no-sandbox --disable-dev-shm-usage --user-data-dir=/tmp/* --no-pdf-header-footer`。冒烟 `deploy/sandbox/probe_chromium_pdf.sh`(照 probe_mermaid.sh):最小 chromium 镜像在 `--read-only --cap-drop=ALL` + 64MB `/dev/shm` 下实测出图,中文/下标/DOI 超链/表格/callout 全绿、页眉已关。
- 删:`skills/{brief,paper,proposal}/scripts/render_docx.py`(3 份)+ 短命的 `skills/_shared/render_pdf.py`。改 5 个 SKILL.md(brief/paper/proposal 直接调,patent/standard 复用 proposal profile)调用到 render.py + 补反模式"渲染一律调 render.py、禁止手搓"。`requirements.txt` 加 `markdown`
- **部署要点**:`/sandbox/rendering` 挂载靠 pool.py(restart 重建容器才生效)+ `markdown` 进镜像靠 requirements 变更触发的整体重建 —— **需一次 deploy(update.sh)原子激活**,旧 render_docx 路径已删,deploy 前别只推 SKILL 改动。引文 `[n]` 上标回链 pdf 仍按字面渲(docx 有,pdf 后补)。
- 文件:`rendering/{__init__,common,docx_manuscript,docx_brief,pdf,render}.py`(新)、`core/sandbox/pool.py`(+rendering 挂载)、`deploy/sandbox/probe_chromium_pdf.sh`(新)、`requirements.txt`、5×`SKILL.md`、`skills/brief/SKILL.md`(另删 research 索引滞后描述)、`core/__init__.py` 0.20.4→0.21.0。
### 2026-06-23 / 消息目录定位错位修复(bump 0.20.4)
- 现象:点右侧圆点轨道**第一个**圆点,活跃高亮常落到**第二个**。根因是两套锚点不一致——`jumpToMessage` 用 `block:"center"` 居中,但第一轮上方无内容无法居中、被钉到顶端;而 `updateActiveOutlineDot` 按「顶线 80px 容差」判活跃轮,第一轮短时下一轮卡片顶也落进 80px 带内 → 越界高亮第二个圆点(滚动监听又覆盖了 jumpToMessage 的显式 setActiveOutlineIdx)。
- 修复:跳转改 `block:"start"`(顶部对齐,与活跃判定同锚点)+ `.msg``scroll-margin-top:16px` 留呼吸;活跃容差 80→24 与之对齐,贴顶短轮判到自己不越界。
- 文件:`web/static/js/chat.js`(`jumpToMessage` / `updateActiveOutlineDot`)、`web/static/dev.html`(`.msg` CSS);`core/__init__.py` 0.20.3→0.20.4。
### 2026-06-22 / 前端两处 bug 修复(bump 0.20.3)
- 定时弹窗"被遮挡":`#crons-modal` 漏了 z-index,退回基础 `.modal`(无 z-index)被 z-index:5 的侧栏/面板盖住;补 `z-index: 112` 与兄弟只读 modal(`#skills-modal`/`#memory-modal`)对齐。排查用 node 加 DOM mock 跑通整条前端模块图,确认 `hd-crons` 绑定确实执行(排除了"按钮没绑事件"),定位到纯 CSS 层叠问题。
- 登录页 focus 引用错 id:`web/static/js/main.js:106` `$("li-token").focus()``li-token` 不存在(登录输入框实际是 `li-email`),未登录 boot 末尾会抛 TypeError;改为 `li-email`
- 文件:`web/static/dev.html`、`web/static/js/main.js`;`core/__init__.py` 0.20.2→0.20.3。
### 2026-06-21 / 发送期修复悬空 tool_calls(bump 0.20.2)
- 根因(监控页 error 任务排查,task 5c5d6d25 DB 实测):run 在写入 `assistant.tool_calls` 之后、tool 结果写库之前被中断(上游流式断连 / 用户取消 / 崩溃),历史里留下一条 `assistant.tool_calls` 后面**没有对应 tool 结果**的消息;用户随后继续发言,下一轮把历史原样发给 DeepSeek/OpenAI 即被拒 `An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'` → 任务进 `run_status=error` 卡死。区别于 06-06/06-12 的 arguments 损坏/投毒修复(那治"参数被压成 marker"),这是**结构性悬空**,旧修复不覆盖。
- 修复(方案 A,发送期兜底):`core/context.py` 新增 `_repair_dangling_tool_calls`,在 `prepare_messages_with_stats` 入口(早返回分支之前)对每条 `assistant.tool_calls` 扫描紧随其后的连续 tool 结果,为**缺失**的 `tool_call_id` 补一条占位 tool 消息(`[interrupted: ...]`,带原 function name)。纯发送期、不改库 → 覆盖所有中断路径 + 已存在的坏数据自愈(下次发消息即修复),`stats.repaired_tool_calls` 计数。选 A 而非写入期防御(方案 B):B 要覆盖所有中断路径易漏且救不了存量。
- 验证:真实坏 task 5c5d6d25 修复前 idx 19 悬空 1 条 → 修复后 0 悬空、协议合法(压缩开/跳过两分支均覆盖);新增 4 个单测,context 套件 14 项全过。
- 文件:`core/context.py`、`tests/test_context_compaction.py`;`core/__init__.py` 0.20.1→0.20.2。
### 2026-06-18 / brief 简报重定位「重要文献速览」+ 精简三文件(bump 0.20.0) ### 2026-06-18 / brief 简报重定位「重要文献速览」+ 精简三文件(bump 0.20.0)
- 需求漂移收敛:brief 从"热点聚类趋势判断型简报"重定位为**「重要论文列表 + 内容总结」速览型** —— ①只描述不给建议(去掉启示/判断/空白争议);②开头一份重要期刊论文列表(各大相关刊、**Elsevier 数据库优先**),每篇带一段简介/摘要概述;③对这批论文做客观总结即可。 - 需求漂移收敛:brief 从"热点聚类趋势判断型简报"重定位为**「重要论文列表 + 内容总结」速览型** —— ①只描述不给建议(去掉启示/判断/空白争议);②开头一份重要期刊论文列表(各大相关刊、**Elsevier 数据库优先**),每篇带一段简介/摘要概述;③对这批论文做客观总结即可。

33
RUN.md
View File

@ -14,8 +14,11 @@
DEEPSEEK_API_KEY=sk-... DEEPSEEK_API_KEY=sk-...
# 用 GLM 的话再加一条;国际站 z.ai 用 ZAI_API_KEY,国内站 bigmodel.cn 用 ZHIPUAI_API_KEY(对应 config/models/glm.yaml 的 api_key_env 字段) # 用 GLM 的话再加一条;国际站 z.ai 用 ZAI_API_KEY,国内站 bigmodel.cn 用 ZHIPUAI_API_KEY(对应 config/models/glm.yaml 的 api_key_env 字段)
ZHIPUAI_API_KEY=... ZHIPUAI_API_KEY=...
# 豆包(火山方舟)图像/视频生成:可选。设了同时挂 seedream tool(0.22 元/张)与 seedance tool # 豆包(火山方舟)统一 key,三处共用:可选。
# (Seedance 2.0 Fast,文生视频,480p 4s ¥1.86 ~ 720p 15s ¥12+,异步等 30-90s);未设两个 tool 都不出现 # 1) 文本/Agent 模型 config/models/doubao.yaml(Seed 2.1 turbo/pro、自进化 evolving)—— 走 Ark OpenAI 兼容端点
# 2) 图像生成 seedream tool(0.22 元/张)
# 3) 视频生成 seedance tool(Seedance 2.0 Fast,文生视频,480p 4s ¥1.86 ~ 720p 15s ¥12+,异步等 30-90s)
# 未设:豆包文本模型选不了,seedream/seedance 两个 tool 都不出现
ARK_API_KEY=... ARK_API_KEY=...
# documents skill(内部知识库 document_search API):可选。设了后注册 # documents skill(内部知识库 document_search API):可选。设了后注册
# document_list_kb / document_search / document_download 三个 host-side tool; # document_list_kb / document_search / document_download 三个 host-side tool;
@ -51,13 +54,34 @@
# SMTP_USER=you@qq.com # SMTP_USER=you@qq.com
# SMTP_PASSWORD=<授权码/应用专用密码,非登录密码> # SMTP_PASSWORD=<授权码/应用专用密码,非登录密码>
# SMTP_FROM=you@qq.com # 可选,默认取 SMTP_USER # SMTP_FROM=you@qq.com # 可选,默认取 SMTP_USER
# SMTP_FROM_NAME=总院科研辅助智能体 # 可选,发件人显示名,默"总院科研辅助智能体"(不暴露内部代号)
# 定时任务守护循环(DESIGN §8.5,随 web 进程起,plain-asyncio 仿 _disk_scanner): # 定时任务守护循环(DESIGN §8.5,随 web 进程起,plain-asyncio 仿 _disk_scanner):
# ZCBOT_DISABLE_SCHEDULER=1 # 可选,整体关掉调度(对照 Claude Code CLAUDE_CODE_DISABLE_CRON) # ZCBOT_DISABLE_SCHEDULER=1 # 可选,整体关掉调度(对照 Claude Code CLAUDE_CODE_DISABLE_CRON)
# ZCBOT_SCHEDULER_TICK_SECONDS=10 # 可选,扫描间隔,默 10s(只决定最坏延迟≤1tick,不影响会否漏) # ZCBOT_SCHEDULER_TICK_SECONDS=10 # 可选,扫描间隔,默 10s(只决定最坏延迟≤1tick,不影响会否漏)
# ZCBOT_SCHEDULER_CONCURRENCY=4 # 可选,并发跑的定时 run 上限,默 4 # ZCBOT_SCHEDULER_CONCURRENCY=4 # 可选,并发跑的定时 run 上限,默 4
# 微信接入(ClawBot 个人微信,DESIGN §8.7):可选。开关在才挂 wechat_push tool + 起入站长轮询。
# ZCBOT_WECHAT_BOT_ENABLED=1 # 渠道总开关;开启后 lifespan 起入站管理器,用户可扫码绑定
# ZCBOT_WECHAT_SECRET_KEY=<随机串> # 凭据(bot_token/context_token)列加密密钥;缺则退明文标记(公测兜底)
# ZCBOT_WECHAT_BASE_URL=... # 可选,覆盖 iLink base(默 https://ilinkai.weixin.qq.com)
# 企业微信(渠道 B,出站推送 + 入站对话,§8.7):三件套齐才挂推送。无条件主动推,补 ClawBot 24h 窗口短板。
# WECOM_CORPID=ww... # 企业 ID(管理员:我的企业→企业信息)
# WECOM_AGENTID=1000002 # 自建应用 AgentId
# WECOM_SECRET=... # 自建应用 Secret
# ZCBOT_PUBLIC_BASE_URL=https://zcbot.example.com # 可选,OAuth 回调主机(须在应用「企业微信授权登录」可信域名内;缺则取请求 base)
# 入站对话(可选,要公网 HTTPS):企微后台「应用→接收消息→设置 API 接收」填回调 URL + 下面两项,
# 用户即可在企业微信里直接和 zcbot 对话(回调 URL = <公网 base>/v1/wecom/callback)。
# WECOM_CALLBACK_TOKEN=... # 接收消息 Token(企微后台生成)
# WECOM_CALLBACK_AESKEY=... # EncodingAESKey(43 字符,企微后台生成)
``` ```
> litellm 在 import 时副作用加载 .env;入口走 `main.py`,`.env` 自动生效。直跑 `python -c "from core.storage import ..."` 不经 litellm 链路时记得自己 `import litellm` 触发,或手动 `export ZCBOT_DB_URL=...` > litellm 在 import 时副作用加载 .env;入口走 `main.py`,`.env` 自动生效。直跑 `python -c "from core.storage import ..."` 不经 litellm 链路时记得自己 `import litellm` 触发,或手动 `export ZCBOT_DB_URL=...`
- **依赖**:`pip install -r requirements.txt`(已在 `.venv` 里;含 `bcrypt`)。 - **依赖**:`pip install -r requirements.txt`(已在 `.venv` 里;含 `bcrypt`、`segno`、`cryptography`)。
- **微信接入(ClawBot,§8.7)**:① `main.py db upgrade head` 带上 migration `0012`;② `.env``ZCBOT_WECHAT_BOT_ENABLED=1` + `ZCBOT_WECHAT_SECRET_KEY=<串>`;③ 用户登录后点**左栏 rail「微信」按钮**(`/static/wechat_bind.html` 仍保留作独立/嵌入入口)扫码绑定(需个人微信 8.0.70+ 且灰度到 ClawBot 插件)。绑定后在微信「微信 ClawBot」对话即走 zcbot;**主动推送需用户近 24h 在微信开口过一次**(冷启动/超期推不出,退邮件兜底)。
- **企业微信(渠道 B,纯推送,§8.7)**:① 管理员建自建应用 → 填 `WECOM_CORPID/AGENTID/SECRET`(+ 可见范围含目标用户);② `main.py db upgrade head`。**绑定两条路,任选**:
- **手填 userid(无域名时,最省)**:rail「微信」modal 企业微信段填成员 userid(管理后台→通讯录→点成员→「账号」)→ 保存。**推送是出站调用,不需要域名/HTTPS**,这条最省事。
- **扫码授权登录(要 HTTPS 域名)**:管理员在应用→**「企业微信授权登录」**里把 zcbot 域名配进可信域名(注意不是「网页授权可信域名」,是另一项)+ 设 `ZCBOT_PUBLIC_BASE_URL`;用户点「扫码绑定」→ 桌面浏览器出二维码 → 企业微信 App 扫码确认。回调 `/v1/wecom/oauth/callback` 公开(身份从 HMAC state 验)。链接走 `login.work.weixin.qq.com/wwlogin/sso/login`(不是网页授权 `oauth2/authorize`,后者只能在企微客户端内打开 → 桌面浏览器会报「请在企业微信客户端打开链接」)。
- 绑定后简报/结果**无条件主动推**(不挑活跃度、无 24h 窗口),适合必达。
- **入站对话(可选,要公网 HTTPS)**:企微后台「应用 → 接收消息 → 设置 API 接收」填回调 URL `<公网 base>/v1/wecom/callback` + 自动生成的 Token / EncodingAESKey → 写进 env `WECOM_CALLBACK_TOKEN` / `WECOM_CALLBACK_AESKEY` → 保存时企微 GET 验 URL(`/v1/wecom/callback` GET 自动回 echostr)。配好后用户在企业微信里直接给应用发消息即走 zcbot 对话(与个人微信各一张会话上下文)。agent 跑完走 message/send 主动推回(非被动同步,故无 5s 限制)。**支持文本 + 图片 + 文件**(图片/文件走 media/get 下载,落盘进会话目录 inbound/);语音/视频/位置等暂不处理;未绑定/空消息静默。
- **channel 长会话上下文(微信/企业微信通用,0019)**:常驻会话不再无限膨胀。① **自动分段**——入站时距上次消息超过 `config.json``channel.session_gap_hours`(默 **6** 小时,设 `<=0` 关闭)→ 软重置:只把「最后一条 user 消息起」喂模型(保留上一轮做续聊锚点),之前的历史仍全留 DB,网页端照旧翻完整记录;② **手动新话题**——用户在微信/企业微信里直接发「新话题 / 新会话 / `/new` / 清空上下文」→ 硬重置,彻底从零(回执提示已归档)。两者都**不删任何消息**,只移动「喂给模型的窗口起点」`tasks.context_base_idx`。网页端「清空对话」(`POST /v1/tasks/{id}/clear`)仍整清并把 base 归 0。需 `main.py db upgrade head` 带上 `0019`
- **PG**:`ZCBOT_DB_URL` 必填。本地 docker compose / 远端 dev / 生产任选;未设置时启动清晰报错,不引导 docker(§7.4)。 - **PG**:`ZCBOT_DB_URL` 必填。本地 docker compose / 远端 dev / 生产任选;未设置时启动清晰报错,不引导 docker(§7.4)。
- **Auth env**:`PLATFORM_KEY` + `JWT_SECRET` 任一缺失 web 启动 fail-fast。生成随机串:`python -c "import secrets; print(secrets.token_urlsafe(48))"`。 - **Auth env**:`PLATFORM_KEY` + `JWT_SECRET` 任一缺失 web 启动 fail-fast。生成随机串:`python -c "import secrets; print(secrets.token_urlsafe(48))"`。
- **用户管理**(`users.email/password_hash/role`,0005 UNIQUE(email)、0009 role):dev SPA 登录后端。发用户两条路径任选:CLI `main.py user add`(下方),或在登录页右下角"+ 管理员添加用户"链接(需先设 `ZCBOT_ADMIN_TOKEN` env,弹窗输入 email/密码/管理员口令/角色)。撤用户 `DELETE FROM users WHERE email=...`(先 DELETE 该 user 的 tasks)。**用户自助改密**:登录后顶栏「改密码」按钮(走 `POST /v1/auth/change_password`,需知道旧密码);改邮箱 / 用户忘了旧密码无法自助 → 手动 SQL(见故障兜底)。 - **用户管理**(`users.email/password_hash/role`,0005 UNIQUE(email)、0009 role):dev SPA 登录后端。发用户两条路径任选:CLI `main.py user add`(下方),或在登录页右下角"+ 管理员添加用户"链接(需先设 `ZCBOT_ADMIN_TOKEN` env,弹窗输入 email/密码/管理员口令/角色)。撤用户 `DELETE FROM users WHERE email=...`(先 DELETE 该 user 的 tasks)。**用户自助改密**:登录后顶栏「改密码」按钮(走 `POST /v1/auth/change_password`,需知道旧密码);改邮箱 / 用户忘了旧密码无法自助 → 手动 SQL(见故障兜底)。
@ -301,6 +325,7 @@ sudo bash /opt/zcbot/deploy/update.sh
脚本顺序写死:`git pull --ff-only` → `pip install -r``db upgrade head`**`docker build` sandbox 镜像** → **`systemctl restart zcbot`** → `curl /healthz` 验活。要点: 脚本顺序写死:`git pull --ff-only` → `pip install -r``db upgrade head`**`docker build` sandbox 镜像** → **`systemctl restart zcbot`** → `curl /healthz` 验活。要点:
- **build 必须在 restart 之前**:sandbox 容器 per-user 长驻 + 复用,`tools/` 是 build 进镜像的(非 mount)。restart 时 `shutdown_all` 清旧容器,下次 `ensure()` 才用新 `zcbot-sandbox:latest` 重建 —— 顺序反了新 tools/ 要等下次重启才生效。 - **build 必须在 restart 之前**:sandbox 容器 per-user 长驻 + 复用,`tools/` 是 build 进镜像的(非 mount)。restart 时 `shutdown_all` 清旧容器,下次 `ensure()` 才用新 `zcbot-sandbox:latest` 重建 —— 顺序反了新 tools/ 要等下次重启才生效。
- **平台渲染层 `rendering/`(2026-06-23 起)**:各 skill 出 docx/pdf 调 `python /sandbox/rendering/render.py --profile {brief,paper,proposal} --format {docx,pdf}`(不再各自带 render_docx.py)。`rendering/` 随 `pool.py` **bind-mount 进 `/sandbox/rendering`**(restart 重建容器才挂上),pdf 依赖 `markdown`(已进 requirements,镜像重建才内置)+ 镜像自带 chromium。**这次升级要整体重建镜像 + restart 一并 deploy**——旧 render_docx 路径已删,只推代码不重建会让 brief/paper/proposal/patent/standard 渲染失败。沙盒 chromium 渲 pdf 的冒烟探针:`deploy/sandbox/probe_chromium_pdf.sh`(服务器上跑,用法见脚本头)。
- **sandbox build 每次都跑没关系**:layer cache 让重活(pip ~1G / chromium / 字体 / mermaid,都在 `COPY tools/` 之上)在改代码部署时秒过;只有 `requirements.txt` 变了才整体重建(~5-10min,正好也是该重建的时候)。host backend 机器 / 临时不想动 docker:`sudo bash deploy/update.sh --skip-build`。 - **sandbox build 每次都跑没关系**:layer cache 让重活(pip ~1G / chromium / 字体 / mermaid,都在 `COPY tools/` 之上)在改代码部署时秒过;只有 `requirements.txt` 变了才整体重建(~5-10min,正好也是该重建的时候)。host backend 机器 / 临时不想动 docker:`sudo bash deploy/update.sh --skip-build`。
- **镜像源默认:pip+apt 清华、npm 腾讯**(`PIP_INDEX_URL=pypi.tuna.tsinghua.edu.cn/simple/` / `APT_MIRROR=mirrors.tuna.tsinghua.edu.cn` / `NPM_REGISTRY=mirrors.cloud.tencent.com/npm/`)。pip 选清华是因为**腾讯 PyPI 曾返回损坏的 litellm wheel**(index hash 对、文件字节不对 → pip `DO NOT MATCH THE HASHES`),且**阿里 PyPI 又一度滞后**(litellm 只到 1.82.6,卡死 `>=1.83.0`);清华境内稳 + 同步及时。npm 用腾讯是因为**清华不提供 npm registry**、npmmirror 访问不稳,腾讯 npm 历来 OK(坏 wheel 只是腾讯 PyPI 的事,npm 不受影响;备选华为 / USTC npm 源)。要命中 docker cache 就别多组源来回换(换源从 pip 层炸开全重跑)。想用官方源:`PIP_INDEX_URL= sudo -E bash deploy/update.sh`(置空即回落 Dockerfile 官方默认)。host venv 的 step 2 pip 也吃这个源(脚本显式 `--index-url`,不靠 host pip.conf)。 - **镜像源默认:pip+apt 清华、npm 腾讯**(`PIP_INDEX_URL=pypi.tuna.tsinghua.edu.cn/simple/` / `APT_MIRROR=mirrors.tuna.tsinghua.edu.cn` / `NPM_REGISTRY=mirrors.cloud.tencent.com/npm/`)。pip 选清华是因为**腾讯 PyPI 曾返回损坏的 litellm wheel**(index hash 对、文件字节不对 → pip `DO NOT MATCH THE HASHES`),且**阿里 PyPI 又一度滞后**(litellm 只到 1.82.6,卡死 `>=1.83.0`);清华境内稳 + 同步及时。npm 用腾讯是因为**清华不提供 npm registry**、npmmirror 访问不稳,腾讯 npm 历来 OK(坏 wheel 只是腾讯 PyPI 的事,npm 不受影响;备选华为 / USTC npm 源)。要命中 docker cache 就别多组源来回换(换源从 pip 层炸开全重跑)。想用官方源:`PIP_INDEX_URL= sudo -E bash deploy/update.sh`(置空即回落 Dockerfile 官方默认)。host venv 的 step 2 pip 也吃这个源(脚本显式 `--index-url`,不靠 host pip.conf)。
- **进度可见**:step 2 pip 不带 `-q`,部署时能看到装包进度;step 4 docker build 走默认 TTY 进度 UI(分层折叠刷新,直观)。 - **进度可见**:step 2 pip 不带 `-q`,部署时能看到装包进度;step 4 docker build 走默认 TTY 进度 UI(分层折叠刷新,直观)。
@ -731,6 +756,7 @@ sudo xfs_quota -x -c "limit -p bhard=10g zcbot_<user_uuid>" /opt
| `kill -HUP <pid>``/openapi.json` 没新接口 | uvicorn **不响应 SIGHUP**(没装 handler,落 Python 默认终止;Windows 上信号本身无效)。Ubuntu 上用 `systemctl restart zcbot`,或 unit 加 `--reload` 让 uvicorn 监听文件自动重起(见"部署"段)。验证:`curl -s http://127.0.0.1:8765/openapi.json \| python3 -c 'import sys,json;print([p for p in json.load(sys.stdin)["paths"] if "auth" in p])'` | | `kill -HUP <pid>``/openapi.json` 没新接口 | uvicorn **不响应 SIGHUP**(没装 handler,落 Python 默认终止;Windows 上信号本身无效)。Ubuntu 上用 `systemctl restart zcbot`,或 unit 加 `--reload` 让 uvicorn 监听文件自动重起(见"部署"段)。验证:`curl -s http://127.0.0.1:8765/openapi.json \| python3 -c 'import sys,json;print([p for p in json.load(sys.stdin)["paths"] if "auth" in p])'` |
| `systemctl restart zcbot` 要等几十秒才退 | 正常 —— 优雅 drain 在等在跑的 run 收尾(`shutdown.drain_timeout` 默 30s),没在跑 run 时秒退。journal 出现 `[shutdown] draining N in-flight run(s)` 即正常。真急(不在乎杀掉在跑 run):`systemctl kill -s KILL zcbot` | | `systemctl restart zcbot` 要等几十秒才退 | 正常 —— 优雅 drain 在等在跑的 run 收尾(`shutdown.drain_timeout` 默 30s),没在跑 run 时秒退。journal 出现 `[shutdown] draining N in-flight run(s)` 即正常。真急(不在乎杀掉在跑 run):`systemctl kill -s KILL zcbot` |
| 部署后在跑的对话被标 `error: server restarted before run finished` | 该 run 在 drain 期内没收尾、cancel 也没在 `cancel_grace` 内退,被 SIGKILL 后下次启动 reaper 标的。多半是 run 卡在不 poll cancel 的长动作(如单次超长 docker exec)或 `TimeoutStopSec` 配得比 drain 预算还小被提前 SIGKILL。先核对 unit `TimeoutStopSec > drain_timeout + cancel_grace`;真有超长 run 把 `drain_timeout` 调大 | | 部署后在跑的对话被标 `error: server restarted before run finished` | 该 run 在 drain 期内没收尾、cancel 也没在 `cancel_grace` 内退,被 SIGKILL 后下次启动 reaper 标的。多半是 run 卡在不 poll cancel 的长动作(如单次超长 docker exec)或 `TimeoutStopSec` 配得比 drain 预算还小被提前 SIGKILL。先核对 unit `TimeoutStopSec > drain_timeout + cancel_grace`;真有超长 run 把 `drain_timeout` 调大 |
| 定时任务「跑到一半没推送」/ crons 页显示「上次失败: 运行超过超时上限 Ns 未完成」 | job 跑满 `timeout_seconds` 被协作式中断(还没写完 / 没推送)。**0.32.2 起超时记 error**(此前误记 ok 看不出来),计入连续失败、到阈值自动停用。**0.32.4 起新建 job 默认超时 1800s**(此前默认 0=不限;`DEFAULT_TIMEOUT_SECONDS`),`0` 仍可显式设"不限"。处置:报告类重活(多刊检索+渲 docx)若仍不够,把该 job `timeout_seconds` 再调大或设 0;被自动停用的重新 enable。诊断单个 job 用 `scripts/diag_sched_e621.py <job_id 前缀>` |
| `POST /v1/files/rename` 返 409 `folder has active run(s)` | 顶层目录被某 running/cancelling 的 task 占用;先 cancel 等流式 done 再 rename | | `POST /v1/files/rename` 返 409 `folder has active run(s)` | 顶层目录被某 running/cancelling 的 task 占用;先 cancel 等流式 done 再 rename |
| `POST /v1/files/rename` 返 409 `... 前缀嵌套` | 改名后会与其他 task 的 working_dir 形成嵌套;换不冲突的 new_name | | `POST /v1/files/rename` 返 409 `... 前缀嵌套` | 改名后会与其他 task 的 working_dir 形成嵌套;换不冲突的 new_name |
| `POST /v1/files/upload` 返 413 `已达磁盘配额上限` | per-user 5GB(yaml `quotas.disk_bytes_per_user`)。让用户在 dev SPA 右侧文件栏删旧产物 / 大文件,或改 yaml 升配重启 web | | `POST /v1/files/upload` 返 413 `已达磁盘配额上限` | per-user 5GB(yaml `quotas.disk_bytes_per_user`)。让用户在 dev SPA 右侧文件栏删旧产物 / 大文件,或改 yaml 升配重启 web |
@ -765,6 +791,7 @@ sudo xfs_quota -x -c "limit -p bhard=10g zcbot_<user_uuid>" /opt
- **工具**:`tools/{fs, shell, run_python, skill_tool}.py` - **工具**:`tools/{fs, shell, run_python, skill_tool}.py`
- **Web**:`web/{app.py, auth.py, broker.py, sinks.py}` + `web/static/dev.html`(dev SPA)+ `web/static/vendor/`(office 预览 jszip/docx-preview/xlsx) - **Web**:`web/{app.py, auth.py, broker.py, sinks.py}` + `web/static/dev.html`(dev SPA)+ `web/static/vendor/`(office 预览 jszip/docx-preview/xlsx)
- **配置**:`config/agent.yaml` + `config/models/*.yaml`(§3.2 Model Profile) - **配置**:`config/agent.yaml` + `config/models/*.yaml`(§3.2 Model Profile)
- **模型档位(per-account 模型访问)**:`config/agent.yaml` `model_tiers` 段定义「档位→可用模型 id 集合」;`users.plan` 存档位名,空/未知 → `default` 档,`role=admin` 全开。管理后台「各用户用量」表的「档位」下拉改 plan(`PATCH /v1/admin/users/{uid}/plan`);档位定义见 `GET /v1/admin/tiers`。改 `model_tiers` 后**重启 web** 生效;无需 migration(`plan` 列 0001 起就有)。模型 id:文本=`family.variant`,图/视频=variant key。行为:用户只看到本档模型;显式选档外模型 403;老 task 下次发消息若模型已不在档位内 → 自动落回 `deepseek_v4.flash`
- **Skill**:`skills/{coding,ppt,proposal}/SKILL.md`(渐进披露,§3.5) - **Skill**:`skills/{coding,ppt,proposal}/SKILL.md`(渐进披露,§3.5)
- **Workspace**(per-user 子树,user_id 来自 JWT `sub`): - **Workspace**(per-user 子树,user_id 来自 JWT `sub`):
- `workspace/users/<user_id>/.memory/{core.md, extended/}` — 跨 task 记忆,FS 永久,dotfile 隔离 - `workspace/users/<user_id>/.memory/{core.md, extended/}` — 跨 task 记忆,FS 永久,dotfile 隔离

View File

@ -1,7 +1,7 @@
# zcbot Skill 清单 # zcbot Skill 清单
服务对象:中国建筑材料科学研究总院 —— 无机非金属材料 R&D(水泥 / 混凝土 / 玻璃 / 陶瓷 / 耐火 / 新型建材) 服务对象:中国建筑材料科学研究总院 —— 无机非金属材料 R&D(水泥 / 混凝土 / 玻璃 / 陶瓷 / 耐火 / 新型建材)
最后更新:2026-06-18 最后更新:2026-07-02(ppt skill 加渲图验收闭环 + 导出验收硬门 + 几何质检)
Skill 总数:17 Skill 总数:17
zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` + 配套 templates / scripts / Python helper),模型在识别用户意图后挂载对应 skill,按其内置的阶段化流程产出可交付物。本文档面向**使用方 / 协作方**,按"做什么、什么时候用、什么时候别用、典型产物"组织。 zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` + 配套 templates / scripts / Python helper),模型在识别用户意图后挂载对应 skill,按其内置的阶段化流程产出可交付物。本文档面向**使用方 / 协作方**,按"做什么、什么时候用、什么时候别用、典型产物"组织。
@ -19,7 +19,7 @@ zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` +
| 科研写作 | [standard](#standard) | 起草标准:国标 / 行标 / 团标(含 T/CSTM)+ 编制说明 | | 科研写作 | [standard](#standard) | 起草标准:国标 / 行标 / 团标(含 T/CSTM)+ 编制说明 |
| 科研写作 | [patent](#patent) | 写发明专利技术交底书(供代理师转写) | | 科研写作 | [patent](#patent) | 写发明专利技术交底书(供代理师转写) |
| 科研写作 | [review](#review) | 审稿 / 润色 / 校对(中英文,长文档分段深审) | | 科研写作 | [review](#review) | 审稿 / 润色 / 校对(中英文,长文档分段深审) |
| 演示出图 | [ppt](#ppt) | 生成 PowerPoint 演示稿(商务红主题,大纲对齐后一脚本整建) | | 演示出图 | [ppt](#ppt) | 生成可编辑 PowerPoint(SVG-first:逐页手写 SVG → 转原生 DrawingML;19 种视觉风格 + 模板库) |
| 演示出图 | [plot_pub](#plot_pub) | 出版级 matplotlib 学术图(中文 + viridis + 矢量 + 投稿级复合图设计纪律) | | 演示出图 | [plot_pub](#plot_pub) | 出版级 matplotlib 学术图(中文 + viridis + 矢量 + 投稿级复合图设计纪律) |
| 文献检索 | [research](#research) | 查 paper_server(OpenAlex 元数据 + Sci-Hub 下载) | | 文献检索 | [research](#research) | 查 paper_server(OpenAlex 元数据 + Sci-Hub 下载) |
| 文献检索 | [documents](#documents) | 查内部 7 学科材料知识库(100W+ 论文,跨语言检索;host-side tool 持 key) | | 文献检索 | [documents](#documents) | 查内部 7 学科材料知识库(100W+ 论文,跨语言检索;host-side tool 持 key) |
@ -57,7 +57,7 @@ zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` +
- **引文三角核验**(`citation_verify.md`,移植 ARS 思路、后端换成自有 documents/research 库):存在性 → 三角印证 → 支撑度(抓原文比对 ≤25 词锚点,partial 就改论断迁就证据),编造引文零容忍 - **引文三角核验**(`citation_verify.md`,移植 ARS 思路、后端换成自有 documents/research 库):存在性 → 三角印证 → 支撑度(抓原文比对 ≤25 词锚点,partial 就改论断迁就证据),编造引文零容忍
- "先定图表再写正文"纪律(接 plot_pub 出 figure)+ 文献矩阵立证据底座 - "先定图表再写正文"纪律(接 plot_pub 出 figure)+ 文献矩阵立证据底座
- 写作顺序 Methods→Results→Intro→Discussion→Abstract→Title;关键章一段一卡 + 预告下一段 - 写作顺序 Methods→Results→Intro→Discussion→Abstract→Title;关键章一段一卡 + 预告下一段
- `quality_check.py`:结构 / 占位符 / 过度宣称 + **引文交叉核对**(orphan / uncited / 编号连续);`render_docx.py` 中英字体切换 + 图题自增;`word_count.py` 按类型 × 语言核篇幅 - `quality_check.py`:结构 / 占位符 / 过度宣称 + **引文交叉核对**(orphan / uncited / 编号连续);docx/pdf 调平台渲染层 `rendering/render.py --profile paper`(中英字体切换 + 图题自增);`word_count.py` 按类型 × 语言核篇幅
- 终审复用 review skill 的反谄媚审稿协议;可选出 cover letter / AI 声明 / CRediT - 终审复用 review skill 的反谄媚审稿协议;可选出 cover letter / AI 声明 / CRediT
**典型产物**:`<topic>.docx`(投稿稿)+ sections/ 分章草稿 + `lit_matrix.md`(文献矩阵)+ `CITATIONS.md`(引文核验台账)。 **典型产物**:`<topic>.docx`(投稿稿)+ sections/ 分章草稿 + `lit_matrix.md`(文献矩阵)+ `CITATIONS.md`(引文核验台账)。
@ -168,41 +168,31 @@ zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` +
## 演示出图 ## 演示出图
### ppt ### ppt
**生成 PowerPoint 演示文稿 (.pptx)。** **生成可编辑 PowerPoint 演示文稿 (.pptx)。SVG-first 路线。**
把材料(汇报草稿 / 项目方案 / 调研报告)变成可演示的 .pptx。流程:**先定调(8 项 + 逐页大纲)→ 一个脚本建整 deck → quality_check 验收**。方向在大纲阶段对齐,执行阶段一把出稿(不逐页来回)。视觉走**卡片式系统**(圆角卡片 + 柔和投影 + 渐变 + 从主色派生的明暗色阶),原生可编辑,告别扁平办公模板观感 把材料(汇报草稿 / 项目方案 / 调研报告)变成可演示、**可编辑**的 .pptx。流程:**素材摄取 → 八条对齐 + 逐页大纲(spec)→ [配图] → 逐页手写 SVG → SVG 质检 → 后处理 → 全量渲图验收 → 导出 PPTX**(导出边界硬门:每页都要渲图过目、标记 pass 且此后源未改动,否则拒绝产出 pptx)。核心是 AI 把每页当**矢量设计稿手写成 SVG**(设计自由度=浏览器级),再由纯 Python 转换器逐元素译成**原生 DrawingML**(形状/文本/渐变都能在 PowerPoint 里选中改)——告别 python-pptx 固定版式件的单调与 AI 味
**触发**: **触发**:
- ✅ 用户明确点名 PPT / 幻灯片 / 演示文稿 / .pptx / slide / deck - ✅ 用户明确点名 PPT / 幻灯片 / 演示文稿 / .pptx / slide / deck
- ⛔ 用户明确说"报告 / 文档 / 纪要"等纯文档产物 → 不走本 skill - ⛔ 用户明确说"报告 / 文档 / 纪要"等纯文档产物 → 不走本 skill
- ⚠️ 用户说"汇报 / 方案 / 材料"等产物形态不明 → **先反问** PPT 还是 Word/Markdown,确认后再 load - ⚠️ 用户说"汇报 / 方案 / 材料"等产物形态不明 → **先反问** PPT 还是 Word/Markdown,确认后再 load
**默认主题 —— 商务红**(硬约束): **默认主题 —— 自由设计**(content-driven):按内容+受众+选定 visual_style 派生配色版式,spec 阶段给 ≥3 套候选挑;商务红/品牌色作为候选之一,用户点名或素材有 brand guideline 才锁定。
- 主色 `#C00000` / 辅色 `#E15554` / 强调色 `#FFC107`
- ⛔ 不允许擅自换色,除非用户明确点其它配色或提供 brand guideline
**八条对齐**(spec 阶段定稿): **八条对齐**(spec 阶段定稿,ah):画布 / 页数 / 受众+核心信息+投递目的 / mode+visual_style / 配色 / 图标库 / 字体+字号 / 配图。确认后产出两份引擎契约:`design_spec.md`(人读叙事)+ `spec_lock.md`(机读执行锁,executor 每页重读、抗长 deck 漂移)。
| # | 项 | 默认值 |
|---|---|---|
| 1 | 画布 | 16:9 (13.33×7.5 in) |
| 2 | 页数 | 封面 + 5-8 页正文 + 尾页(Q&A) = 7-10 页 |
| 3 | 受众 | 看材料推断:领导汇报 / 同行评审 / 客户 pitch |
| 4 | 风格 | 现代简约(白底 + 细线 + 留白) |
| 5 | 配色 | 商务红 |
| 6 | 字体 | 微软雅黑 + Arial |
| 7 | 图标 | Iconify `tabler` 集(主色染色,本地缓存;概念页配图标底块) |
| 8 | 图表 / 配图 | 数据图 matplotlib / 少量数字上 KPI 卡;真实配图 opt-in 走 imagegen(每张 ¥0.22) |
**核心能力**: **核心能力**:
- **信息设计纪律(咨询级的真功)**:论断式标题(写结论不写主题)、Takeaway 结论框、数据语境化(数字带对比基准+趋势)、page_rhythm 节奏(anchor/dense/breathing,breathing 页强制打破卡片网格) - **SVG→原生 PPTX 转换器**:逐元素译 DrawingML(圆角矩形/渐变/阴影/箭头/裁切图都映射原生),非截图嵌图,完全可编辑;默认嵌演讲者备注 + Office 兼容兜底
- **组合版式件**(一函数一整块):`add_card_grid`(均衡网格)/ `add_timeline`(时间轴)/ `add_cycle`(闭环)/ `add_toc`(目录)/ `add_kpi`(数字卡带对比+升降)/ `add_takeaway` / `add_source` - **19 种视觉风格 + 5 种叙事骨架**:editorial / swiss-minimal / glassmorphism / dark-tech / data-journalism… × pyramid / narrative / instructional / showcase / briefing —— 去 AI 味的关键
- **质感工具箱**:`add_card`(圆角卡,投影克制——平铺卡默认平)/ `add_gradient_rect` / `add_icon_tile` / `add_pill` / 派生明暗色阶 + 语义色 `GOOD/BAD` - **模板库**:layouts(版式)/ decks(整套:中汽研/招商银行/重庆大学等)/ brands(品牌)/ charts(71 个图表信息图)/ icons(5 套共 1.1w+ 图标,finalize 自动内嵌)
- **混合背景** `render_bg.py`:无头 Chrome 渲杂志级背景图 + 其上原生可编辑文字(封面/章节) - **逐页节奏纪律**:论断式标题、page_rhythm(anchor/dense/breathing,breathing 页禁卡片墙)、内容→版式映射、图文版式 72 式
- **观感验收** `pptx_preview.py`:把 .pptx 渲成 PNG 肉眼验版面(quality_check 查结构,预览查好看) - **SVG 质检** `svg_quality_checker.py`:禁用特性 / viewBox / spec_lock 漂移 / 配色越界 / **几何检测**(文本·图标包围盒估算,拦大字压说明、图标压字、行溢出画布、文字骑卡片边缘)(error 必改,回写 SVG;**导出边界自动复跑同套硬错误,error 拒绝导出、无豁免参数**)
- 演讲者备注 `add_notes` + 业务图标双层兜底(Iconify → 本地缓存 → unicode) - **渲图验收闭环** `svg_preview.py` + `accept_pages.py`:无头 Chrome 全量渲 PNG 肉眼/vision 验版面,逐页标 pass/fail 落 `.build/acceptance.json`;**导出 gate 只认"渲过 + 看过标 pass + 渲后源未改(sha1)"**,跳验收/盲改混不过去;`update_spec.py` 一键改色/字体传播到所有 SVG
- `quality_check.py` 结构验收(越界 / 溢出 / 按列 bullet / 按色系三色制 / 重叠)+ markitdown 素材摄取 - AI 配图走 imagegen skill;markitdown 素材摄取
**典型产物**:`<task>.pptx` + `build_deck.py`(整 deck 构建脚本,改稿/修验收项都改它重跑)。 **典型产物**:`exports/<topic>_<ts>.pptx`(原生可编辑)+ `svg_output/*.svg`(逐页设计源,改稿对象)+ `design_spec.md`/`spec_lock.md`。
> 引擎/知识/模板移植自开源 **ppt-master**(github.com/hugohe3/ppt-master,MIT),适配 zcbot 的 task_dir / 聊天确认 / imagegen 工作流。
--- ---
@ -314,7 +304,7 @@ paper_server 是内部 Django 文献库:元数据来自 OpenAlex,PDF / XML 由 S
- **三路分工 + 去重**:research+documents 取文献(同 DOI 一条、documents 全文优先)、web 单列产业政策动向不混论文总结;中文方向→英文术语转译(SCM/LC3 等缩写展开) - **三路分工 + 去重**:research+documents 取文献(同 DOI 一条、documents 全文优先)、web 单列产业政策动向不混论文总结;中文方向→英文术语转译(SCM/LC3 等缩写展开)
- **每篇带摘要概述**:列表不只标题,每篇 24 句讲研究对象/方法/主要发现,基于 abstract 或全文、不夸张不评判 - **每篇带摘要概述**:列表不只标题,每篇 24 句讲研究对象/方法/主要发现,基于 abstract 或全文、不夸张不评判
- **引文核验**:存在性 / DOI 真伪(以库返回字段为准)/ 支撑度(摘要概述与原文一致,partial 改概述迁就证据),编造零容忍 - **引文核验**:存在性 / DOI 真伪(以库返回字段为准)/ 支撑度(摘要概述与原文一致,partial 改概述迁就证据),编造零容忍
- **自带 `render_docx.py`**:商务红主题 + 论文列表 `[n]` 作锚点、正文 `[n]`/`[Wn]` 引文上标回链 + DOI/URL 可点击超链接(条目内 DOI 子串也链)+ 化学式下标(CO₂/C₃S...,白名单不误伤 LC3/Ca2+);做 deck 转 ppt - **平台渲染层 `rendering/render.py --profile brief`**(docx/pdf):商务红主题 + 论文列表 `[n]` 作锚点、正文 `[n]`/`[Wn]` 引文上标回链 + DOI/URL 可点击超链接(条目内 DOI 子串也链)+ 化学式下标(CO₂/C₃S...,白名单不误伤 LC3/Ca2+);pdf 走沙盒 chromium;做 deck 转 ppt
**典型产物**:`<方向>-简报.md`(默认,含 `01_papers` 重要论文列表 + `02_summary` 内容总结)+ `evidence.md`(证据表);可选转 docx / deck。 **典型产物**:`<方向>-简报.md`(默认,含 `01_papers` 重要论文列表 + `02_summary` 内容总结)+ `evidence.md`(证据表);可选转 docx / deck。

View File

@ -0,0 +1 @@
THssshZfneJwIG5Y

View File

@ -2,6 +2,35 @@
default_model: deepseek_v4.flash default_model: deepseek_v4.flash
models_dir: config/models models_dir: config/models
# 模型档位(per-account 模型访问控制,见 core/model_access.py)。users.plan 存档位名;
# plan 为空 / 未知 → 落 `default` 档;role=admin 始终全开,不受此限制。
# 每档列出可用的模型 id:文本 = `family.variant`(config/models/);图/视频 = variant key
# (config/media/doubao.yaml)。成员含 `"*"` = 全开(含未来新增模型)。
# 三个 list 端点(/v1/models、/v1/image_models、/v1/video_models)按档过滤,用户只看到本档模型;
# 新建/切换/发媒体时再硬校验(老 task 续跑读 task.model_profile 不打断)。改后重启 web 生效。
model_tiers:
default: # 基线:所有未分配档位的用户(= 公测期默认可用)
- deepseek_v4.flash
- deepseek_v4.pro
- local.r1 # 内网模型(涉密任务)
- local.qwen3
- seedream_5 # 图(config/media/doubao.yaml image 段)
- seedance_2_fast # 视频
- seedance_2_pro
pro: # 基线 + 豆包 Seed 2.1 + GLM
- deepseek_v4.flash
- deepseek_v4.pro
- local.r1 # 内网模型(涉密任务)
- local.qwen3
- doubao.turbo
- doubao.pro
- doubao.evolving
- glm.pro
- glm.pro52
- seedream_5
- seedance_2_fast
- seedance_2_pro
skills_dir: skills skills_dir: skills
workspace_dir: workspace workspace_dir: workspace
system_prompt: prompts/system/general_v1.md system_prompt: prompts/system/general_v1.md

View File

@ -21,6 +21,11 @@ image:
endpoint: /images/generations endpoint: /images/generations
price_cny_per_image: 0.22 # 计费单位:成功输出张数;调价改这里 + 重启 price_cny_per_image: 0.22 # 计费单位:成功输出张数;调价改这里 + 重启
default_size: 2048x2048 # 原生最高 3072x3072;2K 兼顾质量/体积 default_size: 2048x2048 # 原生最高 3072x3072;2K 兼顾质量/体积
# 输出尺寸面积约束(ARK 硬门):面积 < min_pixels → 400 InvalidParameter。
# 模型自选 16:9 之类小尺寸(如 1920x1080=2.07M)会栽,故 tool 侧等比钳到合法区间:
# min = 1920² = 3,686,400(16:9 最小合规即 2560x1440);max = 3072² = 9,437,184。
min_pixels: 3686400
max_pixels: 9437184
default_watermark: false # 默认无水印(申报/PPT 场景反需求) default_watermark: false # 默认无水印(申报/PPT 场景反需求)
default_search: false # web search 额外加价 ~¥0.05/张;默认关 default_search: false # web search 额外加价 ~¥0.05/张;默认关
request_timeout_s: 60 # 出图慢于此判超时 request_timeout_s: 60 # 出图慢于此判超时
@ -40,7 +45,8 @@ vision:
price_cny_per_mtoken_output: 3.6 price_cny_per_mtoken_output: 3.6
price_cny_per_mtoken_cache_hit: 0.12 price_cny_per_mtoken_cache_hit: 0.12
max_image_mb: 10 # 单图上限(超出 tool 侧直接报错,不发请求) max_image_mb: 10 # 单图上限(超出 tool 侧直接报错,不发请求)
request_timeout_s: 60 # 读图慢于此判超时 request_timeout_s: 120 # 读图慢于此判超时(非流式,长 OCR 首字节可能逼近上限)
timeout_retries: 1 # 超时/网络抖动 tool 内透明重试次数(退避 2^n s);不含业务错误
video: video:
# fast 放第一个 → 默认 variant(成本敏感场景优先);开通了 Pro 的用户从顶栏下拉切。 # fast 放第一个 → 默认 variant(成本敏感场景优先);开通了 Pro 的用户从顶栏下拉切。

84
config/models/doubao.yaml Normal file
View File

@ -0,0 +1,84 @@
# 豆包 Seed 2.1 文本/Agent 模型档案(火山方舟 Ark)
# 走 Ark 的 OpenAI 兼容 /chat/completions:litellm 用 `openai/` 前缀 + api_base 覆盖,
# 与 config/models/local.yaml 同范式(避免 litellm volcengine provider 的版本/字段差异)。
# api_key 复用媒体侧的 ARK_API_KEY(同一火山账号),env 见 RUN.md。
#
# thinking_mode 暂设 false:Seed 2.1 是深度思考模型,但开关走 Ark body `thinking:{type:enabled}`,
# 与 OpenAI/DeepSeek 的 `reasoning_effort` 等级协议不同 —— 同 glm.yaml 的处理,要 core/llm.py
# 加 family 分支才能透传等级,留 TODO。设 false 只是不发 reasoning_effort 字段;模型默认仍会
# 深度思考并返回 reasoning_content,不影响调用。
# 单价见各 variant(元/百万 tokens,来源:火山方舟 2026-06 发布价)。
family: doubao
variants:
turbo:
display_name: 豆包 Seed 2.1 Turbo
model_id: openai/doubao-seed-2-1-turbo-260628
api_base: https://ark.cn-beijing.volces.com/api/v3
api_key_env: ARK_API_KEY
max_context: 262144 # 256K
reliable_context: 131072
max_output: 16384 # 模型上限 128K(含思考),这里保守取值,需要长输出可调高
parallel_tools: true # Ark 兼容 parallel_tool_calls,默认 true
tool_calling_quality: good
thinking_mode: false
reasoning_effort_levels: []
default_reasoning_effort: ""
code_quality: good
enable_run_python: true
max_iterations: 120 # backstop 兜底,非"轮"预算;真正的空转防护是 loop 的无进展熔断 + _RepeatGuard
optimal_temperature: 0.3
prompt_caching: false
extended_thinking: false
input_cny_per_mtoken: 3.0
output_cny_per_mtoken: 15.0
cache_hit_cny_per_mtoken: 0.6
pro:
display_name: 豆包 Seed 2.1 Pro
model_id: openai/doubao-seed-2-1-pro-260628
api_base: https://ark.cn-beijing.volces.com/api/v3
api_key_env: ARK_API_KEY
max_context: 262144 # 256K
reliable_context: 131072
max_output: 16384 # 模型上限 128K(含思考),这里保守取值,需要长输出可调高
parallel_tools: true
tool_calling_quality: excellent
thinking_mode: false
reasoning_effort_levels: []
default_reasoning_effort: ""
code_quality: excellent
enable_run_python: true
max_iterations: 150 # backstop 兜底,非"轮"预算;真正的空转防护是 loop 的无进展熔断 + _RepeatGuard
optimal_temperature: 0.3
prompt_caching: false
extended_thinking: false
input_cny_per_mtoken: 6.0
output_cny_per_mtoken: 30.0
cache_hit_cny_per_mtoken: 1.2
evolving:
# 自进化版:统一 model_id `doubao-seed-evolving`,每周至少迭代一次,始终指向最新版。
# 面向 Coding/Agent 持续优化,覆盖全场景(与 pro 旗舰、turbo 低成本并列)。
display_name: 豆包 Seed Evolving(自进化)
model_id: openai/doubao-seed-evolving
api_base: https://ark.cn-beijing.volces.com/api/v3
api_key_env: ARK_API_KEY
max_context: 262144 # 256K(随版本可能变,按 Seed 2.1 家族取值)
reliable_context: 131072
max_output: 16384
parallel_tools: true
tool_calling_quality: excellent
thinking_mode: false
reasoning_effort_levels: []
default_reasoning_effort: ""
code_quality: excellent
enable_run_python: true
max_iterations: 150 # backstop 兜底,非"轮"预算;真正的空转防护是 loop 的无进展熔断 + _RepeatGuard
optimal_temperature: 0.3
prompt_caching: false
extended_thinking: false
# evolving 官方未单独公布单价,暂按 pro 估值兜底(宁高勿低,不少记成本);公布后校正。
input_cny_per_mtoken: 6.0
output_cny_per_mtoken: 30.0
cache_hit_cny_per_mtoken: 1.2

View File

@ -25,3 +25,28 @@ variants:
optimal_temperature: 0.3 optimal_temperature: 0.3
prompt_caching: false prompt_caching: false
extended_thinking: false extended_thinking: false
# GLM 5.2:与 5.1 并存(新增 variant,不动 glm.pro,线上 task 仍引 5.1 不受影响)。
# 旗舰基座,真正可用的 1M 上下文,适合大仓库/长链路工程任务。thinking 同 pro 留 false(协议同 5.1)。
pro52:
display_name: GLM 5.2
model_id: zai/glm-5.2
api_base: https://open.bigmodel.cn/api/paas/v4
api_key_env: ZHIPUAI_API_KEY
max_context: 1000000 # 真 1M
reliable_context: 262144
max_output: 8192
parallel_tools: false
tool_calling_quality: good
thinking_mode: false
reasoning_effort_levels: []
default_reasoning_effort: ""
code_quality: excellent
enable_run_python: true
max_iterations: 50
optimal_temperature: 0.3
prompt_caching: false
extended_thinking: false
input_cny_per_mtoken: 8.0
output_cny_per_mtoken: 28.0
cache_hit_cny_per_mtoken: 2.0

View File

@ -1,3 +1,3 @@
# zcbot 版本号单一事实源:web/app.py 的 FastAPI version、/healthz 返回、前端展示都引这里。 # zcbot 版本号单一事实源:web/app.py 的 FastAPI version、/healthz 返回、前端展示都引这里。
# 改版本只动这一行。 # 改版本只动这一行。
__version__ = "0.20.1" __version__ = "0.38.8"

View File

@ -60,6 +60,7 @@ from tools.schedule import (
ScheduleCancelTool, ScheduleCreateTool, ScheduleListTool, ScheduleUpdateTool, ScheduleCancelTool, ScheduleCreateTool, ScheduleListTool, ScheduleUpdateTool,
) )
from tools.send_email import SendEmailTool, smtp_configured from tools.send_email import SendEmailTool, smtp_configured
from tools.wechat_bot import WechatPushTool, wechat_push_available
from core.ark_client import ArkConfig from core.ark_client import ArkConfig
from core.bocha_client import BochaConfig from core.bocha_client import BochaConfig
@ -564,10 +565,21 @@ def build_agent(
# 发邮件(§8.5 投递):仅当 SMTP_* env 齐了才挂(沿用"有 key 才注册",没配的 # 发邮件(§8.5 投递):仅当 SMTP_* env 齐了才挂(沿用"有 key 才注册",没配的
# 部署里 agent 看不到一个永远报错的工具)。定时与交互 run 都可用。 # 部署里 agent 看不到一个永远报错的工具)。定时与交互 run 都可用。
# base_dir 用 working_dir_path(该 task 的**宿主**工作目录绝对路径),不是 tool_base(cwd)。
# send_email 在宿主进程读附件文件,docker 下 agent 给的相对路径相对容器 workdir=task_dir,
# 翻回宿主即 working_dir_path;tool 内 _resolve_user_file 再处理 /workspace 容器绝对路径。
if smtp_configured(): if smtp_configured():
se = SendEmailTool(base_dir=tool_base, user_root=ur_path) se = SendEmailTool(base_dir=working_dir_path, user_root=ur_path)
tools[se.name] = se tools[se.name] = se
# 微信主动推送(§8.7 渠道抽象):仅当微信渠道开关在才挂(沿用"有开关才注册")。
# 交互与定时 run 都可用(定时简报可主动推回用户微信,24h 窗口内)。user_id ctor 注入。
# base_dir 同 send_email:用 working_dir_path(宿主 task 目录),wechat_push 在宿主进程
# 读待发文件,需把 agent 给的相对/容器路径翻回宿主(详 _resolve_user_file)。
if wechat_push_available():
wp = WechatPushTool(uid, base_dir=working_dir_path, user_root=ur_path, task_id=task_id)
tools[wp.name] = wp
if caps.enable_run_python: if caps.enable_run_python:
rp = RunPythonTool(base_dir=tool_base, user_root=ur_path) rp = RunPythonTool(base_dir=tool_base, user_root=ur_path)
tools[rp.name] = rp tools[rp.name] = rp

View File

@ -23,6 +23,14 @@ class ArkError(RuntimeError):
"""ark API 调用失败的统一异常。""" """ark API 调用失败的统一异常。"""
class ArkTimeoutError(ArkError):
"""可重试的瞬时失败:请求超时 / 网络抖动(非业务错误)。
HTTP 4xx/5xx 业务错误仍抛普通 ArkError(不该重试,重试也是同样的错)
caller 可单独 catch 本子类做退避重试;catch ArkError 仍能兜住(isinstance)
"""
@dataclass @dataclass
class ArkConfig: class ArkConfig:
api_key: str api_key: str
@ -73,18 +81,18 @@ class ArkClient:
try: try:
resp = self._client.post(path, json=body, timeout=timeout_s or self.timeout_s) resp = self._client.post(path, json=body, timeout=timeout_s or self.timeout_s)
except httpx.TimeoutException as e: except httpx.TimeoutException as e:
raise ArkError(f"timeout calling POST {path}: {e}") from e raise ArkTimeoutError(f"timeout calling POST {path}: {e}") from e
except httpx.HTTPError as e: except httpx.HTTPError as e:
raise ArkError(f"network error calling POST {path}: {e}") from e raise ArkTimeoutError(f"network error calling POST {path}: {e}") from e
return self._parse(resp, f"POST {path}") return self._parse(resp, f"POST {path}")
def get_json(self, path: str, *, timeout_s: Optional[float] = None) -> dict: def get_json(self, path: str, *, timeout_s: Optional[float] = None) -> dict:
try: try:
resp = self._client.get(path, timeout=timeout_s or self.timeout_s) resp = self._client.get(path, timeout=timeout_s or self.timeout_s)
except httpx.TimeoutException as e: except httpx.TimeoutException as e:
raise ArkError(f"timeout calling GET {path}: {e}") from e raise ArkTimeoutError(f"timeout calling GET {path}: {e}") from e
except httpx.HTTPError as e: except httpx.HTTPError as e:
raise ArkError(f"network error calling GET {path}: {e}") from e raise ArkTimeoutError(f"network error calling GET {path}: {e}") from e
return self._parse(resp, f"GET {path}") return self._parse(resp, f"GET {path}")
@staticmethod @staticmethod

View File

@ -49,6 +49,68 @@ def _message_chars(msg: dict[str, Any]) -> int:
return len(str(msg)) return len(str(msg))
_INTERRUPTED_TOOL_RESULT = (
"[interrupted: tool result missing — run was cut off "
"(disconnect/cancel) before this tool finished]"
)
def _repair_dangling_tool_calls(
messages: List[dict[str, Any]],
) -> tuple[List[dict[str, Any]], int]:
"""补齐被中断 run 留下的悬空 tool_calls,返回 (修复后的消息, 补的占位条数)。
run 在写入 `assistant.tool_calls` 之后tool 结果写入之前被中断(上游断连 /
用户取消 / 崩溃),会在历史里留下一条 `assistant.tool_calls` 后面没有对应 tool
结果的消息;用户随后继续发言,下一轮把历史原样发给 OpenAI/DeepSeek 就会被拒:
"An assistant message with 'tool_calls' must be followed by tool messages
responding to each 'tool_call_id'"(2026-06-18 DB 实测 task 5c5d6d25 命中)。
这里在发送前为每个**缺失** tool_call_id 紧跟其 assistant 消息补一条占位 tool
消息,满足协议且不丢上下文纯发送期处理,不改库 对所有中断路径和已存在的坏
数据都生效
"""
repaired: List[dict[str, Any]] = []
repaired_count = 0
n = len(messages)
i = 0
while i < n:
msg = messages[i]
repaired.append(msg)
tool_calls = msg.get("tool_calls") if isinstance(msg, dict) else None
if isinstance(msg, dict) and msg.get("role") == "assistant" and tool_calls:
id_to_name = {
tc.get("id"): (tc.get("function") or {}).get("name")
for tc in tool_calls
if isinstance(tc, dict) and tc.get("id")
}
# 收集紧随其后的连续 tool 消息已回应的 id(协议要求 tool 结果紧跟 assistant)。
answered: set[Any] = set()
j = i + 1
while j < n and isinstance(messages[j], dict) and messages[j].get("role") == "tool":
cid = messages[j].get("tool_call_id")
if cid:
answered.add(cid)
repaired.append(messages[j])
j += 1
# 为缺失的 id 补占位 tool 消息(保持在该 assistant 的 tool 结果块内)。
for cid, name in id_to_name.items():
if cid not in answered:
synthetic: dict[str, Any] = {
"role": "tool",
"tool_call_id": cid,
"content": _INTERRUPTED_TOOL_RESULT,
}
if name:
synthetic["name"] = name
repaired.append(synthetic)
repaired_count += 1
i = j
continue
i += 1
return repaired, repaired_count
def prepare_messages_for_llm( def prepare_messages_for_llm(
messages: List[dict[str, Any]], messages: List[dict[str, Any]],
*, *,
@ -87,6 +149,8 @@ def prepare_messages_with_stats(
""" """
if keep_recent < 0: if keep_recent < 0:
keep_recent = 0 keep_recent = 0
# 先补齐被中断 run 留下的悬空 tool_calls(否则原样发给模型会被拒,见函数注释)。
messages, repaired_tool_calls = _repair_dangling_tool_calls(messages)
original_chars = sum(_message_chars(m) for m in messages) original_chars = sum(_message_chars(m) for m in messages)
# 未到上下文压力门槛 → 原样发,零压缩(缓存全暖 + 不丢信息)。压缩是"放不下"才做的事。 # 未到上下文压力门槛 → 原样发,零压缩(缓存全暖 + 不丢信息)。压缩是"放不下"才做的事。
@ -99,6 +163,7 @@ def prepare_messages_with_stats(
"compacted_tool_messages": 0, "compacted_tool_messages": 0,
"compacted_skill_messages": 0, "compacted_skill_messages": 0,
"compaction_skipped": 1, "compaction_skipped": 1,
"repaired_tool_calls": repaired_tool_calls,
} }
return prepared, stats return prepared, stats
@ -136,5 +201,6 @@ def prepare_messages_with_stats(
"compacted_tool_messages": compacted_tool_messages, "compacted_tool_messages": compacted_tool_messages,
"compacted_skill_messages": compacted_skill_messages, "compacted_skill_messages": compacted_skill_messages,
"compaction_skipped": 0, "compaction_skipped": 0,
"repaired_tool_calls": repaired_tool_calls,
} }
return prepared, stats return prepared, stats

View File

@ -150,6 +150,15 @@ def memory_block(
f"\n\n**写到这里**:core → `{base}/core.md`;" f"\n\n**写到这里**:core → `{base}/core.md`;"
f"专题 → `{base}/extended/<slug>.md`\n" f"专题 → `{base}/extended/<slug>.md`\n"
) )
# 快捷指令(与记忆是两套机制):触发词 → 完整指令的映射,存 shortcuts.md。**内容不注上下文**
# (入口层查表展开,不靠你召回),这里只给"能维护 + 格式",让你在用户要建/改快捷词时会写。
parts.append(
f"\n**快捷指令**:用户说\"记个快捷词 X → Y\"/\"把快捷词 X 改成/删掉\"时,维护 "
f"`{base}/shortcuts.md`(先 `read` 再 `edit`)。格式是两列 markdown 表 "
f"`| 触发词 | 完整指令 |`(表头 + `|---|---|` 分隔行 + 每条一行;触发词别含 `|`)。"
f"之后用户在任意入口(网页/微信/企业微信)整条打这个触发词,系统自动展开成完整指令 —— "
f"你无需在对话里替他执行触发,只负责把这行写对。\n"
)
if core: if core:
parts.append("\n### Core (常驻 prompt)\n") parts.append("\n### Core (常驻 prompt)\n")
parts.append(core) parts.append(core)

58
core/model_access.py Normal file
View File

@ -0,0 +1,58 @@
"""Per-account 模型访问控制(档位制)。
`users.plan` 存档位名;档位 可用模型集合定义在 `config/agent.yaml` `model_tiers`
- plan 为空 / 未知档位 `default` (= 基线,所有未分配用户)
- `role == 'admin'` 始终全开,不受档位限制(管理员要能测所有模型)
- 某档成员里出现 `"*"` 该档全开(含未来新增模型),给内部档用
模型 id 约定( list 端点 / resolve 校验一致):
- 文本模型 = `family.variant`(config/models/<family>.yaml), `doubao.pro``glm.pro52`
- / 视频模型 = variant key(config/media/doubao.yaml), `seedream_5``seedance_2_fast`
两者命名不冲突(文本带点媒体 variant 不带点),同一档集合里混放即可
纯函数 + yaml 配置,不碰 DB / HTTP 调用方(web )负责取 user plan/role
并把"拒绝"翻译成 HTTP 403这样 core 不耦合 fastapi
"""
from __future__ import annotations
from typing import Optional
DEFAULT_TIER = "default"
WILDCARD = "*"
def _tiers() -> dict[str, list[str]]:
"""从 agent.yaml 读 model_tiers;缺失 → 空 dict(→ 所有人落 default,而 default 也空 → 全禁)。
开发期不缓存,每次现读(load_config 自身轻量); yaml 重启 web 生效
"""
from core.agent_builder import load_config
return load_config().get("model_tiers") or {}
def tier_name(plan: Optional[str], tiers: Optional[dict] = None) -> str:
"""plan → 实际生效的档位名;plan 为空 / 不在 tiers 里 → DEFAULT_TIER。"""
tiers = _tiers() if tiers is None else tiers
p = (plan or "").strip()
return p if p in tiers else DEFAULT_TIER
def allowed_set(plan: Optional[str], role: Optional[str]) -> Optional[set[str]]:
"""该用户可用模型 id 集合;返回 None = 全开(admin 或档位含 '*')。
None set 语义不同:None=不设限(放行一切), set=一个都不许
"""
if (role or "") == "admin":
return None
tiers = _tiers()
members = tiers.get(tier_name(plan, tiers)) or []
if WILDCARD in members:
return None
return set(members)
def is_allowed(model_id: str, plan: Optional[str], role: Optional[str]) -> bool:
"""该用户能否使用某模型 id(文本 profile 或媒体 variant)。"""
allowed = allowed_set(plan, role)
return allowed is None or model_id in allowed

View File

@ -236,6 +236,11 @@ class SandboxPool:
skills_path = (self.repo_root / "skills").resolve() skills_path = (self.repo_root / "skills").resolve()
if skills_path.is_dir(): if skills_path.is_dir():
cmd += ["-v", f"{skills_path}:/sandbox/skills:ro"] cmd += ["-v", f"{skills_path}:/sandbox/skills:ro"]
# 平台渲染层(rendering/)只读 mount ── 各 skill 出 docx/pdf 调
# `python /sandbox/rendering/render.py`,不再自带 render 脚本。与 skills 同款 ro。
rendering_path = (self.repo_root / "rendering").resolve()
if rendering_path.is_dir():
cmd += ["-v", f"{rendering_path}:/sandbox/rendering:ro"]
if self.runtime: if self.runtime:
cmd += ["--runtime", self.runtime] cmd += ["--runtime", self.runtime]
cmd.append(self.image) cmd.append(self.image)

View File

@ -32,6 +32,9 @@ except ImportError: # pragma: no cover (py<3.9 不支持,本项目 3.11+)
FAILURE_DISABLE_THRESHOLD = 5 FAILURE_DISABLE_THRESHOLD = 5
# 单次 tick 最多认领多少 job(防一批同点任务一次性涌入) # 单次 tick 最多认领多少 job(防一批同点任务一次性涌入)
CLAIM_LIMIT = 20 CLAIM_LIMIT = 20
# 新建 job 不指定时的默认单次超时(秒)。0=不限;给个有限默认防"跑到一半被
# 无限拖着 / 静默吞成 ok"。报告类重活(多刊检索+渲 docx)按经验 30min 够用。
DEFAULT_TIMEOUT_SECONDS = 1800
def validate_cron(expr: str) -> None: def validate_cron(expr: str) -> None:
@ -182,27 +185,8 @@ def _newest_artifact(working_dir: Path) -> Optional[Path]:
return best return best
def deliver_notify( def _notify_email(to, job_name: str, when: str, artifact: Optional[Path]) -> None:
notify: Optional[dict[str, Any]],
*,
job_name: str,
working_dir: Path,
tz: str,
) -> None:
"""job 配了 notify 就确定性补发(不靠 agent 记性)。目前仅 email 通道:
把工作目录最新产物当附件,套固定模板发无产物则发纯文本告知已执行
阻塞 IO(smtplib),由编排层放进 run_in_executor 失败抛异常,编排层吞掉记日志
"""
if not notify or notify.get("channel") != "email":
return
to = notify.get("to")
if not to:
return
from tools.send_email import send_email_smtp # 延迟导入,避免 core→tools 顶层环依赖 from tools.send_email import send_email_smtp # 延迟导入,避免 core→tools 顶层环依赖
when = datetime.now(_tzinfo(tz)).strftime("%Y-%m-%d %H:%M")
artifact = _newest_artifact(working_dir)
if artifact is not None: if artifact is not None:
subject = f"[定时任务] {job_name} · {when}" subject = f"[定时任务] {job_name} · {when}"
body = f"定时任务「{job_name}」已于 {when} 执行,产物见附件:{artifact.name}" body = f"定时任务「{job_name}」已于 {when} 执行,产物见附件:{artifact.name}"
@ -213,6 +197,51 @@ def deliver_notify(
send_email_smtp(to, subject, body) send_email_smtp(to, subject, body)
def deliver_notify(
notify: Optional[dict[str, Any]],
*,
job_name: str,
working_dir: Path,
tz: str,
user_id: Optional[Any] = None,
) -> None:
"""job 配了 notify 就确定性补发(不靠 agent 记性)。通道:
- `email`:把工作目录最新产物当附件发到 notify.to
- `wechat`:把最新产物 + 一句话主动推到该用户已绑微信(§8.7);未送达( 24h 窗口 /
未绑 / 未开口) notify 配了 `to`(邮箱)+ SMTP 退邮件兜底,否则抛错
阻塞 IO(smtplib / httpx),由编排层放进 run_in_executor 失败抛异常,编排层吞掉记日志
"""
if not notify:
return
channel = notify.get("channel")
when = datetime.now(_tzinfo(tz)).strftime("%Y-%m-%d %H:%M")
artifact = _newest_artifact(working_dir)
if channel == "email":
to = notify.get("to")
if to:
_notify_email(to, job_name, when, artifact)
return
if channel == "wechat":
if user_id is None:
return
from core.wechat.service import send_to_user # 延迟导入,避免顶层环依赖
from tools.send_email import smtp_configured
text = (f"定时任务「{job_name}」已于 {when} 执行"
+ (f",产物:{artifact.name}" if artifact else ",本次未产生文件产物。"))
report = send_to_user(user_id, text, str(artifact) if artifact else None)
if report.delivered:
return
fb = notify.get("to") # 可选 fallback 邮箱
if fb and smtp_configured():
_notify_email(fb, job_name, when, artifact)
return
raise RuntimeError("微信推送未送达: " + ", ".join(r.reason for r in report.results))
# ───────────── CRUD 服务层(对话工具 + REST 端点共用,DESIGN §8.5)───────────── # ───────────── CRUD 服务层(对话工具 + REST 端点共用,DESIGN §8.5)─────────────
# #
# tools/schedule.py(对话)与 web/app.py 的 /v1/schedules(前端只读+停用/删除)都调 # tools/schedule.py(对话)与 web/app.py 的 /v1/schedules(前端只读+停用/删除)都调
@ -314,7 +343,7 @@ def create_job(
skill: str = "", skill: str = "",
notify: Optional[dict[str, Any]] = None, notify: Optional[dict[str, Any]] = None,
model_profile: str = "", model_profile: str = "",
timeout_seconds: int = 0, timeout_seconds: int = DEFAULT_TIMEOUT_SECONDS,
) -> dict[str, Any]: ) -> dict[str, Any]:
name = (name or "").strip() name = (name or "").strip()
prompt = (prompt or "").strip() prompt = (prompt or "").strip()

View File

@ -15,7 +15,7 @@ from pathlib import Path
from typing import Any, Dict, List, Optional from typing import Any, Dict, List, Optional
from uuid import UUID from uuid import UUID
from sqlalchemy import delete, select from sqlalchemy import delete, func, select
from .storage import session_scope from .storage import session_scope
from .storage.models import Message, Task from .storage.models import Message, Task
@ -116,17 +116,30 @@ class Session:
task_id DB 不存在,返回空 Session(messages 只含 system,_db_idx=0); task_id DB 不存在,返回空 Session(messages 只含 system,_db_idx=0);
调用方判断该不该报错 调用方判断该不该报错
只把 idx >= tasks.context_base_idx 的消息装进 LLM 上下文(channel 长会话软重置,
0019)base 之前的历史仍全量留 messages (web `/messages` gate,照旧翻得到)
**关键**:`_db_idx` 必须取 DB 真实总条数(下一条 append idx),不能用 len(rows)
否则下次 append 会复用已存在的 idx, uq_messages_task_idx / 覆盖历史
""" """
sess = cls(task_id=task_id, system_prompt=system_prompt, meta=meta) sess = cls(task_id=task_id, system_prompt=system_prompt, meta=meta)
with session_scope() as s: with session_scope() as s:
base = s.execute(
select(Task.context_base_idx).where(Task.task_id == task_id)
).scalar_one_or_none() or 0
rows = s.execute( rows = s.execute(
select(Message) select(Message)
.where(Message.task_id == task_id) .where(Message.task_id == task_id, Message.idx >= base)
.order_by(Message.idx) .order_by(Message.idx)
).scalars().all() ).scalars().all()
for row in rows: for row in rows:
sess.messages.append(dict(row.payload)) sess.messages.append(dict(row.payload))
sess._db_idx = len(rows) # 真实总条数(含 base 之前的归档历史),保证 append 续号不撞 idx。
sess._db_idx = s.execute(
select(func.count())
.select_from(Message)
.where(Message.task_id == task_id)
).scalar_one()
return sess return sess
@classmethod @classmethod

103
core/shortcuts.py Normal file
View File

@ -0,0 +1,103 @@
"""用户快捷指令(触发词 → 完整指令)。渠道无关,入口层确定性展开。
存储:`workspace/users/<user_id>/.memory/shortcuts.md` memory per-user 存储壳
(同一 workspace 内按 user_id 隔离,agent 已有该目录写权限),** memory 是两种机制**:
- memory 是注进 system prompt给模型**参考**的软上下文(概率召回)
- 快捷指令**不进上下文**:展开发生在入口层模型跑之前 每条入站消息先经 `expand()`
查表,整条精确命中触发词就把文本替换成完整指令再跑 agent所以存再多条,平时上下文也是 0;
触发时进上下文的就是那条完整指令本身(= 用户本来要打的字),无额外 token
维护(agent 自管, memory):用户在对话里说"记个快捷词:X → Y",模型往 shortcuts.md 写一行
(memory 契约里加了一句告诉它格式);触发不靠模型,靠本模块解析,确定零歧义
格式(markdown 两列表,容错解析;表头/分隔行自动跳过):
| 触发词 | 指令 |
|---|---|
| 简报 | 给我输出一份昨日的 AI 新闻简报 |
匹配语义:整条消息 `strip()` + `casefold()` 后与某触发词**精确相等**才展开;
"帮我出个简报" 不命中(当普通消息走)新话题魔法命令同风格,零误伤
触发词含 `|` 会破坏表格解析 约定触发词不含竖线;指令正文含竖线也会被截断,同样避免
"""
from __future__ import annotations
import re
from pathlib import Path
from typing import Dict, Optional, Tuple
from uuid import UUID
# 表头行的触发词(解析时跳过,避免把表头当成一条快捷词)
_HEADER_TRIGGERS = {"触发词", "触发", "快捷词", "快捷指令", "命令", "trigger", "shortcut"}
# markdown 表格分隔行的单元格:`---` / `:--` / `:-:` 之类
_SEP_RE = re.compile(r"^:?-+:?$")
def _shortcuts_file(workspace_dir: Path, user_id: UUID) -> Path:
return workspace_dir / "users" / str(user_id) / ".memory" / "shortcuts.md"
def _normalize(s: str) -> str:
return s.strip().casefold()
def _is_separator(cell: str) -> bool:
return bool(_SEP_RE.match(cell.replace(" ", "")))
def parse_shortcuts(text: str) -> Dict[str, str]:
"""解析 shortcuts.md 文本 → {归一化触发词: 完整指令}。纯函数,可测。
容错:只认以 `|` 起头的表格行;跳过分隔行表头行空单元格行;
触发词重复时**先出现者赢**(首行优先,和人读顺序一致)
"""
mapping: Dict[str, str] = {}
for raw in text.splitlines():
line = raw.strip()
if not line.startswith("|"):
continue
cells = [c.strip() for c in line.strip("|").split("|")]
if len(cells) < 2:
continue
trigger, prompt = cells[0], cells[1]
if not trigger or not prompt:
continue
if _is_separator(trigger) and _is_separator(prompt):
continue # 分隔行 |---|---|
key = _normalize(trigger)
if not key or key in _HEADER_TRIGGERS:
continue # 空或表头
mapping.setdefault(key, prompt) # 首行优先
return mapping
def load_shortcuts(workspace_dir: Path, user_id: UUID) -> Dict[str, str]:
"""读该用户 shortcuts.md 并解析;文件不存在 / 读失败 → 空表(不抛,不挡入站)。"""
p = _shortcuts_file(workspace_dir, user_id)
if not p.is_file():
return {}
try:
return parse_shortcuts(p.read_text(encoding="utf-8"))
except (OSError, UnicodeDecodeError):
return {}
def expand(
workspace_dir: Path, user_id: UUID, text: str
) -> Tuple[str, Optional[str]]:
"""入口层展开:整条 `text` 精确命中某触发词 → 返回 (完整指令, 命中的触发词原文);
未命中 返回 (text 原样, None)空文本直接原样返回
调用点:渠道核心 `_run_channel_conversation` + 网页 `post_message`,共用此函数,
保证任何入口打同一个触发词行为一致
"""
if not text or not text.strip():
return text, None
mapping = load_shortcuts(workspace_dir, user_id)
if not mapping:
return text, None
prompt = mapping.get(_normalize(text))
if prompt is None:
return text, None
return prompt, text.strip()

View File

@ -46,6 +46,11 @@ class User(Base):
oidc_subject: Mapped[Optional[str]] = mapped_column(Text, nullable=True) oidc_subject: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
password_hash: Mapped[Optional[str]] = mapped_column(Text, nullable=True) password_hash: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
plan: Mapped[Optional[str]] = mapped_column(Text, nullable=True) plan: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
# 0016:平台登录注入的用户档案。name=显示名/姓名,user_name=平台账号名;均 nullable
# (platform_key 入口 ensure_user_row upsert 写;邮箱密码 / 历史行留空)。未来 OIDC
# 接管时由 ID token 的 name / preferred_username claim 注入,数据流不变。
name: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
user_name: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
# 0009:访问角色。'user'(默认)/ 'admin';仅 admin 可访问 /v1/admin/* 管理端点。 # 0009:访问角色。'user'(默认)/ 'admin';仅 admin 可访问 /v1/admin/* 管理端点。
# 提管理员:main.py user role --email X --role admin。 # 提管理员:main.py user role --email X --role admin。
role: Mapped[str] = mapped_column(Text, nullable=False, server_default="user") role: Mapped[str] = mapped_column(Text, nullable=False, server_default="user")
@ -65,6 +70,9 @@ class Task(Base):
working_dir: Mapped[str] = mapped_column(Text, nullable=False) working_dir: Mapped[str] = mapped_column(Text, nullable=False)
skill: Mapped[str] = mapped_column(Text, nullable=False, default="") skill: Mapped[str] = mapped_column(Text, nullable=False, default="")
description: Mapped[str] = mapped_column(Text, nullable=False, default="") description: Mapped[str] = mapped_column(Text, nullable=False, default="")
# 渠道来源(0011):web=网页端常规任务 / wechat=微信 ClawBot 常驻对话。
# 仅 INSERT 时由建 task 方写定,后续 upsert/save 不传 → 不覆盖。前端据此打徽章 + 置顶。
channel: Mapped[str] = mapped_column(Text, nullable=False, default="web", server_default="web")
status: Mapped[str] = mapped_column(Text, nullable=False, default="active") status: Mapped[str] = mapped_column(Text, nullable=False, default="active")
model: Mapped[str] = mapped_column(Text, nullable=False, default="") model: Mapped[str] = mapped_column(Text, nullable=False, default="")
model_profile: Mapped[str] = mapped_column(Text, nullable=False, default="") model_profile: Mapped[str] = mapped_column(Text, nullable=False, default="")
@ -77,6 +85,12 @@ class Task(Base):
# 只有 error 是持久终态(下次起新 run 时由 post_message 清掉) # 只有 error 是持久终态(下次起新 run 时由 post_message 清掉)
run_status: Mapped[str] = mapped_column(Text, nullable=False, default="idle") run_status: Mapped[str] = mapped_column(Text, nullable=False, default="idle")
run_error: Mapped[Optional[str]] = mapped_column(Text, nullable=True) run_error: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
# 喂给模型的上下文窗口起点(0019,channel 长会话软重置)。Session.load 只把 idx >=
# context_base_idx 的消息装进 LLM 上下文;之前的历史仍全量留 messages 表(web 翻得到)。
# web 普通任务恒 0 = 喂全量;channel 入站按 gap / 「新话题」推进。详 DESIGN §8.7。
context_base_idx: Mapped[int] = mapped_column(
Integer, nullable=False, default=0, server_default="0"
)
created_at: Mapped[datetime] = mapped_column( created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), nullable=False DateTime(timezone=True), server_default=func.now(), nullable=False
) )
@ -88,6 +102,14 @@ class Task(Base):
deleted_at: Mapped[Optional[datetime]] = mapped_column( deleted_at: Mapped[Optional[datetime]] = mapped_column(
DateTime(timezone=True), nullable=True DateTime(timezone=True), nullable=True
) )
# 定时任务执行归属(0017):非 NULL = 该 task 是某 scheduled_job 的一次执行(isolated
# 每次新建 / persistent 首次新建都填)。普通对话列表据此排除,不混进"用户项目"列表;
# crons 页可按 job 反查执行历史。job 走软删不硬删 → ondelete SET NULL 安全。
scheduled_job_id: Mapped[Optional[UUID]] = mapped_column(
PG_UUID(as_uuid=True),
ForeignKey("scheduled_jobs.job_id", ondelete="SET NULL"),
nullable=True,
)
class Message(Base): class Message(Base):
@ -107,6 +129,10 @@ class Message(Base):
# 0006:产生该 message 的模型(只在 assistant 行有值;user/tool/system 为 NULL)。 # 0006:产生该 message 的模型(只在 assistant 行有值;user/tool/system 为 NULL)。
# 跟 usage_events.model_profile 写入一致,JOIN-free 时按 message 直查也能拿到。 # 跟 usage_events.model_profile 写入一致,JOIN-free 时按 message 直查也能拿到。
model_profile: Mapped[Optional[str]] = mapped_column(Text, nullable=True) model_profile: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
# 消息来源(0018):NULL=agent run 产生;"push"=push 记录(_record_push_to_chat 写)。
# extract_last_assistant_text 据此跳过 push 记录,避免误取当入站回复。独立列不进 payload,
# 不影响 agent 上下文 / LLM API。
kind: Mapped[Optional[str]] = mapped_column(Text, nullable=True)
created_at: Mapped[datetime] = mapped_column( created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), nullable=False DateTime(timezone=True), server_default=func.now(), nullable=False
) )
@ -225,3 +251,36 @@ class ScheduledJob(Base):
) )
class ChannelBinding(Base):
"""微信渠道绑定(0015,DESIGN §8.7 渠道抽象)。
一行 = 一个用户在某渠道(`channel`)的一份绑定配置;PK=(user_id, channel) 1 用户每渠道 1
沿用本库判别列 + JSONB 多态范式( usage_events.kind+units / scheduled_jobs.notify):
各渠道配置字段不同,全装进 `config` JSONB,加渠道不动 schema不再各建一表
config 形态(敏感字段经 core/wechat/crypto.py 加密入 JSONB,绝不进沙箱/日志/API):
- channel='clawbot':{bot_token*, bot_im_id, user_im_id, base_url, latest_context_token*,
context_token_at(iso), chat_task_id(str)} *=密文;context_token 24h 窗口主动推靠它
- channel='wecom':{wecom_userid, chat_task_id(str)} wecom_userid 企业成员 id,
非密钥明文,无条件推 + 回调反查身份;chat_task_id 企业微信入站对话常驻 task
chat_task_id/FKper-字段 NOT NULL 退到应用层校验, usage_events JSONB 同向取舍
"""
__tablename__ = "channel_bindings"
user_id: Mapped[UUID] = mapped_column(
PG_UUID(as_uuid=True),
ForeignKey("users.user_id", ondelete="CASCADE"),
primary_key=True,
)
channel: Mapped[str] = mapped_column(Text, primary_key=True) # clawbot | wecom | ...
status: Mapped[str] = mapped_column(Text, nullable=False, server_default="active") # active|revoked
config: Mapped[dict[str, Any]] = mapped_column(JSONB, nullable=False, default=dict)
created_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), nullable=False
)
updated_at: Mapped[datetime] = mapped_column(
DateTime(timezone=True), server_default=func.now(), nullable=False
)

View File

@ -6,9 +6,10 @@ from uuid import UUID
from sqlalchemy import func, select, update from sqlalchemy import func, select, update
from sqlalchemy.dialects.postgresql import insert from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.exc import IntegrityError
from .engine import session_scope from .engine import session_scope
from .models import Task from .models import Message, Task
class NoSubtaskError(ValueError): class NoSubtaskError(ValueError):
@ -25,6 +26,8 @@ def ensure_local_task_row(
model: str = "", model: str = "",
model_profile: str = "", model_profile: str = "",
reasoning_effort: str = "", reasoning_effort: str = "",
channel: str = "web",
scheduled_job_id: Optional[UUID] = None,
) -> None: ) -> None:
"""占位 INSERT(ON CONFLICT DO NOTHING)—— 不覆盖已有字段。 """占位 INSERT(ON CONFLICT DO NOTHING)—— 不覆盖已有字段。
@ -45,6 +48,8 @@ def ensure_local_task_row(
model=model, model=model,
model_profile=model_profile, model_profile=model_profile,
reasoning_effort=reasoning_effort, reasoning_effort=reasoning_effort,
channel=channel,
scheduled_job_id=scheduled_job_id,
) )
.on_conflict_do_nothing(index_elements=["task_id"]) .on_conflict_do_nothing(index_elements=["task_id"])
) )
@ -52,6 +57,31 @@ def ensure_local_task_row(
s.execute(stmt) s.execute(stmt)
def append_channel_message(
task_id: UUID, content: str, *, role: str = "assistant", kind: Optional[str] = None
) -> None:
"""往 task 追加一条非 agent-run 产生的消息(push 出站记录等)。原子算 idx
(SELECT max(idx)+1)+INSERT; uq_messages_task_idx(与入站 agent run 并发
append) 重试payload 形态同 Session.append {role, content};不设
model_profile / tokens_*(非模型产出,usage 不计)kind messages.kind
(独立列,不进 payload):"push" 标记 push 记录,extract_last_assistant_text 据此跳过"""
payload = {"role": role, "content": content}
last_err: Optional[Exception] = None
for _ in range(3):
try:
with session_scope() as s:
max_idx = s.execute(
select(func.max(Message.idx)).where(Message.task_id == task_id)
).scalar()
next_idx = (max_idx if max_idx is not None else -1) + 1
s.add(Message(task_id=task_id, idx=next_idx, payload=payload, kind=kind))
return
except IntegrityError as e:
last_err = e
continue
raise RuntimeError(f"append_channel_message: idx 冲突重试耗尽: {last_err}")
def upsert_task( def upsert_task(
task_id: UUID, task_id: UUID,
*, *,

6
core/wechat/__init__.py Normal file
View File

@ -0,0 +1,6 @@
"""微信接入(DESIGN §8.7)。
渠道 A = ClawBot 个人微信 iLink Bot API(`ilink.py`,协议已真机实测,
`scripts/probe_clawbot*.py`);渠道 B = 企业微信自建应用(后续 `wecom.py`)
本包只放协议客户端等纯逻辑, DB / agent 编排解耦
"""

59
core/wechat/crypto.py Normal file
View File

@ -0,0 +1,59 @@
"""敏感凭据的列加密(DESIGN §8.7:bot_token / latest_context_token 加密入库)。
- env `ZCBOT_WECHAT_SECRET_KEY` 用其派生的 Fernet 密钥加密,密文带 `v1:` 前缀
- env 不在 退明文标记`plain:`(公测兜底,日志/沙箱/API 仍绝不带这两列;
正式部署应配 key)`enc()`/`dec()` 对两种前缀都可逆, key 不影响存量明文行
只在 host 进程(绑定服务 / 入站管理器 / push);绝不进沙箱 / run_python
"""
from __future__ import annotations
import base64
import hashlib
import os
from typing import Optional
from cryptography.fernet import Fernet, InvalidToken
_PREFIX_ENC = "v1:"
_PREFIX_PLAIN = "plain:"
def _fernet() -> Optional[Fernet]:
key = os.getenv("ZCBOT_WECHAT_SECRET_KEY", "").strip()
if not key:
return None
# 任意口令 → 32B → urlsafe-base64 Fernet 密钥(确定性,免单独管 Fernet key)
digest = hashlib.sha256(key.encode("utf-8")).digest()
return Fernet(base64.urlsafe_b64encode(digest))
def enc(plaintext: Optional[str]) -> Optional[str]:
"""明文 → 入库串。配了 key 走密文(v1:),否则明文标记(plain:)。None 透传。"""
if plaintext is None:
return None
f = _fernet()
if f is None:
return _PREFIX_PLAIN + plaintext
token = f.encrypt(plaintext.encode("utf-8")).decode("ascii")
return _PREFIX_ENC + token
def dec(stored: Optional[str]) -> Optional[str]:
"""入库串 → 明文。识别 v1:/plain: 前缀;v1: 需 key 且匹配。None 透传。"""
if stored is None:
return None
if stored.startswith(_PREFIX_PLAIN):
return stored[len(_PREFIX_PLAIN):]
if stored.startswith(_PREFIX_ENC):
f = _fernet()
if f is None:
raise RuntimeError(
"密文需要 ZCBOT_WECHAT_SECRET_KEY 才能解密,但 env 未配置"
)
try:
return f.decrypt(stored[len(_PREFIX_ENC):].encode("ascii")).decode("utf-8")
except InvalidToken as e:
raise RuntimeError("ZCBOT_WECHAT_SECRET_KEY 与密文不匹配(key 变了?)") from e
# 无前缀:历史/手填的裸明文,容错原样返回
return stored

411
core/wechat/ilink.py Normal file
View File

@ -0,0 +1,411 @@
"""ClawBot 个人微信 iLink Bot API 客户端(DESIGN §8.7 渠道 A)。
协议全部经真机实测(`scripts/probe_clawbot*.py`,2026-06-23):
- 绑定:`get_bot_qrcode`(无凭据,出深链 自渲二维码) 轮询 `get_qrcode_status`
(TTL ~1min,过期换码) `confirmed` `bot_token` + `baseurl`
- :`getupdates` 长轮询(hold 35s),消息带 `from_user_id` + `context_token`
- :`sendmessage`,**每条 `client_id` 必唯一**(漏则同 token 后续被丢);多条/长文
~1000 字分块,中间 `message_state=GENERATING(1)`末块 `FINISH(2)`,间隔 ~300ms
- `context_token` 有效期 ~24h可复用 主动推送靠它(用户须先开口拿到 token)
- 文件:`getuploadurl` AES-128-ECB(PKCS7)加密 POST 密文到 CDN `x-encrypted-param`
`sendmessage` `file_item`
纯协议客户端,不碰 DB / agent 编排阻塞 IO(httpx 同步),调用方放 to_thread / executor
"""
from __future__ import annotations
import base64
import hashlib
import os
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Optional
from urllib.parse import quote
import httpx
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
DEFAULT_BASE = "https://ilinkai.weixin.qq.com"
CDN_BASE = "https://novac2c.cdn.weixin.qq.com/c2c"
CHANNEL_VERSION = "1.0.2"
BOT_TYPE_PERSONAL = 3
# 协议枚举(源码 @tencent-weixin/openclaw-weixin src/api/types.ts,已实测)
MSG_TYPE_BOT = 2
STATE_GENERATING = 1
STATE_FINISH = 2
ITEM_TEXT = 1
ITEM_IMAGE = 2
ITEM_FILE = 4
UPLOAD_MEDIA_FILE = 3
UPLOAD_MEDIA_IMAGE = 1
# 分块:长文按 ~1000 字切,块间隔防丢
CHUNK_CHARS = 1000
CHUNK_DELAY_S = 0.3
MAX_FILE_BYTES = 20 * 1024 * 1024
def _uin_header() -> str:
"""X-WECHAT-UIN:base64(随机 uint32 的十进制字符串),反重放,每请求变。"""
n = int.from_bytes(os.urandom(4), "big")
return base64.b64encode(str(n).encode()).decode()
def _headers(bot_token: Optional[str] = None) -> dict[str, str]:
h = {
"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token",
"X-WECHAT-UIN": _uin_header(),
}
if bot_token:
h["Authorization"] = f"Bearer {bot_token}"
return h
def _base_info() -> dict[str, str]:
return {"channel_version": CHANNEL_VERSION}
def _new_client_id() -> str:
return f"openclaw-weixin-{uuid.uuid4().hex}"
def _aes_ecb_pkcs7(plaintext: bytes, key: bytes) -> bytes:
padder = padding.PKCS7(128).padder()
padded = padder.update(plaintext) + padder.finalize()
enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
return enc.update(padded) + enc.finalize()
def _aes_ecb_unpkcs7(ciphertext: bytes, key: bytes) -> bytes:
"""收图/收文件的解密:AES-128-ECB 解 + 去 PKCS7(发送侧 `_aes_ecb_pkcs7` 的逆)。"""
dec = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
padded = dec.update(ciphertext) + dec.finalize()
unpadder = padding.PKCS7(128).unpadder()
return unpadder.update(padded) + unpadder.finalize()
def _decode_media_aes_key(raw: str) -> bytes:
"""媒体 `media.aes_key` → 16 字节 AES key。两种实测编码兜住:
- `base64(raw 16 bytes)`(图片常见) 解码得 16 字节直用;
- `base64(hex 字符串)`(文件/语音/视频,发送侧 `_upload_file` 也用这种) 解码得
32 ASCII hex 字符, `fromhex` 16 字节
"""
dec = base64.b64decode(raw)
if len(dec) == 16:
return dec
if len(dec) == 32:
try:
return bytes.fromhex(dec.decode("ascii"))
except (ValueError, UnicodeDecodeError):
return dec[:16]
return dec[:16]
def _guess_image_ext(data: bytes) -> str:
"""按 magic bytes 猜图片扩展名(微信入站图片无原文件名)。认不出回退 .jpg。"""
if data[:3] == b"\xff\xd8\xff":
return ".jpg"
if data[:8] == b"\x89PNG\r\n\x1a\n":
return ".png"
if data[:6] in (b"GIF87a", b"GIF89a"):
return ".gif"
if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
return ".webp"
if data[:2] == b"BM":
return ".bmp"
return ".jpg"
# ─────────────────────────── 绑定(无 token)───────────────────────────
@dataclass
class QrCode:
qrcode_id: str
deeplink: str # liteapp.weixin.qq.com/q/...,调用方自渲成二维码图片
def get_bot_qrcode(base_url: str = DEFAULT_BASE, *, timeout: float = 20.0) -> QrCode:
"""取一张绑定二维码。无需任何预置凭据。`deeplink` 需自渲成二维码让用户扫。"""
with httpx.Client(timeout=timeout) as c:
r = c.get(
f"{base_url}/ilink/bot/get_bot_qrcode",
params={"bot_type": BOT_TYPE_PERSONAL},
headers=_headers(),
)
r.raise_for_status()
d = r.json()
return QrCode(qrcode_id=d.get("qrcode", ""), deeplink=d.get("qrcode_img_content", ""))
@dataclass
class BindResult:
status: str # wait | confirmed | expired
bot_token: Optional[str] = None
base_url: Optional[str] = None
def poll_qrcode_status(
qrcode_id: str, base_url: str = DEFAULT_BASE, *, timeout: float = 40.0
) -> BindResult:
"""单次轮询扫码状态(服务端长轮询,hold 数十秒)。调用方循环调用,
`expired` 重新 `get_bot_qrcode` 换码`confirmed` 时返回 bot_token + base_url"""
with httpx.Client(timeout=timeout) as c:
r = c.get(
f"{base_url}/ilink/bot/get_qrcode_status",
params={"qrcode": qrcode_id},
headers=_headers(),
)
r.raise_for_status()
d = r.json()
return BindResult(
status=d.get("status", ""),
bot_token=d.get("bot_token"),
base_url=d.get("baseurl") or d.get("base_url"),
)
# ─────────────────────────── 收发(带 token)───────────────────────────
@dataclass
class InboundAttachment:
"""入站附件(图片 / 文件)的 CDN 引用 + 下载后填充的明文字节。
协议结构(getupdates 返回的 item_list ,实测 + 逆向 photon-hq/wechat-ilink-client):
- 图片 `image_item`(type=2):`media{encrypt_query_param, aes_key, encrypt_type}`,
另带优先 `aeskey`(32 hex);文件名缺失,下载后按 magic bytes 补扩展名
- 文件 `file_item`(type=4):`media{...}` + `file_name` + `len`(明文大小)
"""
kind: str # "image" | "file"
media: dict[str, Any] # {encrypt_query_param, aes_key, encrypt_type}
file_name: str = "" # 文件原名(图片无名,落盘时按 magic bytes 生成)
aeskey_hex: str = "" # 图片优先 key:image_item.aeskey(32 hex chars)
size: int = 0 # 明文大小(file_item.len / image mid_size),仅参考
data: Optional[bytes] = None # 下载 + 解密后的明文,由调用方(inbound)回填
@dataclass
class InboundMessage:
from_user_id: str # xxx@im.wechat
context_token: str # 回复 / 24h 内主动推须带回
text: str
raw: dict[str, Any]
attachments: list[InboundAttachment] = field(default_factory=list)
class ILinkClient:
"""绑定后按用户持有 `bot_token` + `base_url`,收发该用户消息。"""
def __init__(self, bot_token: str, base_url: str = DEFAULT_BASE) -> None:
self.bot_token = bot_token
self.base_url = base_url or DEFAULT_BASE
# —— 收 ——
def get_updates(
self, cursor: str = "", *, timeout: float = 45.0
) -> tuple[list[InboundMessage], str]:
"""长轮询拉新消息。返回 (消息列表, 新游标);游标传回下次调用。"""
with httpx.Client(timeout=timeout) as c:
r = c.post(
f"{self.base_url}/ilink/bot/getupdates",
json={"get_updates_buf": cursor, "base_info": _base_info()},
headers=_headers(self.bot_token),
)
r.raise_for_status()
d = r.json()
msgs: list[InboundMessage] = []
for m in d.get("msgs", []) or []:
text_parts: list[str] = []
attachments: list[InboundAttachment] = []
for it in m.get("item_list", []) or []:
if it.get("text_item"):
text_parts.append((it["text_item"] or {}).get("text", ""))
img = it.get("image_item")
if img:
attachments.append(InboundAttachment(
kind="image",
media=img.get("media") or {},
aeskey_hex=(img.get("aeskey") or ""),
size=int(img.get("mid_size") or 0),
))
fil = it.get("file_item")
if fil:
attachments.append(InboundAttachment(
kind="file",
media=fil.get("media") or {},
file_name=(fil.get("file_name") or "file"),
size=int(fil.get("len") or 0),
))
msgs.append(InboundMessage(
from_user_id=m.get("from_user_id", ""),
context_token=m.get("context_token", ""),
text="".join(text_parts),
raw=m,
attachments=attachments,
))
return msgs, d.get("get_updates_buf", cursor)
# —— 收附件(CDN 下载 → AES-128-ECB 解密 → 明文 bytes)——
def download_media(self, att: InboundAttachment, *, timeout: float = 60.0) -> bytes:
"""下载并解密一个入站附件,返回明文 bytes(发送侧上传链路的逆操作)。
URL:`{CDN_BASE}/download?encrypted_query_param=<media.encrypt_query_param>`
Key 优先级:图片 `image_item.aeskey`(32 hex)> `media.aes_key`(两种编码,
`_decode_media_aes_key`)
"""
media = att.media or {}
qp = media.get("encrypt_query_param") or media.get("encrypted_query_param") or ""
if not qp:
raise RuntimeError(f"附件无 encrypt_query_param: kind={att.kind} media={media}")
url = f"{CDN_BASE}/download?encrypted_query_param={quote(qp)}"
with httpx.Client(timeout=timeout) as c:
# 下载语义按逆向文档是 GET;CDN 若只认 POST 则回退一次(下载幂等,无副作用)
r = c.get(url)
if r.status_code == 405 or (400 <= r.status_code < 500 and not r.content):
r = c.post(url, content=b"")
r.raise_for_status()
ciphertext = r.content
if att.aeskey_hex and len(att.aeskey_hex) == 32:
key = bytes.fromhex(att.aeskey_hex)
else:
key = _decode_media_aes_key(media.get("aes_key") or "")
return _aes_ecb_unpkcs7(ciphertext, key)
# —— 发(底层单条)——
def _send(
self, to_user_id: str, context_token: str, item: dict, *, state: int
) -> None:
body = {
"msg": {
"from_user_id": "",
"to_user_id": to_user_id,
"client_id": _new_client_id(),
"message_type": MSG_TYPE_BOT,
"message_state": state,
"context_token": context_token,
"item_list": [item],
},
"base_info": _base_info(),
}
with httpx.Client(timeout=30.0) as c:
r = c.post(
f"{self.base_url}/ilink/bot/sendmessage",
json=body,
headers=_headers(self.bot_token),
)
# 成功为 HTTP 200 + 空 body {};非 200 抛错(空 body 不代表失败)
r.raise_for_status()
# —— 发文本(自动分块,长文不丢)——
def send_text(self, to_user_id: str, context_token: str, text: str) -> None:
text = text or ""
chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)] or [""]
last = len(chunks) - 1
for i, chunk in enumerate(chunks):
self._send(
to_user_id, context_token,
{"type": ITEM_TEXT, "text_item": {"text": chunk}},
state=STATE_FINISH if i == last else STATE_GENERATING,
)
if i != last:
time.sleep(CHUNK_DELAY_S)
# —— 发文件(getuploadurl → AES-128-ECB → CDN → file_item)——
def _upload_file(self, to_user_id: str, data: bytes) -> dict[str, Any]:
rawsize = len(data)
rawmd5 = hashlib.md5(data).hexdigest()
aeskey = os.urandom(16)
filekey = os.urandom(16).hex()
ciphertext = _aes_ecb_pkcs7(data, aeskey)
filesize = len(ciphertext)
with httpx.Client(timeout=30.0) as c:
ru = c.post(
f"{self.base_url}/ilink/bot/getuploadurl",
json={
"filekey": filekey,
"media_type": UPLOAD_MEDIA_FILE,
"to_user_id": to_user_id,
"rawsize": rawsize,
"rawfilemd5": rawmd5,
"filesize": filesize,
"no_need_thumb": True,
"aeskey": aeskey.hex(),
"base_info": _base_info(),
},
headers=_headers(self.bot_token),
)
ru.raise_for_status()
uj = ru.json()
full = (uj.get("upload_full_url") or uj.get("uploadFullUrl")
or uj.get("full_url") or uj.get("url"))
param = (uj.get("upload_param") or uj.get("uploadParam") or uj.get("param"))
if full:
cdn_url = full
elif param:
cdn_url = (f"{CDN_BASE}/upload?encrypted_query_param={quote(param)}"
f"&filekey={quote(filekey)}")
else:
raise RuntimeError(f"getuploadurl 无 upload url/param: {uj}")
rc = c.post(cdn_url, content=ciphertext,
headers={"Content-Type": "application/octet-stream"})
download_param = rc.headers.get("x-encrypted-param")
if rc.status_code != 200 or not download_param:
raise RuntimeError(
f"CDN 上传失败 http={rc.status_code} "
f"err={rc.headers.get('x-error-message')}"
)
return {
"encrypt_query_param": download_param,
"aes_key": base64.b64encode(aeskey.hex().encode()).decode(),
"rawsize": rawsize,
}
def send_file(
self,
to_user_id: str,
context_token: str,
file_path: str | os.PathLike,
*,
file_name: Optional[str] = None,
) -> None:
data = _read_file_capped(file_path)
name = file_name or os.path.basename(str(file_path))
up = self._upload_file(to_user_id, data)
item = {
"type": ITEM_FILE,
"file_item": {
"media": {
"encrypt_query_param": up["encrypt_query_param"],
"aes_key": up["aes_key"],
"encrypt_type": 1,
},
"file_name": name,
"len": str(up["rawsize"]),
},
}
self._send(to_user_id, context_token, item, state=STATE_FINISH)
def attachment_basename(att: InboundAttachment) -> str:
"""入站附件的安全落盘文件名(不含目录):剥掉路径分隔防穿越;图片按 magic bytes 补扩展名。
返回的是 basename,调用方负责加前缀(时间戳 / 随机)防重名并拼到 inbound 目录下
"""
if att.kind == "image":
ext = _guess_image_ext(att.data or b"")
return f"image{ext}"
name = os.path.basename((att.file_name or "file").replace("\\", "/")).strip()
return name or "file"
def _read_file_capped(file_path: str | os.PathLike) -> bytes:
size = os.path.getsize(file_path)
if size > MAX_FILE_BYTES:
raise ValueError(f"文件超过 {MAX_FILE_BYTES // (1024*1024)}MB 上限")
with open(file_path, "rb") as f:
return f.read()

155
core/wechat/inbound.py Normal file
View File

@ -0,0 +1,155 @@
"""入站长轮询管理器(DESIGN §8.7):收用户消息 → 跑 agent → 回复发回。
- 每个 active 绑定一条 `getupdates` 长轮询(ilink 同步, to_thread);收到消息:
`service.refresh_context_token` 刷新 24h 推送窗口; 调注入的 `handle_message`
(app.py 提供:解析/建该用户常驻微信task run `_run_agent_bg` 取回复);
用本轮新鲜 `context_token` 分块发回
- 每绑定 loop **串行**处理(再收):天然避免同用户并发 run 锁冲突;不同用户并发
- 管理器周期性对账 active 绑定:新增起 loop撤销/revoke loop
`handle_message` 注入解耦 app.py 内部(broker / run / _run_agent_bg);本模块只管协议循环
与回复提取(`extract_last_assistant_text` 纯函数可测)
"""
from __future__ import annotations
import asyncio
from typing import Any, Awaitable, Callable, Optional
from uuid import UUID
from sqlalchemy import select
from core.storage import session_scope
from core.storage.models import Message
from core.wechat import service
from core.wechat.ilink import ILinkClient, InboundAttachment
from core.wechat.service import BindingSnapshot
# app.py 注入:跑该用户的微信对话 task,返回 assistant 回复文本(可空)。
# 第三参 attachments:已下载解密(att.data 已回填)的入站附件,app.py 负责落盘 + 拼提示行。
HandleMessage = Callable[[UUID, str, list[InboundAttachment]], Awaitable[str]]
def _content_to_text(content: Any) -> str:
"""OpenAI 风格 content → 纯文本(str 直返;content blocks 拼 text 段)。"""
if isinstance(content, str):
return content
if isinstance(content, list):
parts = []
for b in content:
if isinstance(b, dict) and b.get("type") in (None, "text"):
parts.append(b.get("text", ""))
return "".join(parts)
return ""
def extract_last_assistant_text(task_id: UUID, *, scan: int = 20) -> str:
"""取该 task 最后一条**有正文**的 assistant 消息文本(跳过纯 tool_calls 行)。"""
with session_scope() as s:
rows = s.execute(
select(Message.payload)
.where(Message.task_id == task_id, Message.kind.is_(None))
.order_by(Message.idx.desc())
.limit(scan)
).all()
for (payload,) in rows:
if not isinstance(payload, dict) or payload.get("role") != "assistant":
continue
text = _content_to_text(payload.get("content"))
if text.strip():
return text
return ""
async def _poll_binding(
snap: BindingSnapshot, handle_message: HandleMessage, stop: asyncio.Event
) -> None:
"""单个绑定的长轮询循环。异常退避重试,直到 stop。"""
client = ILinkClient(snap.bot_token, snap.base_url)
cursor = ""
backoff = 2
while not stop.is_set():
try:
msgs, cursor = await asyncio.to_thread(client.get_updates, cursor)
backoff = 2
except Exception as e: # noqa: BLE001
print(f"[wechat-inbound] {str(snap.user_id)[:8]} getupdates err: "
f"{type(e).__name__}: {e}; retry in {backoff}s")
await asyncio.sleep(backoff)
backoff = min(backoff * 2, 60)
continue
for m in msgs:
if stop.is_set():
break
# 下载入站附件(图片/文件):CDN 取密文 → AES 解密 → 回填 att.data
atts: list[InboundAttachment] = []
for att in m.attachments:
try:
att.data = await asyncio.to_thread(client.download_media, att)
atts.append(att)
except Exception as e: # noqa: BLE001
print(f"[wechat-inbound] {str(snap.user_id)[:8]} download "
f"{att.kind} err: {type(e).__name__}: {e}")
# 文本和附件都没有(纯文本为空 / 附件全下载失败)→ 跳过整条
if not m.text.strip() and not atts:
continue
# ① 刷新该用户推送窗口(主动推靠它续命)
await asyncio.to_thread(
service.refresh_context_token, snap.user_id, m.from_user_id, m.context_token
)
# ② 跑 agent 取回复(附件由 handle_message 落盘 + 拼 [用户上传的...] 行)
try:
reply = await handle_message(snap.user_id, m.text, atts)
except Exception as e: # noqa: BLE001
reply = f"[出错] {type(e).__name__}: {e}"
# ③ 用本轮新鲜 token 分块回
if reply and reply.strip():
try:
await asyncio.to_thread(
client.send_text, m.from_user_id, m.context_token, reply
)
except Exception as e: # noqa: BLE001
print(f"[wechat-inbound] {str(snap.user_id)[:8]} reply send err: "
f"{type(e).__name__}: {e}")
async def run_inbound_manager(
handle_message: HandleMessage,
stop: asyncio.Event,
*,
reconcile_seconds: int = 60,
) -> None:
"""常驻管理器:周期对账 active 绑定,起/停 per-binding 长轮询循环。"""
loops: dict[UUID, asyncio.Task] = {}
try:
while not stop.is_set():
try:
active = await asyncio.to_thread(service.list_active_bindings)
except Exception as e: # noqa: BLE001
print(f"[wechat-inbound] list bindings err: {type(e).__name__}: {e}")
active = []
active_ids = {s.user_id for s in active}
# 起新增
for snap in active:
t = loops.get(snap.user_id)
if t is None or t.done():
loops[snap.user_id] = asyncio.create_task(
_poll_binding(snap, handle_message, stop),
name=f"wechat-poll-{str(snap.user_id)[:8]}",
)
# 清撤销 / 已结束
for uid in list(loops):
if uid not in active_ids:
loops.pop(uid).cancel()
elif loops[uid].done():
loops.pop(uid)
await _wait_stop(stop, reconcile_seconds) # 等 stop 或到下次对账
finally:
for t in loops.values():
t.cancel()
async def _wait_stop(stop: asyncio.Event, timeout: float) -> None:
try:
await asyncio.wait_for(stop.wait(), timeout=timeout)
except asyncio.TimeoutError:
pass

498
core/wechat/service.py Normal file
View File

@ -0,0 +1,498 @@
"""微信渠道服务层(DESIGN §8.7):绑定 CRUD + 主动推送 + `send_to_user` 渠道抽象。
- 绑定行的 `bot_token` / `latest_context_token` `crypto` 加解密;快照(BindingSnapshot)
脱离 session含明文 token,** host 进程内用,绝不外泄/进沙箱**
- 主动推送 24h 窗口:`context_token` 仅在末次入站 ~24h 内可用;超期/未开口 推不出,
返回 reason 给调用方退邮件兜底(§8.5)
- `send_to_user` 是渠道抽象:scheduler / WechatPushTool 调它,不感知 ClawBot/企业微信;
企业微信(渠道 B)后续在此追加一路
阻塞 IO(DB + httpx),调用方放 to_thread / executor
"""
from __future__ import annotations
import os
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional
from uuid import UUID
from sqlalchemy import func, select, update
from core.storage import session_scope
from core.storage.models import ChannelBinding, Message, Task
from core.wechat import crypto
from core.wechat.ilink import DEFAULT_BASE, ILinkClient
CONTEXT_TOKEN_TTL = timedelta(hours=24)
_CLAWBOT = "clawbot"
_WECOM = "wecom"
def _get_or_new(s, user_id: UUID, channel: str) -> ChannelBinding:
row = s.get(ChannelBinding, (user_id, channel))
if row is None:
row = ChannelBinding(user_id=user_id, channel=channel, config={})
s.add(row)
return row
def clawbot_enabled() -> bool:
"""ClawBot 渠道总开关(沿用「有开关才挂」范式,§3.4)。"""
return os.getenv("ZCBOT_WECHAT_BOT_ENABLED", "").strip().lower() in (
"1", "true", "yes", "on",
)
# ─────────────────────────── 绑定快照 / CRUD ───────────────────────────
@dataclass
class BindingSnapshot:
user_id: UUID
bot_token: str # 明文(已解密)
base_url: str
user_im_id: Optional[str]
context_token: Optional[str] # 明文(已解密)
context_token_at: Optional[datetime]
chat_task_id: Optional[UUID]
status: str
def _snap(row: ChannelBinding) -> BindingSnapshot:
"""channel='clawbot' 行 → 快照(解密 token,反序列化 config)。"""
cfg = row.config or {}
cta = cfg.get("context_token_at")
cti = cfg.get("chat_task_id")
return BindingSnapshot(
user_id=row.user_id,
bot_token=crypto.dec(cfg.get("bot_token")) or "",
base_url=cfg.get("base_url") or DEFAULT_BASE,
user_im_id=cfg.get("user_im_id"),
context_token=crypto.dec(cfg.get("latest_context_token")),
context_token_at=datetime.fromisoformat(cta) if cta else None,
chat_task_id=UUID(cti) if cti else None,
status=row.status,
)
def get_binding(user_id: UUID) -> Optional[BindingSnapshot]:
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _CLAWBOT))
return _snap(row) if row else None
def list_active_bindings() -> list[BindingSnapshot]:
"""入站长轮询管理器用:所有 active 的 ClawBot 绑定(含明文 bot_token)。"""
with session_scope() as s:
rows = (
s.execute(
select(ChannelBinding).where(
ChannelBinding.channel == _CLAWBOT,
ChannelBinding.status == "active",
)
)
.scalars()
.all()
)
return [_snap(r) for r in rows]
def upsert_clawbot_binding(
user_id: UUID, bot_token: str, base_url: str, *, bot_im_id: Optional[str] = None
) -> None:
"""扫码 confirmed 后写/更新绑定。bot_token 加密存进 config(保留已有 user_im_id 等)。"""
now = datetime.now(timezone.utc)
with session_scope() as s:
row = _get_or_new(s, user_id, _CLAWBOT)
cfg = dict(row.config or {})
cfg["bot_token"] = crypto.enc(bot_token)
cfg["base_url"] = base_url or DEFAULT_BASE
if bot_im_id:
cfg["bot_im_id"] = bot_im_id
row.config = cfg # 重新赋值 → ORM 追踪 JSONB 变更
row.status = "active"
row.updated_at = now
def refresh_context_token(user_id: UUID, user_im_id: str, context_token: str) -> None:
"""每条入站消息刷新该用户的 context_token(+时间戳)——主动推送窗口靠它续命。"""
now = datetime.now(timezone.utc)
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _CLAWBOT))
if row is None:
return
cfg = dict(row.config or {})
if user_im_id:
cfg["user_im_id"] = user_im_id
cfg["latest_context_token"] = crypto.enc(context_token)
cfg["context_token_at"] = now.isoformat()
row.config = cfg
row.updated_at = now
def set_chat_task(user_id: UUID, task_id: UUID) -> None:
now = datetime.now(timezone.utc)
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _CLAWBOT))
if row is not None:
cfg = dict(row.config or {})
cfg["chat_task_id"] = str(task_id)
row.config = cfg
row.updated_at = now
def unbind(user_id: UUID) -> bool:
"""解绑 ClawBot(标 revoked,不物理删 → 保留轨迹)。返回是否有绑定被改。"""
now = datetime.now(timezone.utc)
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _CLAWBOT))
if row is None:
return False
row.status = "revoked"
row.updated_at = now
return True
# ─────────────────────────── 推送 ───────────────────────────
@dataclass
class PushResult:
ok: bool
channel: str = "clawbot"
# sent | no_binding | never_opened | token_stale | error:<...>
reason: str = ""
def _token_fresh(snap: BindingSnapshot) -> bool:
if not snap.context_token or snap.context_token_at is None:
return False
at = snap.context_token_at
if at.tzinfo is None:
at = at.replace(tzinfo=timezone.utc)
return (datetime.now(timezone.utc) - at) < CONTEXT_TOKEN_TTL
def push_clawbot(
user_id: UUID, text: str = "", file_path: Optional[str] = None
) -> PushResult:
"""主动推一条到用户个人微信。仅在 24h 窗口内可用,否则返回 reason 供兜底。"""
snap = get_binding(user_id)
if snap is None or snap.status != "active":
return PushResult(False, reason="no_binding")
if not snap.user_im_id or not snap.context_token:
return PushResult(False, reason="never_opened") # 冷启动:用户从未开口
if not _token_fresh(snap):
return PushResult(False, reason="token_stale") # 超 24h 未互动
client = ILinkClient(snap.bot_token, snap.base_url)
try:
if text:
client.send_text(snap.user_im_id, snap.context_token, text)
if file_path:
client.send_file(snap.user_im_id, snap.context_token, file_path)
except Exception as e: # noqa: BLE001 —— 调用方据 reason 决定兜底
return PushResult(False, reason=f"error: {str(e)[:200]}")
return PushResult(True, reason="sent")
# ─────────────── 企业微信(渠道 B,纯推送;无 24h 窗口约束)───────────────
def get_wecom_userid(user_id: UUID) -> Optional[str]:
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _WECOM))
if row is None or row.status != "active":
return None
return (row.config or {}).get("wecom_userid")
def get_user_by_wecom_userid(wecom_userid: str) -> Optional[UUID]:
"""企业微信回调只带 wecom_userid → 反查内部 user_id(仅 active 绑定)。入站对话用。"""
if not wecom_userid:
return None
with session_scope() as s:
row = s.execute(
select(ChannelBinding.user_id).where(
ChannelBinding.channel == _WECOM,
ChannelBinding.status == "active",
ChannelBinding.config["wecom_userid"].astext == wecom_userid,
)
).first()
return row[0] if row else None
def upsert_wecom_binding(user_id: UUID, wecom_userid: str) -> None:
"""OAuth 拿到 userid 后写/更新绑定。合并进 config(保留 chat_task_id 等已有字段)。"""
now = datetime.now(timezone.utc)
with session_scope() as s:
row = _get_or_new(s, user_id, _WECOM)
cfg = dict(row.config or {})
cfg["wecom_userid"] = wecom_userid
row.config = cfg
row.status = "active"
row.updated_at = now
def get_wecom_chat_task(user_id: UUID) -> Optional[UUID]:
"""企业微信入站对话常驻 task id(无 → None)。"""
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _WECOM))
if row is None:
return None
cti = (row.config or {}).get("chat_task_id")
return UUID(cti) if cti else None
def set_wecom_chat_task(user_id: UUID, task_id: UUID) -> None:
now = datetime.now(timezone.utc)
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _WECOM))
if row is not None:
cfg = dict(row.config or {})
cfg["chat_task_id"] = str(task_id)
row.config = cfg
row.updated_at = now
def unbind_wecom(user_id: UUID) -> bool:
now = datetime.now(timezone.utc)
with session_scope() as s:
row = s.get(ChannelBinding, (user_id, _WECOM))
if row is None:
return False
row.status = "revoked"
row.updated_at = now
return True
def push_wecom(user_id: UUID, text: str = "", file_path: Optional[str] = None) -> PushResult:
"""企业微信主动推一条(无条件,不挑活跃度)。"""
from core.wechat import wecom
wuid = get_wecom_userid(user_id)
if not wuid:
return PushResult(False, channel="wecom", reason="no_binding")
try:
if text:
wecom.send_text(wuid, text)
if file_path:
wecom.send_file(wuid, file_path)
except Exception as e: # noqa: BLE001 —— 透出 errcode/errmsg 便于排错
return PushResult(False, channel="wecom", reason=f"error: {str(e)[:200]}")
return PushResult(True, channel="wecom", reason="sent")
@dataclass
class DeliveryReport:
results: list[PushResult] = field(default_factory=list)
@property
def delivered(self) -> bool:
return any(r.ok for r in self.results)
def active_channels() -> list[str]:
"""部署级「哪些渠道开了」的**唯一真相源**:门槛判断(`wechat_push_available`)
与投递(`send_to_user`)都引它,避免两处各列各的(曾漏判企业微信致工具不挂)
加渠道只改这一处,门槛与投递自动一致顺序即投递优先序"""
from core.wechat.wecom import wecom_configured
chans: list[str] = []
if clawbot_enabled():
chans.append(_CLAWBOT)
if wecom_configured():
chans.append(_WECOM)
return chans
_DISPATCH = {_CLAWBOT: push_clawbot, _WECOM: push_wecom}
def ensure_channel_chat_task(uid: UUID, channel: str) -> Optional[UUID]:
"""确保 uid 的 channel 常驻 chat task 存在(未软删),返回 task_id;不存在则新建并回填绑定。
channel {'wechat','wecom'}wechat binding 返回 None(没法建/)
入站对话(`_run_channel_conversation`) push 记录(`send_to_user`)共用此入口,
避免两条"解析/建 chat task"路径逻辑漂移 task 逻辑搬自原 _run_channel_conversation
"""
from uuid import uuid4
from core.agent_builder import ( # 延迟 import:service 被 tools.wechat_bot 引用,
load_config, resolve_workspace, working_dir_from_name, # agent_builder 又 import tools.wechat_bot
) # → 顶层 import 循环;函数内 import 打破(同 scheduler.py:227 范式)
from core.capabilities import ModelCapabilities
from core.paths import ROOT, to_db_path
from core.storage.models import Task
from core.storage.utils import ensure_local_task_row
if channel == "wecom":
existing_tid = get_wecom_chat_task(uid)
task_name, slug, desc = "企业微信对话", f"wecom-{str(uid)[:8]}", "(企业微信对话)"
set_task = set_wecom_chat_task
else: # wechat
snap = get_binding(uid)
if snap is None:
return None
existing_tid = snap.chat_task_id
task_name, slug, desc = "微信对话", f"wechat-{str(uid)[:8]}", "(微信 ClawBot 对话)"
set_task = set_chat_task
tid = existing_tid
need_create = tid is None
if not need_create:
with session_scope() as s:
exists = s.execute(
select(Task.task_id).where(Task.task_id == tid, Task.deleted_at.is_(None))
).first()
if exists is None:
need_create = True
if need_create:
cfg = load_config()
profile = cfg["default_model"]
caps = ModelCapabilities.load(profile, ROOT / cfg["models_dir"])
ws = resolve_workspace(None, cfg)
tid = uuid4()
fs_dir = working_dir_from_name(ws, uid, slug)
fs_dir.mkdir(parents=True, exist_ok=True)
ensure_local_task_row(
task_id=tid, name=task_name, working_dir=to_db_path(fs_dir),
skill="", user_id=uid, model=caps.model_id, model_profile=profile,
description=desc, channel=channel,
)
set_task(uid, tid)
return tid
# ─────────────────────── channel 长会话上下文软重置(0019) ───────────────────────
# gap 默认值:超过它未说话 → 入站时软重置(保留上一轮原文做续聊锚点)。可被
# config.json 的 channel.session_gap_hours 覆盖(见 reload 入口)。
SESSION_GAP_HOURS_DEFAULT = 6.0
# 用户在 channel 里发这些词 → 手动「新话题」硬重置(base 推到总数,彻底从零)。
NEW_TOPIC_COMMANDS = frozenset({"新话题", "新会话", "/new", "清空上下文"})
def reset_channel_context(task_id: UUID, *, hard: bool) -> int:
"""推进 task 的 context_base_idx(软重置),返回新 base。不删任何消息。
hard=True(手动新话题):base = 总消息数 下一条入站起彻底新会话
hard=False(自动 gap):base = 最后一条 user 消息 idx 新窗口仍带上上一轮原文,
续聊接得上; user 消息(理论上不会)退化为总数
"""
with session_scope() as s:
total = s.execute(
select(func.count()).select_from(Message).where(Message.task_id == task_id)
).scalar_one()
if hard:
new_base = int(total)
else:
last_user_idx = s.execute(
select(func.max(Message.idx)).where(
Message.task_id == task_id,
Message.payload["role"].astext == "user",
)
).scalar_one_or_none()
new_base = int(last_user_idx) if last_user_idx is not None else int(total)
s.execute(
update(Task).where(Task.task_id == task_id).values(context_base_idx=new_base)
)
return new_base
def maybe_gap_reset(task_id: UUID, gap_hours: float = SESSION_GAP_HOURS_DEFAULT) -> bool:
"""入站时检测:距上次消息超过 gap_hours → 软重置(保留上一轮)。返回是否重置。
仅入站对话调用(push 记录不触发)gap_hours <= 0 视为关闭自动分段
"""
if gap_hours <= 0:
return False
with session_scope() as s:
last_at = s.execute(
select(func.max(Message.created_at)).where(Message.task_id == task_id)
).scalar_one_or_none()
if last_at is None:
return False # 空 task,首条入站,无需重置
if (datetime.now(timezone.utc) - last_at) <= timedelta(hours=gap_hours):
return False
reset_channel_context(task_id, hard=False)
return True
def _file_rel_to_user_root(user_id: UUID, file_path: str) -> Optional[str]:
"""宿主绝对路径 → user_root 相对 POSIX(如 scheduled-<jobid>/x.md)。
文件不在 user_root (外部 --working-dir) None"""
from pathlib import Path
from core.agent_builder import load_config, resolve_workspace, user_root
try:
ws = resolve_workspace(None, load_config())
root = user_root(ws, user_id)
return Path(file_path).resolve().relative_to(root.resolve()).as_posix()
except Exception:
return None
def _build_push_message(text: str, rel: Optional[str]) -> str:
"""构造写进 chat task 的 assistant 消息:推送摘要 + 可点文件链接 + agent read 路径。"""
lines: list[str] = []
if text and text.strip():
lines.append(text.strip())
if rel:
fname = rel.rsplit("/", 1)[-1]
lines.append(f"产物文件:[{fname}](/v1/files/download?path={rel})")
lines.append(f"(如需基于此文件提问,可读取 ../{rel})")
return "\n\n".join(lines)
def _record_push_to_chat(
report: DeliveryReport, user_id: UUID, text: str,
file_path: Optional[str], source_task_id: Optional[UUID],
) -> None:
"""把投递成功的推送记为对应渠道 chat task 的 assistant 消息(web 端可见 +
agent 可基于追问)Unified 模式: agent 上下文(推送是 bot 发给用户的话,
记得自己发过什么 = 连贯,非污染)记录失败不影响投递(吞掉打日志)"""
if not report.delivered:
return
from core.storage.utils import append_channel_message
rel = _file_rel_to_user_root(user_id, file_path) if file_path else None
for r in report.results:
if not r.ok:
continue
ch = "wechat" if r.channel == _CLAWBOT else r.channel # clawbot→wechat(建 task channel)
try:
tid = ensure_channel_chat_task(user_id, ch)
if tid is None:
continue
if source_task_id is not None and tid == source_task_id:
continue # 调用方即该 chat task 自己的 run,tool 记录已在,不重复插摘要
append_channel_message(tid, _build_push_message(text, rel), kind="push")
except Exception as e: # noqa: BLE001 —— 记录失败不放大,投递已成功
print(f"[push] record to {ch} chat task failed: {type(e).__name__}: {e}")
def send_to_user(
user_id: UUID,
text: str = "",
file_path: Optional[str] = None,
channel: Optional[str] = None,
*,
source_task_id: Optional[UUID] = None,
) -> DeliveryReport:
"""渠道抽象:按 `active_channels()` 列出的已开渠道投递 + 把推送记进渠道 chat task。
- `channel=None`(默认):广播到所有已开渠道(定时任务/不点名推送沿用此口径)
- `channel="wecom"|"clawbot"`:用户点名某个微信时只投这一条;若该渠道未开/无效,
返回单条 `no_binding` 结果(不静默回退到别的渠道,避免又推到没点名的渠道)
- 投递成功后,对每个成功渠道把推送(摘要 + 文件链接 + read 路径)作为 assistant
消息写进该渠道 chat task(不存在自动建)`source_task_id` = 调用方所在 task:
若恰为目标 chat task 自己(如用户在微信里让 agent ),tool 记录已在,跳过去重
"""
report = DeliveryReport()
if channel is not None:
if channel in active_channels():
report.results.append(_DISPATCH[channel](user_id, text, file_path))
else:
report.results.append(PushResult(False, channel=channel, reason="no_binding"))
else:
for ch in active_channels():
report.results.append(_DISPATCH[ch](user_id, text, file_path))
_record_push_to_chat(report, user_id, text, file_path, source_task_id)
return report

252
core/wechat/wecom.py Normal file
View File

@ -0,0 +1,252 @@
"""企业微信自建应用客户端(DESIGN §8.7 渠道 B,出站推送 + 入站对话)。
本模块只管**出站**(access_token / OAuth 绑定 / 发送);**入站对话**走回调:加解密在
`wecom_crypto.py`(WXBizMsgCrypt 等价),回调端点 + 反查身份在 web/app.py `/v1/wecom/callback`,
对话核心复用 `_run_channel_conversation`(与个人微信同核心,各一张会话 task)
出站能力:
- `access_token`:`gettoken(corpid,secret)`,进程内缓存 ~2h线程安全errcode 失效即重取
- OAuth 扫码登录:`oauth_authorize_url()` 造扫码授权登录链接(桌面浏览器出二维码);
`get_user_id(code)` 拿成员 userid(绑定用,一次性)需管理员在应用配企业微信授权登录可信域名
- 发送:`send_text / send_markdown / send_file`(file `media/upload` media_id,20MB)
- `state` HMAC 签名( user_id + TTL, CSRF):回调无 JWT,用户身份从 state
凭据(secret)只在 host 进程读,绝不进沙箱 / run_python( ClawBot / send_email,§3.4)
阻塞 IO(httpx 同步),调用方放 to_thread / executor
"""
from __future__ import annotations
import base64
import hashlib
import hmac
import os
import threading
import time
from pathlib import Path
from typing import Optional
import httpx
QYAPI = "https://qyapi.weixin.qq.com/cgi-bin"
# 扫码授权登录(桌面浏览器渲染二维码,用企业微信 App 扫码)。
# 不能用 open.weixin.qq.com/connect/oauth2/authorize —— 那条是「网页授权」,只能在
# 企业微信客户端内打开,桌面浏览器会报「请在企业微信客户端打开链接」。
WWLOGIN_SSO = "https://login.work.weixin.qq.com/wwlogin/sso/login"
MAX_FILE_BYTES = 20 * 1024 * 1024
# access_token 进程内缓存
_tok_lock = threading.Lock()
_tok_val: Optional[str] = None
_tok_exp: float = 0.0
def wecom_configured() -> bool:
"""三件套齐才算配好(沿用「有 key 才挂」§3.4)。"""
return bool(
os.getenv("WECOM_CORPID", "").strip()
and os.getenv("WECOM_AGENTID", "").strip()
and os.getenv("WECOM_SECRET", "").strip()
)
def _corpid() -> str:
return os.getenv("WECOM_CORPID", "").strip()
def _agentid() -> str:
return os.getenv("WECOM_AGENTID", "").strip()
def _secret() -> str:
return os.getenv("WECOM_SECRET", "").strip()
def _state_secret() -> bytes:
# OAuth state 签名密钥:复用凭据加密 key,退 JWT_SECRET
key = (os.getenv("ZCBOT_WECHAT_SECRET_KEY", "").strip()
or os.getenv("JWT_SECRET", "").strip() or "zcbot-wecom")
return key.encode("utf-8")
# ─────────────────────────── access_token ───────────────────────────
def get_access_token(*, force: bool = False) -> str:
"""缓存的 app access_token;过期/force 时重取。线程安全。"""
global _tok_val, _tok_exp
with _tok_lock:
if not force and _tok_val and time.time() < _tok_exp:
return _tok_val
with httpx.Client(timeout=15) as c:
r = c.get(f"{QYAPI}/gettoken",
params={"corpid": _corpid(), "corpsecret": _secret()})
r.raise_for_status()
d = r.json()
if d.get("errcode", 0) != 0 or not d.get("access_token"):
raise RuntimeError(f"gettoken 失败: {d.get('errcode')} {d.get('errmsg')}")
_tok_val = d["access_token"]
_tok_exp = time.time() + int(d.get("expires_in", 7200)) - 300 # 提前 5min 续
return _tok_val
def _api_get(path: str, params: dict) -> dict:
"""带 access_token 的 GET;40014/42001(token 失效)自动重取一次。"""
for attempt in (1, 2):
tok = get_access_token(force=(attempt == 2))
with httpx.Client(timeout=15) as c:
r = c.get(f"{QYAPI}/{path}", params={"access_token": tok, **params})
r.raise_for_status()
d = r.json()
if d.get("errcode") in (40014, 42001) and attempt == 1:
continue
return d
return d
def _api_post(path: str, json_body: dict) -> dict:
for attempt in (1, 2):
tok = get_access_token(force=(attempt == 2))
with httpx.Client(timeout=20) as c:
r = c.post(f"{QYAPI}/{path}", params={"access_token": tok}, json=json_body)
r.raise_for_status()
d = r.json()
if d.get("errcode") in (40014, 42001) and attempt == 1:
continue
return d
return d
# ─────────────────────────── OAuth 绑定 ───────────────────────────
def sign_state(user_id: str, *, ttl: int = 600) -> str:
"""state = base64(user_id.exp).hmac —— 绑 user_id + 短 TTL,防 CSRF。"""
exp = int(time.time()) + ttl
payload = f"{user_id}.{exp}"
sig = hmac.new(_state_secret(), payload.encode(), hashlib.sha256).hexdigest()[:32]
raw = f"{payload}.{sig}"
return base64.urlsafe_b64encode(raw.encode()).decode().rstrip("=")
def verify_state(state: str) -> Optional[str]:
"""校验 state,返回 user_id;失败/过期返回 None。"""
try:
pad = "=" * (-len(state) % 4)
raw = base64.urlsafe_b64decode(state + pad).decode()
user_id, exp_s, sig = raw.rsplit(".", 2)
payload = f"{user_id}.{exp_s}"
good = hmac.new(_state_secret(), payload.encode(), hashlib.sha256).hexdigest()[:32]
if not hmac.compare_digest(sig, good):
return None
if int(exp_s) < int(time.time()):
return None
return user_id
except Exception:
return None
def oauth_authorize_url(redirect_uri: str, state: str) -> str:
"""造**扫码授权登录**链接:桌面浏览器打开会渲染二维码,用户用企业微信 App 扫码确认后
回跳到 redirect_uri code(后续 auth/getuserinfo userid 不变)
注意:redirect_uri 域名须在企业微信后台应用 企业微信授权登录 可信域名里登记,
网页授权可信域名是两项不同设置"""
from urllib.parse import quote
return (
f"{WWLOGIN_SSO}?login_type=CorpApp&appid={_corpid()}"
f"&agentid={_agentid()}"
f"&redirect_uri={quote(redirect_uri, safe='')}"
f"&state={quote(state, safe='')}"
)
def get_user_id(code: str) -> Optional[str]:
"""OAuth 回调用 code 换企业成员 userid(非成员返回 None)。"""
d = _api_get("auth/getuserinfo", {"code": code})
if d.get("errcode", 0) != 0:
raise RuntimeError(f"getuserinfo 失败: {d.get('errcode')} {d.get('errmsg')}")
return d.get("userid") # 外部联系人/非成员只有 openid → None
# ─────────────────────────── 发送 ───────────────────────────
def _send(touser: str, msgtype: str, body_field: dict) -> None:
payload = {"touser": touser, "msgtype": msgtype, "agentid": _agentid(), **body_field}
d = _api_post("message/send", payload)
if d.get("errcode", 0) != 0:
raise RuntimeError(f"message/send 失败: {d.get('errcode')} {d.get('errmsg')}")
def send_text(touser: str, content: str) -> None:
_send(touser, "text", {"text": {"content": content or ""}})
def send_markdown(touser: str, content: str) -> None:
_send(touser, "markdown", {"markdown": {"content": content or ""}})
def upload_media(file_path: str | os.PathLike, *, media_type: str = "file") -> str:
"""上传临时素材(3 天有效)→ media_id。"""
p = Path(file_path)
if p.stat().st_size > MAX_FILE_BYTES:
raise ValueError(f"文件超过 {MAX_FILE_BYTES // (1024*1024)}MB 上限")
for attempt in (1, 2):
tok = get_access_token(force=(attempt == 2))
with httpx.Client(timeout=30) as c, open(p, "rb") as f:
r = c.post(f"{QYAPI}/media/upload",
params={"access_token": tok, "type": media_type},
files={"media": (p.name, f)})
r.raise_for_status()
d = r.json()
if d.get("errcode") in (40014, 42001) and attempt == 1:
continue
break
if d.get("errcode", 0) != 0 or not d.get("media_id"):
raise RuntimeError(f"media/upload 失败: {d.get('errcode')} {d.get('errmsg')}")
return d["media_id"]
def send_file(touser: str, file_path: str | os.PathLike) -> None:
media_id = upload_media(file_path, media_type="file")
_send(touser, "file", {"file": {"media_id": media_id}})
# ─────────────────────────── 入站素材下载 ───────────────────────────
def _filename_from_disposition(disposition: str) -> str:
"""从 Content-Disposition 取文件名(filename="..." / filename*=UTF-8''...);取不到回空。"""
if not disposition:
return ""
import re
from urllib.parse import unquote
m = re.search(r"filename\*=(?:UTF-8'')?([^;]+)", disposition, re.IGNORECASE)
if m:
return unquote(m.group(1).strip().strip('"'))
m = re.search(r'filename="?([^";]+)"?', disposition, re.IGNORECASE)
return m.group(1).strip() if m else ""
def download_media(media_id: str) -> tuple[bytes, str]:
"""下载临时素材(`media/get`)→ (明文字节, 文件名)。入站图片/文件消息用。
成功回二进制流(文件名在 Content-Disposition);出错回 JSON(errcode/errmsg)
40014/42001(token 失效)自动重取一次供回调线程 to_thread
"""
last = None
for attempt in (1, 2):
tok = get_access_token(force=(attempt == 2))
with httpx.Client(timeout=60) as c:
r = c.get(f"{QYAPI}/media/get",
params={"access_token": tok, "media_id": media_id})
r.raise_for_status()
ctype = r.headers.get("content-type", "").lower()
if "application/json" in ctype or "text/plain" in ctype:
try:
d = r.json()
except Exception: # noqa: BLE001 —— 非 JSON 当二进制处理
d = None
if d is not None:
if d.get("errcode") in (40014, 42001) and attempt == 1:
continue
raise RuntimeError(f"media/get 失败: {d.get('errcode')} {d.get('errmsg')}")
fname = _filename_from_disposition(r.headers.get("content-disposition", ""))
return r.content, fname
raise RuntimeError(f"media/get 失败: token 重取后仍未拿到素材 {last}")

View File

@ -0,0 +1,93 @@
"""企业微信「接收消息」回调加解密(WXBizMsgCrypt 等价实现,DESIGN §8.7 渠道 B 入站)。
企业微信自建应用配接收消息回调后,服务器**主动 POST 加密 XML** 到回调 URL,
URL 时还会先 GET 一次 echostr 验有效性这套加密** wecom.py access_token /
出站 API 无关,也与 crypto.py Fernet 列加密无关** 是企业微信专用方案:
- key = base64decode(EncodingAESKey + "="),32B;IV = key[:16](AES-256-CBC)
- 明文密文体 = random(16) || msg_len(4B 大端) || msg || receiveid(自建应用为 corpid)
- 签名 = sha1(sorted([Token, timestamp, nonce, encrypt]) 拼接) hexdigest
只做**解密 + 验签**(入站);回复走 wecom.send_text 主动推(agent >5s 无法被动同步回),
故不实现加密凭据 Token / EncodingAESKey secret 只在 host 进程读,绝不进沙箱
"""
from __future__ import annotations
import base64
import hashlib
import os
import struct
import xml.etree.ElementTree as ET
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
def callback_token() -> str:
return os.getenv("WECOM_CALLBACK_TOKEN", "").strip()
def callback_aeskey() -> str:
return os.getenv("WECOM_CALLBACK_AESKEY", "").strip()
def callback_configured() -> bool:
"""Token + EncodingAESKey 都在才算配好回调(沿用「有 key 才挂」§3.4)。"""
return bool(callback_token() and callback_aeskey())
def _aes_key() -> bytes:
"""EncodingAESKey(43 字符)→ +'=' → base64 解码 → 32B AES 密钥。"""
return base64.b64decode(callback_aeskey() + "=")
def _signature(timestamp: str, nonce: str, encrypt: str) -> str:
arr = sorted([callback_token(), timestamp, nonce, encrypt])
return hashlib.sha1("".join(arr).encode("utf-8")).hexdigest()
def _aes_decrypt(encrypt_b64: str) -> bytes:
key = _aes_key()
cipher = Cipher(algorithms.AES(key), modes.CBC(key[:16]))
dec = cipher.decryptor()
raw = dec.update(base64.b64decode(encrypt_b64)) + dec.finalize()
pad = raw[-1] # PKCS7(企业微信 block=32,按末字节剥即可)
if not 1 <= pad <= 32:
raise ValueError("PKCS7 padding 非法")
return raw[:-pad]
def _extract_plain(encrypt_b64: str, *, expect_receiveid: str = "") -> str:
"""解密 → 剥 16B 随机前缀 + 4B 长度,取 msg;尾部 receiveid 校验 corpid。"""
raw = _aes_decrypt(encrypt_b64)
body = raw[16:]
msg_len = struct.unpack(">I", body[:4])[0]
msg = body[4:4 + msg_len]
receiveid = body[4 + msg_len:].decode("utf-8", "ignore")
if expect_receiveid and receiveid != expect_receiveid:
raise ValueError("receiveid 不匹配(corpid 校验失败)")
return msg.decode("utf-8")
def verify_url(
msg_signature: str, timestamp: str, nonce: str, echostr: str, *, corpid: str = ""
) -> str:
"""配回调 URL 时企业微信 GET 验有效性:验签 + 解密 echostr,原样回明文。"""
if _signature(timestamp, nonce, echostr) != msg_signature:
raise ValueError("签名校验失败")
return _extract_plain(echostr, expect_receiveid=corpid)
def parse_message(plain_xml: str) -> dict:
"""解密后的明文 XML → dict(FromUserName / MsgType / Content / MsgId / ...)。"""
root = ET.fromstring(plain_xml)
return {child.tag: (child.text or "") for child in root}
def decrypt_message(
msg_signature: str, timestamp: str, nonce: str, body: str, *, corpid: str = ""
) -> dict:
"""收消息 POST:从信封 XML 取 Encrypt → 验签 → 解密 → parse_message。"""
encrypt = ET.fromstring(body).findtext("Encrypt") or ""
if _signature(timestamp, nonce, encrypt) != msg_signature:
raise ValueError("签名校验失败")
return parse_message(_extract_plain(encrypt, expect_receiveid=corpid))

View File

@ -0,0 +1,63 @@
"""wechat_bot_bindings 表(ClawBot 个人微信绑定,DESIGN §8.7 渠道 A).
Revision ID: 0012
Revises: 0011
Create Date: 2026-06-24
新增独立表 wechat_bot_bindings 不碰现有 schema(公测兼容)一行 = 一个用户绑定其
个人微信 ClawBotbot_token / latest_context_token 存密文(core/wechat/crypto.py)
入站长轮询管理器按 status='active' 拉绑定起 getupdates 循环;主动推送用 latest_context_token
(24h 内有效) DESIGN §8.7 / core/storage/models.py
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import UUID as PG_UUID
revision: str = "0012"
down_revision: Union[str, None] = "0011"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"wechat_bot_bindings",
sa.Column(
"user_id", PG_UUID(as_uuid=True),
sa.ForeignKey("users.user_id", ondelete="CASCADE"), primary_key=True,
),
sa.Column("bot_token", sa.Text(), nullable=False),
sa.Column("bot_im_id", sa.Text(), nullable=True),
sa.Column("user_im_id", sa.Text(), nullable=True),
sa.Column(
"base_url", sa.Text(), nullable=False,
server_default="https://ilinkai.weixin.qq.com",
),
sa.Column("latest_context_token", sa.Text(), nullable=True),
sa.Column("context_token_at", sa.DateTime(timezone=True), nullable=True),
sa.Column(
"chat_task_id", PG_UUID(as_uuid=True),
sa.ForeignKey("tasks.task_id", ondelete="SET NULL"), nullable=True,
),
sa.Column("status", sa.Text(), nullable=False, server_default="active"),
sa.Column(
"created_at", sa.DateTime(timezone=True),
server_default=sa.func.now(), nullable=False,
),
sa.Column(
"updated_at", sa.DateTime(timezone=True),
server_default=sa.func.now(), nullable=False,
),
)
# 入站管理器扫 active 绑定起长轮询
op.create_index(
"ix_wechat_bot_bindings_active", "wechat_bot_bindings", ["status"],
)
def downgrade() -> None:
op.drop_index("ix_wechat_bot_bindings_active", table_name="wechat_bot_bindings")
op.drop_table("wechat_bot_bindings")

View File

@ -0,0 +1,42 @@
"""tasks.channel 列(渠道来源:web / wechat).
Revision ID: 0013
Revises: 0012
Create Date: 2026-06-24
tasks channel ,标记任务来源渠道:
- web = 网页端常规任务(默认)
- wechat = 微信 ClawBot 常驻对话(每用户一条)
只加列不动现有数据;server_default='web' 让历史行自动回填为 web建表后把
现网已存在的微信常驻 task(description = '(微信 ClawBot 对话)')backfill
'wechat',让置顶 / 徽章逻辑对存量数据立即生效
前端据 channel 给微信任务打徽章并后端强制置顶(列表查询排序前置 pin 表达式)
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
revision: str = "0013"
down_revision: Union[str, None] = "0012"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column(
"tasks",
sa.Column("channel", sa.Text(), nullable=False, server_default="web"),
)
# backfill 存量微信常驻 task —— 用建 task 时写死的 description 作标记。
op.execute(
"UPDATE tasks SET channel = 'wechat' "
"WHERE description = '(微信 ClawBot 对话)'"
)
def downgrade() -> None:
op.drop_column("tasks", "channel")

View File

@ -0,0 +1,44 @@
"""wecom_bindings 表(企业微信绑定,DESIGN §8.7 渠道 B,纯推送).
Revision ID: 0014
Revises: 0013
Create Date: 2026-06-24
新增独立表 wecom_bindings 不碰现有 schema(公测兼容)一行 = 一个用户的企业微信成员
userid(OAuth 扫码拿)应用凭据走全局 env不入库;userid 非密钥明文存 DESIGN §8.7
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import UUID as PG_UUID
revision: str = "0014"
down_revision: Union[str, None] = "0013"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"wecom_bindings",
sa.Column(
"user_id", PG_UUID(as_uuid=True),
sa.ForeignKey("users.user_id", ondelete="CASCADE"), primary_key=True,
),
sa.Column("wecom_userid", sa.Text(), nullable=False),
sa.Column("status", sa.Text(), nullable=False, server_default="active"),
sa.Column(
"created_at", sa.DateTime(timezone=True),
server_default=sa.func.now(), nullable=False,
),
sa.Column(
"updated_at", sa.DateTime(timezone=True),
server_default=sa.func.now(), nullable=False,
),
)
def downgrade() -> None:
op.drop_table("wecom_bindings")

View File

@ -0,0 +1,144 @@
"""channel_bindings 统一表(微信渠道抽象,DESIGN §8.7).
Revision ID: 0015
Revises: 0014
Create Date: 2026-06-24
0012 wechat_bot_bindings(ClawBot)+ 0014 wecom_bindings(企业微信)合成一张
判别列 + JSONB channel_bindings(user_id, channel, status, config),沿用本库
usage_events(kind+units)的多态范式 加渠道不再各建表
数据迁移:旧两表的行搬进 config JSONB(敏感 token 列本就是密文串,原样搬不重新加密),
drop 旧表DDL + DML 同一事务,失败整体回滚不丢数据 DESIGN §8.7
"""
import json
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import JSONB, UUID as PG_UUID
revision: str = "0015"
down_revision: Union[str, None] = "0014"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"channel_bindings",
sa.Column(
"user_id", PG_UUID(as_uuid=True),
sa.ForeignKey("users.user_id", ondelete="CASCADE"), primary_key=True,
),
sa.Column("channel", sa.Text(), primary_key=True), # clawbot | wecom | ...
sa.Column("status", sa.Text(), nullable=False, server_default="active"),
sa.Column("config", JSONB(), nullable=False, server_default="{}"),
sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now(), nullable=False),
sa.Column("updated_at", sa.DateTime(timezone=True), server_default=sa.func.now(), nullable=False),
)
# 入站管理器/推送:按 (channel, status) 扫某渠道活跃绑定
op.create_index("ix_channel_bindings_channel", "channel_bindings", ["channel", "status"])
conn = op.get_bind()
insert = sa.text(
"INSERT INTO channel_bindings (user_id, channel, status, config, created_at, updated_at) "
"VALUES (:uid, :ch, :st, CAST(:cfg AS JSONB), :ca, :ua)"
)
# 0012 wechat_bot_bindings → channel='clawbot'(token 列已是密文串,原样搬)
insp = sa.inspect(conn)
if insp.has_table("wechat_bot_bindings"):
rows = conn.execute(sa.text(
"SELECT user_id, bot_token, bot_im_id, user_im_id, base_url, "
"latest_context_token, context_token_at, chat_task_id, status, created_at, updated_at "
"FROM wechat_bot_bindings"
)).mappings().all()
for r in rows:
cfg = {
"bot_token": r["bot_token"],
"bot_im_id": r["bot_im_id"],
"user_im_id": r["user_im_id"],
"base_url": r["base_url"],
"latest_context_token": r["latest_context_token"],
"context_token_at": r["context_token_at"].isoformat() if r["context_token_at"] else None,
"chat_task_id": str(r["chat_task_id"]) if r["chat_task_id"] else None,
}
conn.execute(insert, {
"uid": r["user_id"], "ch": "clawbot", "st": r["status"],
"cfg": json.dumps(cfg), "ca": r["created_at"], "ua": r["updated_at"],
})
op.drop_table("wechat_bot_bindings")
# 0014 wecom_bindings → channel='wecom'
if insp.has_table("wecom_bindings"):
rows = conn.execute(sa.text(
"SELECT user_id, wecom_userid, status, created_at, updated_at FROM wecom_bindings"
)).mappings().all()
for r in rows:
cfg = {"wecom_userid": r["wecom_userid"]}
conn.execute(insert, {
"uid": r["user_id"], "ch": "wecom", "st": r["status"],
"cfg": json.dumps(cfg), "ca": r["created_at"], "ua": r["updated_at"],
})
op.drop_table("wecom_bindings")
def downgrade() -> None:
# 回滚:重建旧两表 + 把 config 拆回列,再 drop channel_bindings。
op.create_table(
"wechat_bot_bindings",
sa.Column("user_id", PG_UUID(as_uuid=True),
sa.ForeignKey("users.user_id", ondelete="CASCADE"), primary_key=True),
sa.Column("bot_token", sa.Text(), nullable=False),
sa.Column("bot_im_id", sa.Text(), nullable=True),
sa.Column("user_im_id", sa.Text(), nullable=True),
sa.Column("base_url", sa.Text(), nullable=False,
server_default="https://ilinkai.weixin.qq.com"),
sa.Column("latest_context_token", sa.Text(), nullable=True),
sa.Column("context_token_at", sa.DateTime(timezone=True), nullable=True),
sa.Column("chat_task_id", PG_UUID(as_uuid=True),
sa.ForeignKey("tasks.task_id", ondelete="SET NULL"), nullable=True),
sa.Column("status", sa.Text(), nullable=False, server_default="active"),
sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now(), nullable=False),
sa.Column("updated_at", sa.DateTime(timezone=True), server_default=sa.func.now(), nullable=False),
)
op.create_table(
"wecom_bindings",
sa.Column("user_id", PG_UUID(as_uuid=True),
sa.ForeignKey("users.user_id", ondelete="CASCADE"), primary_key=True),
sa.Column("wecom_userid", sa.Text(), nullable=False),
sa.Column("status", sa.Text(), nullable=False, server_default="active"),
sa.Column("created_at", sa.DateTime(timezone=True), server_default=sa.func.now(), nullable=False),
sa.Column("updated_at", sa.DateTime(timezone=True), server_default=sa.func.now(), nullable=False),
)
conn = op.get_bind()
rows = conn.execute(sa.text(
"SELECT user_id, channel, status, config, created_at, updated_at FROM channel_bindings"
)).mappings().all()
for r in rows:
cfg = r["config"] or {}
if r["channel"] == "clawbot":
conn.execute(sa.text(
"INSERT INTO wechat_bot_bindings (user_id, bot_token, bot_im_id, user_im_id, base_url, "
"latest_context_token, context_token_at, chat_task_id, status, created_at, updated_at) "
"VALUES (:uid, :bt, :bim, :uim, :bu, :lct, CAST(:cta AS timestamptz), "
"CAST(:cti AS uuid), :st, :ca, :ua)"
), {
"uid": r["user_id"], "bt": cfg.get("bot_token") or "", "bim": cfg.get("bot_im_id"),
"uim": cfg.get("user_im_id"), "bu": cfg.get("base_url") or "https://ilinkai.weixin.qq.com",
"lct": cfg.get("latest_context_token"), "cta": cfg.get("context_token_at"),
"cti": cfg.get("chat_task_id"), "st": r["status"],
"ca": r["created_at"], "ua": r["updated_at"],
})
elif r["channel"] == "wecom":
conn.execute(sa.text(
"INSERT INTO wecom_bindings (user_id, wecom_userid, status, created_at, updated_at) "
"VALUES (:uid, :wu, :st, :ca, :ua)"
), {
"uid": r["user_id"], "wu": cfg.get("wecom_userid") or "",
"st": r["status"], "ca": r["created_at"], "ua": r["updated_at"],
})
op.drop_index("ix_channel_bindings_channel", table_name="channel_bindings")
op.drop_table("channel_bindings")

View File

@ -0,0 +1,33 @@
"""users.name / users.user_name 列(平台登录注入的用户档案).
Revision ID: 0016
Revises: 0015
Create Date: 2026-06-25
users 加两列:name(显示名/姓名)+ user_name(平台账号名), nullable
平台经 /v1/auth/login(platform_key 形态) body 里注入,ensure_user_row upsert
落库;邮箱密码 / 历史行留空将来 OIDC 接管时由 ID token name / preferred_username
claim 注入,数据流不变 DESIGN §7.3 / §7.4
纯加列不动现有数据(平滑兼容线上存量行, NULL)
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
revision: str = "0016"
down_revision: Union[str, None] = "0015"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column("users", sa.Column("name", sa.Text(), nullable=True))
op.add_column("users", sa.Column("user_name", sa.Text(), nullable=True))
def downgrade() -> None:
op.drop_column("users", "user_name")
op.drop_column("users", "name")

View File

@ -0,0 +1,58 @@
"""tasks.scheduled_job_id 列(定时任务执行归属,DESIGN §8.5).
Revision ID: 0017
Revises: 0016
Create Date: 2026-06-26
tasks scheduled_job_id(nullable FK scheduled_jobs.job_id, ondelete SET NULL)
NULL = task 是某定时任务的一次执行(isolated 每次新建 / persistent 首次新建都填),
普通对话列表据此排除,不混进"用户项目"列表;job 软删不硬删,SET NULL 安全
backfill 存量定时执行 task:
- persistent:bound_task_id 直接指向其常驻 task 精确回填
- isolated:working_dir 末段 'scheduled-<job_id 前 8 位>' 8 位前缀匹配 job_id
匹配不上的孤行(job 已物理删等) NULL,由列表查询的 working_dir LIKE 兜底排除
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
from sqlalchemy.dialects.postgresql import UUID as PG_UUID
revision: str = "0017"
down_revision: Union[str, None] = "0016"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column(
"tasks",
sa.Column("scheduled_job_id", PG_UUID(as_uuid=True), nullable=True),
)
op.create_foreign_key(
"fk_tasks_scheduled_job_id",
"tasks", "scheduled_jobs",
["scheduled_job_id"], ["job_id"],
ondelete="SET NULL",
)
# persistent:bound_task_id 精确指向其常驻 task
op.execute(
"UPDATE tasks SET scheduled_job_id = j.job_id "
"FROM scheduled_jobs j "
"WHERE j.bound_task_id = tasks.task_id"
)
# isolated:working_dir 末段 scheduled-<8hex> 按 job_id 前 8 位匹配
op.execute(
"UPDATE tasks t SET scheduled_job_id = j.job_id "
"FROM scheduled_jobs j "
"WHERE t.scheduled_job_id IS NULL "
"AND t.working_dir ~ 'scheduled-[0-9a-f]{8}' "
"AND left(j.job_id::text, 8) = substring(t.working_dir from 'scheduled-([0-9a-f]{8})')"
)
def downgrade() -> None:
op.drop_constraint("fk_tasks_scheduled_job_id", "tasks", type_="foreignkey")
op.drop_column("tasks", "scheduled_job_id")

View File

@ -0,0 +1,29 @@
"""messages.kind 列(消息来源标记,避免 push 记录被 extract_last_assistant_text 误取).
Revision ID: 0018
Revises: 0017
Create Date: 2026-06-26
messages kind (nullable Text,默认 NULL)NULL=agent run 产生的消息;
"push"=push 记录(_record_push_to_chat )extract_last_assistant_text
WHERE kind IS NULL 跳过 push 记录,避免 wecom 入站取回复时误取 push 摘要
独立列不进 payload,不影响 agent 上下文 / LLM API纯加列,不动现有数据
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
revision: str = "0018"
down_revision: Union[str, None] = "0017"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column("messages", sa.Column("kind", sa.Text(), nullable=True))
def downgrade() -> None:
op.drop_column("messages", "kind")

View File

@ -0,0 +1,40 @@
"""tasks.context_base_idx 列(channel 长会话软重置,DESIGN §8.7).
Revision ID: 0019
Revises: 0018
Create Date: 2026-06-29
tasks context_base_idx(NOT NULL DEFAULT 0):喂给模型的上下文窗口起点
Session.load 只把 idx >= context_base_idx 的消息装进 LLM 上下文;idx < base 的历史
仍全量留在 messages (web `/messages` 直查不受影响,用户照旧翻完整历史)
channel 入站对话据此做软重置:超过 gap 阈值未说话 base 推到最后一条 user 消息
idx(保留上一轮原文做续聊锚点);手动新话题 base 推到总消息数(彻底从零)
存量行 / web 普通任务 base 0 = 喂全量,行为不变additive,无数据迁移
"""
from typing import Sequence, Union
import sqlalchemy as sa
from alembic import op
revision: str = "0019"
down_revision: Union[str, None] = "0018"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.add_column(
"tasks",
sa.Column(
"context_base_idx",
sa.Integer(),
nullable=False,
server_default="0",
),
)
def downgrade() -> None:
op.drop_column("tasks", "context_base_idx")

View File

@ -0,0 +1,66 @@
#!/usr/bin/env bash
# 在 sandbox 容器里实测 `chromium --headless --print-to-pdf`(md→HTML→PDF 的 PDF 那段)。
# 区分「chromium 缺包」「纯启动超时(/dev/shm 64MB)」「只读 rootfs 下 user-data-dir 写不了」。
# 用法(服务器上,任选其一):
# A) 进一个活着的 per-user 容器(最贴真,复用线上 64MB /dev/shm 默认):
# C=$(docker ps --filter "label=zcbot.product=sandbox" --format '{{.Names}}' | head -1)
# docker cp deploy/sandbox/probe_chromium_pdf.sh "$C":/tmp/probe.sh
# docker exec "$C" bash /tmp/probe.sh
# B) 没有活容器时,起一个临时的(显式 NOT 传 --shm-size,复现线上 64MB):
# docker run --rm --read-only --tmpfs /tmp:exec,size=512m,mode=1777 \
# --cap-drop=ALL --security-opt=no-new-privileges \
# --entrypoint bash zcbot-sandbox:latest /dev/stdin < deploy/sandbox/probe_chromium_pdf.sh
set -u
CR=""
for c in chromium chromium-browser /usr/bin/chromium; do
command -v "$c" >/dev/null 2>&1 && { CR="$c"; break; }
done
echo "===== /dev/shm size (期望线上 64M) ====="; df -h /dev/shm
echo "===== chromium 是否在 (缺包则这里就失败) ====="
[ -n "$CR" ] && "$CR" --version 2>&1 | head -1 || { echo "[FAIL] chromium 缺包/不可执行"; exit 1; }
# 测试输入:中文 + 表格背景色(print-color-adjust) + 化学式下标 + 超链接,覆盖简报常见元素
cd /tmp
cat > in.html <<'HTML'
<!DOCTYPE html><html lang="zh-CN"><head><meta charset="utf-8"><style>
@page { size: A4; margin: 2cm; }
body { font-family: 'Noto Sans CJK SC','Noto Serif CJK SC',serif; font-size:12pt; }
h1 { color:#C00000; border-bottom:2px solid #C00000; }
th { background:#C00000; color:#fff; -webkit-print-color-adjust:exact; print-color-adjust:exact; }
td,th { border:1px solid #999; padding:4pt 8pt; }
a { color:#1155CC; }
sub { font-size:0.75em; }
</style></head><body>
<h1>水泥科研方向 — 冒烟测试</h1>
<p>中文渲染、化学式 CO<sub>2</sub> / C<sub>3</sub>S、<a href="https://doi.org/10.1016/x">DOI 超链接</a>。</p>
<table><tr><th>期刊</th><th>篇数</th></tr><tr><td>Cement and Concrete Research</td><td>11</td></tr></table>
</body></html>
HTML
run() { # $1=label $2..=extra flags
local label="$1"; shift
local ts=$SECONDS
timeout 60 "$CR" --headless --disable-gpu --no-sandbox \
--user-data-dir=/tmp/cr-$label "$@" \
--print-to-pdf=/tmp/out-$label.pdf /tmp/in.html >"$label.log" 2>&1
local rc=$?
echo "rc=$rc 用时=$((SECONDS-ts))s"; tail -3 "$label.log"
if [ -s "/tmp/out-$label.pdf" ]; then
echo "[$label 出图] $(wc -c < /tmp/out-$label.pdf) bytes -> /tmp/out-$label.pdf"
else
echo "[$label 无图]"
fi
}
echo; echo "===== A: 漏 --disable-dev-shm-usage(线上 64MB /dev/shm)→ 可能挂起/超时 ====="
run A
echo; echo "===== B: 加 --disable-dev-shm-usage(走 /tmp)→ 预期成功出 PDF ====="
run B --disable-dev-shm-usage
echo; echo "===== 结论 ====="
echo "B 出图 => chromium print-to-pdf 可用,render_pdf.py 固定带 --disable-dev-shm-usage + --user-data-dir=/tmp/* 即可"
echo "B 无图/超时 => 看 B.log;若是 /dev/shm 仍报错,给 docker run 加 --shm-size"
echo "chromium 缺/全失败 => 更深环境问题,镜像没装好 chromium/字体"

View File

@ -1,6 +1,6 @@
# 科研智能助手 · 操作说明书(精简版) # 科研智能助手 · 操作说明书(精简版)
> 适用:无机非金属材料(水泥 / 混凝土 / 玻璃 / 陶瓷 / 耐火 / 新型建材)科研人员。从**登录后**讲起。 > 适用:无机非金属材料(水泥 / 混凝土 / 玻璃 / 陶瓷 / 耐火 / 新型建材)科研人员。
--- ---
@ -19,12 +19,11 @@
## 2. 界面:三栏 ## 2. 界面:三栏
``` | 左:任务列表 | 中:对话区 | 右:文件区 |
左:任务列表 中:对话区 右:文件区 |---|---|---|
+ 新建任务 与助手一问一答 当前工作目录的文件 | + 新建任务 | 与助手一问一答 | 当前工作目录的文件 |
搜索 / 筛选 实时显示进度 上传 / 预览 / 下载 | 搜索 / 筛选 | 实时显示进度 | 上传 / 预览 / 下载 |
技能 / 记忆 底部输入框 | 技能 / 记忆 | 底部输入框 | |
```
![](assets/image-1781662287932-1.png) ![](assets/image-1781662287932-1.png)

View File

@ -0,0 +1,428 @@
# 科研 AI 双智能体 · 汇报 PPT 大纲
> 单位:中国建筑材料科学研究总院 · 中存大数据
> 用途:生成汇报 PPT 的内容底稿。本文件只定**结构 + 每页要点 + 呈现形式**,不写大段叙述文字。
> 编写日期:2026-06-24
---
## 0. 总体设计说明(给设计 / 制作人员看)
**叙事主线 —— 通用 + 垂直,双轮驱动:**
| | 第一部分 | 第二部分 |
|---|---|---|
| 名称 | 通用科研辅助智能体 | 无机非金属材料自主研发智能体 |
| 定位 | **横向**:服务全院科研人员日常全流程 | **纵深**:材料配方自主研发的自动化 |
| 入口 | 自然语言,任意科研任务 | 材料研发需求 → 实验方案/配方 |
| 形态 | 17 项 skill 能力矩阵 + 可交付物 | 五大引擎 + 配方大模型(垂直微调) |
| 一句话 | 把"想法"变成可交付的科研产物 | 把"性能要求"变成可执行的实验配方 |
**呈现纪律(全程硬约束):**
- 每页**论断式标题**(写结论,不写"XX 介绍")。
- 正文只用:**短卡片(≤12 字)/ KPI 数字卡 / 流程图 / 时间轴 / 对比表 / 矩阵网格**。禁止整段话。
- 每页带一行【呈现形式】,指明该页用什么版式画。
- 颜色:商务红主题(主色 #C00000),关键数字 / 核心步骤高亮。
- 凡是带"流程"的页,一律画成**节点+箭头流程图**,不写成文字列表。
**全篇页序(约 26 页):** 封面 → 双智能体总览 → [PART1:1.01.10] → [PART2:2.02.10] → 总结 → 展望/交流。
---
## 封面
- 主标题:**科研 AI 双智能体**
- 副标题:通用科研辅助智能体 · 无机非金属材料自主研发智能体
- 落款:中国建筑材料科学研究总院 · 中存大数据 / 2026
【呈现形式】杂志级背景图 + 居中大标题;底部一行四个关键词:自然语言驱动 / 全流程可交付 / 垂直配方大模型 / 统一安全底座。
---
## 总览页 · 一张图看懂两个智能体
**论断:一个横向赋能全院,一个纵向攻坚配方 —— 通用 + 垂直,双轮驱动。**
左右两张大卡:
- 左卡「通用科研辅助智能体」:自然语言入口 · 17 skill · 内部 100 万+ 文献库 · 直出 Word/PPT/图表
- 右卡「材料自主研发智能体」:五大引擎 · 智能实验设计 · 配方大模型(LoRA 微调) · 预测→配方闭环
- 中间用箭头/底座连接:**共享统一底座**(多模型调度 · 向量知识库 · 安全沙盒 · 训练流水线)
【呈现形式】左右双卡 + 下方一条横贯"统一底座"长条。这页是全场的"地图",后面两部分都回指这张图。
---
# 第一部分 · 通用科研辅助智能体
## 1.0 章节分隔页
- PART 01
- **通用科研辅助智能体**
- 副题:以自然语言为入口,把科研任务串成可交付的工作流
【呈现形式】章节封面页,大序号 + 标题 + 一句定位。
---
## 1.1 它是什么 —— 现有功能总览
**论断:不止"问答",而是能自己动手、直接交付成果的科研智能体。**
四张能力卡 + 一行数字条:
- **自然语言驱动**:描述需求 → 自动识别意图、动态挂载专业能力
- **产出可交付物**:直接生成 Word / PPT / 图表 / 数据,贴合科研与申报格式
- **全流程覆盖**:调研 — 计算 — 写作 — 评审,一个智能体串起,无需多工具切换
- **统一底座**:多模型调度 · 安全沙盒 · 长期记忆 · 长任务断点恢复
数字条(KPI):**17** 项专业 skill · **6** 大能力类别 · 内部 **100 万+** 篇材料文献库 · **多渠道**接入(网页/微信/定时)
【呈现形式】2×2 能力卡网格 + 底部一条 KPI 数字条(4 个数字)。
---
## 1.2 它怎么工作 —— 五步工作流
**论断:意图识别 → 动态挂载能力 → 沙盒内执行 → 关键节点人工确认 → 规范化成果。**
横向五段流程:
1. **自然语言需求**(用户提出)
2. **意图识别**(自动挂载对应 Skill)
3. **工具调用循环**(安全沙盒内自主迭代:思考→调用工具→观察)
4. **人工确认**(关键决策由用户拍板,过程可追溯)
5. **规范化成果**(Word · PPT · 图表 · 数据)
底部一条"统一底座支撑":多模型调度 / 安全沙盒隔离 / 个人文件库 / 长期记忆·断点恢复
【呈现形式】横向 5 节点流程图(箭头串联)+ 底部一条底座长条,做成主图、放大。
---
## 1.3 能力矩阵 —— 科研全流程 Skill 体系
**论断:17 项专业能力,按科研全流程六大类组织,可持续扩展。**
六张分类卡(每卡:类名 + 含的 skill + 一句话):
- **科研写作**:proposal 申报书 / paper 论文 / standard 标准 / patent 专利 / review 审稿 —— 立项到评审全链路
- **文献检索**:documents 内部库 / research 全网 / brief 方向简报 —— 可溯源文献支撑
- **科研计算**:pymatgen 晶体计算 / stats_ml 配方建模 —— "配比→性能"预测寻优
- **演示出图**:ppt 商务级幻灯 / plot_pub 出版级学术图 —— 能看、能讲、能投稿
- **通用元能力**:analyze 问题拆解 / coding 代码实现
- **可定制**:skill-creator 用户私有 skill(从零写或 fork 内置再改)
【呈现形式】2×3 卡片网格,每卡一个图标。下面五页对其中"标志性"能力各展开一页。
> 说明:内容生成(文生图/文生视频)本次汇报不展开,不单列页。
---
## 1.4 标志性能力 ① 文献检索 —— 内部百万级材料文献库
**论断:中文提问,命中英文文献 —— 100 万+ 篇材料学科论文,可溯源。**
主体两块:
- **七大学科库**(卡片/六边形网格,各一行):胶凝材料 · 陶瓷基 · 玻璃基 · 晶体 · 复合 · 耐火 · 检验检测
- **三路检索分工**(小流程):
- `documents` 内部库:100 万+ 英文论文,已 Markdown 化(LLM 直读),**跨语言语义检索**
- `research` 全网:OpenAlex 元数据 + DOI + PDF 下载
- `brief` 方向简报:重要论文列表 + 内容总结,520 分钟掌握一个方向
差异化标签(高亮):**跨语言检索** · **可溯源引用** · **立项依据有真实文献支撑**
【呈现形式】上方七学科库网格,下方三路检索分工小图;右侧竖排三个差异化标签 pill。
---
## 1.5 标志性能力 ② 项目申报 —— proposal
**论断:把课题信息变成可提交的申报书,评审雷区与文献真实性内置兜底。**
能力卡(短):
- **6 类基金骨架**:重点研发 / 重大专项 / 国自然面上·青年 / 联合基金 / 省地方 / 横向
- **评审雷区清单** + "不可考核词"过滤
- **文献真实性铁律**:不允许编造引文(GB/T 7714 顺序编码)
- **自动化产出**:间接费用台阶 + 经费表自动生成 · 技术路线图自动渲染插图
- **一段一卡**:关键章节逐段确认,不一口气出全文
产物:带目录 + 自动图题 + 图表编号的 `.docx`
【呈现形式】左侧"6 类基金"卡片网格,右侧"需求 → 一段一卡起草 → 渲染 docx"竖向流程;底部一条产物预览缩略。
---
## 1.6 标志性能力 ③ 科研写作全家桶 —— 论文 / 标准 / 专利 / 审稿
**论断:从论文到标准、专利、审稿 —— 写作全链路,反 AI 幻觉是底线。**
四象限卡(每卡:skill + 输入→产物):
- **paper 论文**:实验数据 → 中文核心 / 英文 SCI 投稿稿(IMRaD + 引文三角核验)
- **standard 标准**:材料/方法 → 国标 / 行标 / 团标 + 编制说明(GB/T 1.1—2020)
- **patent 专利**:项目素材 → 发明专利技术交底书(供代理师转写)
- **review 审稿**:已有稿 → 问题表 + 修改稿(长文分段深审)
横贯亮点条(高亮):**引文三角核验** —— 存在性 → 三角印证 → 支撑度,编造引文**零容忍**。
【呈现形式】2×2 象限卡 + 底部一条横贯"引文三角核验"亮点带。
---
## 1.7 标志性能力 ④ 材料计算 —— pymatgen + stats_ml
**论断:从晶体结构到配方建模 —— 服务"配比 → 性能"的预测与寻优。**
左右两栏:
- **pymatgen 无机材料计算**:晶体结构 I/O · XRD 模拟 · 相图 · 对称性 · Materials Project;**中文相名映射**(C₃S / 钙矾石 / 莫来石 / 方镁石 → 化学式)
- **stats_ml 配方-性能建模**:三库分工(sklearn 预测 / statsmodels 假设检验·p值 / PyMC 小样本贝叶斯);DoE 响应面 · 强度预测 · 异常配方聚类
典型场景标签:XRD 谱图模拟 · TG-DSC 双轴 · 强度预测 · 响应面寻优
【呈现形式】左右双栏卡,每栏配 23 个典型场景小图标;高亮"中文相名映射"和"三库分工"。
---
## 1.8 标志性能力 ⑤ 演示出图 —— ppt + plot_pub
**论断:成果"能看、能讲、能投稿" —— 商务级幻灯 + 出版级学术图。**
左右两块:
- **ppt 商务级演示**:卡片式视觉系统 · 论断式标题 · 信息设计纪律 · 一键整建 deck(原生可编辑)
- **plot_pub 出版级学术图**:中文 + viridis + 矢量(SVG/PDF)· 投稿级复合图设计纪律(XRD 叠图 / TG-DSC 双轴 / 多 panel)
价值标签:贴合期刊投稿(Cement and Concrete Research 等)· 降低整理排版成本
【呈现形式】左右两个产物缩略(一张 PPT 卡片样张 + 一张学术图样张)做观感对比。
---
## 1.9 平台技术架构(架构师视角)
**论断:Less Scaffolding, More Trust —— 把 LLM 当会持续变强的同事,给目标不给步骤。**
四象限架构卡:
- **① 智能体内核**:ReAct 工具调用循环(思考→调用→观察自主迭代)+ 进展守卫(重复调用/空转自动收敛)+ 阶段化编排嵌人工确认
- **② Skill 动态加载**:意图识别按需挂载,不相关能力不进上下文(渐进披露,省算力)+ 可扩展插件(流程+模板+脚本)
- **③ 安全沙盒**:每用户 Docker 容器隔离 · 资源限额 + 网络管控 + 最小权限 + 丰富工具集 / MCP
- **④ 模型·知识·记忆底座**:多模型自由调度(DeepSeek/Qwen + OpenAI 接口,涉密切内网)· RAG 抑制幻觉 · 双层长期记忆 + 长任务断点恢复
底部技术栈条:FastAPI(异步后端 + 原生 SSE)· LiteLLM(多模型统一接入,OpenAI 兼容)· 自研 ReAct 内核 · PostgreSQL(任务/消息 append-only)· Docker(每用户沙盒)· Skill 渐进披露体系
【呈现形式】2×2 架构象限卡 + 底部技术栈 pill 条,每条压成一句。
---
## 1.10 多渠道接入与产品化
**论断:不只是网页 —— 微信对话、定时任务,把智能体送到用户身边。**
三张卡:
- **网页工作台**:三栏 SPA(任务 / 对话 / 文件),消息目录导航、方案确认卡、文件预览
- **微信接入**:个人微信对话即可用,可主动推送简报/结果
- **定时任务**:"每天 X 点干 Y" —— 跑 skill 出简报 / 发邮件,自然语言建任务
【呈现形式】三卡横排,各配渠道图标。
---
# 第二部分 · 无机非金属材料自主研发智能体
## 2.0 章节分隔页
- PART 02
- **无机非金属材料自主研发智能体**
- 副题:水泥基配方大模型 —— 从"性能要求"到"实验配方"的自动化
【呈现形式】章节封面页。承上启下一句:从通用辅助,进入材料研发深水区。
---
## 2.1 五大引擎 —— 一图看全
**论断:五大引擎协同,构成材料研发的智能中枢。**
五个引擎卡(每卡:名称 + 一句≤10 字功能 + 图标):
1. **智能问答中枢**:统一入口,多轮+工具+文件问答
2. **知识库构建**:非结构化文档 → 可检索知识资产
3. **知识库问答**:RAG 结合企业知识,引用溯源
4. **AI 文档分类**:自动归档 + 触发向量重建
5. **智能实验设计**:需求 → 可执行配方(旗舰)
【呈现形式】五卡环形/总线布局,中心写"配方大模型";第 5 个引擎高亮(2.7 展开)。后面 2.32.7 逐个引擎各一页。
---
## 2.2 总体架构图(分层框图)
**论断:应用层 → 五大引擎 → 模型与向量层 → 训练模块,标准接口协同。**
四层框图:
- **User**:业务系统 / 请求
- **Backend 五大引擎**:Chat / KBBuild / KBQA / DocAI / Lab(**LangGraph 编排**复杂逻辑与实验设计流)
- **模型与数据层**:LLM(DeepSeek/Qwen) · Qwen2.5-VL 视觉 · BGE-M3 向量 · Milvus 向量库 · MinerU 解析
- **Train 训练模块**:LLaMA Factory → LoRA → 行业配方模型
【呈现形式】自上而下四层分层框图,层间箭头标接口(RAG / Embedding / LoRA)。只画框和箭头,不写段落。
---
## 2.3 引擎 ① 智能问答中枢
**论断:大模型统一入口 —— 从"回答问题"升级为"执行任务"。**
工作流程(流程图):
用户问题 → 会话与权限处理 → 任务识别 → **是否需要外部能力?**
- 否 → 普通问答 / 文件上下文 → LLM 生成
- 是 → 工具能力 → 读取文档 / MCP 工具调用
→ SSE 流式返回回答
技术卡(短):LangGraph 编排 · DeepSeek V3.1 / Qwen3-30B-A3B · 文件问答 + 多轮 + 思考模式 · MCP 接入外部系统 · SSE 流式输出
价值标签:统一标准化问答 · 高扩展集成业务工具 · 可升级为执行任务
【呈现形式】左侧带分支判定的流程图(菱形判定)+ 右侧技术卡 + 底部价值 pill。
---
## 2.4 引擎 ② 知识库构建
**论断:把分散的非结构化文档,沉淀为可检索、可引用、可追溯的企业知识资产。**
工作流程(流程图):
上传原始文档 → MinerU 解析 → **是否含图片/图表/扫描件?**
- 是 → Qwen2.5-VL 视觉解析 ↘
→ 文本结构化 & 生成 Markdown → 文本切分 → BGE-M3 向量化写入 Milvus → 保存文档元数据
支持内容卡(三类):
- **文档类**:PDF / Word / PPT / Excel
- **图像类**:图片 / 扫描件 / 图表
- **文本类**:Markdown / TXT / CSV / JSON
价值标签:分散资料 → 结构化知识库 · 为问答/实验/训练提供高质量数据基础
【呈现形式】上方带分支的处理流程图 + 下方三类支持内容卡。
---
## 2.5 引擎 ③ 知识库问答
**论断:基于 RAG 结合企业内部知识作答,引用可溯源,显著抑制幻觉。**
工作流程(流程图):
用户问题 → 问题理解 → 生成检索问题 → BGE-M3 向量化 → Milvus 检索 → 组装引用上下文 → 生成答案与溯源
技术卡(短):RAG 检索增强 · BGE-M3 向量化 + Milvus 检索 · DeepSeek/Qwen 结合上下文生成 · 引用来源溯源 · 多维度检索过滤
价值标签:提升专业性/准确性/可追溯 · 赋能私有文档深度问答 · 降低大模型幻觉风险
【呈现形式】横向 7 节点检索流程图(主色高亮"Milvus 检索"与"溯源")+ 右侧技术卡。
---
## 2.6 引擎 ④ AI 文档分类
**论断:自动识别领域与材料分类并归档,触发向量重建 —— 知识治理自动化。**
工作流程(流程图,含闭环):
待分类文档 → 读取解析内容 → 领域预判 → 构建分类体系 → 大模型分类 → 分类结果校验 → 保存 → **是否需调整归属?**
- 是 → 迁移文档并重建向量 → 完成归档
智能输出卡:摘要 · 领域 · 分类路径 · 判定依据 · 置信度
价值标签:降低人工整理归档成本 · 归入正确体系提升检索效率 · 为行业模型筛选标准化数据集
【呈现形式】带回环箭头的闭环流程图 + 一张"智能输出 5 字段"卡。
---
## 2.7 引擎 ⑤ 智能实验设计 —— 核心工作流(旗舰)
**论断:多阶段工作流,把研发需求转成可执行实验配方;核心一步是调用行业微调模型。**
横向时间轴,11 步压成 6 个阶段(核心步高亮):
1. **问题提炼**(科学问题 + 检索分类匹配 + 方向确认)
2. **文献检索分析**(向量库召回 + 逐篇提取实验参数)
3. **初步方案**(融合目标与文献,生成思路框架)
4. **学术评估优化**(多维量化评估,迭代优化路径)
5. ⭐ **配方生成**(调用 Qwen2.5-1.5B LoRA 行业模型 → 原料/配比/条件)
6. **校验 + 用户确认 + 实验工单**(人机协同闭环 → 对接实验室)
【呈现形式】横向 6 段时间轴/泳道,第 5 段(配方生成)用主色高亮放大;标注"人工确认节点"。
---
## 2.8 配方大模型训练 —— 配置与成效
**论断:LLaMA Factory + Qwen2.5-1.5B + LoRA,16 条实测数据完成首版训练。**
左:训练配置卡(短):
- 框架 / 基座:**LLaMA Factory + Qwen2.5-1.5B-Instruct**
- 微调:**PEFT + LoRA**(冻结主干,仅训低秩矩阵)
- 任务:**SFT** 建立"性能要求 → 配方组成"映射
- 数据:**16 组**实验室实测(输入 3d/7d 抗压抗折 → 输出 矿粉/电石渣/脱硫石膏/粉煤灰/水/减水剂 配比)
右:KPI 数字卡网格 + loss 曲线示意:
- 可训练参数占比 **4.57%**(7386 万 / 16.18 亿)
- Loss **0.6897 → 0.0073**(降 **98.9%**)
- 训练轮数 **50** Epochs
- 优化策略:禁用 KV Cache · 梯度检查点 · Torch SDPA 加速
成效三标签:收敛稳定 · 捕捉"低强度→低掺量"行业规律 · 标准化配方输出
【呈现形式】左配置卡 + 右 KPI 网格(4 个大数字)+ 一条 loss 下降曲线示意。
---
## 2.9 现状与下一步 —— 局限与优化路线
**论断:首版受 16 条数据所限偏"记忆";分三阶段补数据、简空间、建闭环。**
左右对比:
- **左 · 当前局限**:
- 数据仅 16 条 → 模型偏"记忆样本",未真正"理解规律"
- 泛化受限 → 未见性能区间配方精度有波动
- **右 · 优化路线**(P0/P1/P2 路线条):
- **P0** 扩充数据集至 **200+**(从记忆升级为理解)
- **P1** 简化配方空间(精简冗余材料,降学习维度)
- **P2** 搭建"预测–实验–反馈"闭环,目标达标率 **≥85%**
【呈现形式】左侧两张"痛点"卡(冷色),右侧 P0→P1→P2 路线时间轴(暖色/主色)。
---
## 2.10 模型矩阵 —— 通用 + 垂直双轮
**论断:通用基座 + 视觉/向量 + 垂直 LoRA 配方模型,打通"解析→沉淀→决策"。**
六行场景表(场景 | 模型 | 用途):
| 场景 | 模型 | 用途 |
|---|---|---|
| 智能问答中枢 | DeepSeek V3.1 / Qwen3-30B-A3B | 通用问答、文件问答、工具调用 |
| 知识库构建 | Qwen2.5-VL + BGE-M3 + Milvus | 文档解析、图表提取、向量入库 |
| 知识库问答 | DeepSeek V3.1 + BGE-M3 + Milvus | RAG 精准问答 + 原文溯源 |
| AI 文档分类 | Qwen3-30B-A3B + BGE-M3 | 自动识别主题、分类归档 |
| 智能实验设计 | 通用大模型 + Qwen2.5-1.5B(LoRA) | 分析文献、生成配方方案 |
| 配方模型训练 | Qwen2.5-1.5B 基座 + BGE-M3 | 学习"性能-配方"映射 |
【呈现形式】六行卡片表(非密集文字表);右侧一句"通用 + 垂直双轮驱动"呼应总览页。
---
# 结尾
## 总结 —— 双智能体落地成效
**论断:一横一纵双智能体已落地,共享统一底座。**
四张成果卡:
- **通用智能体**:17 项 skill · 内部 100 万+ 文献库 · 全流程可交付(Word/PPT/图表)
- **垂直智能体**:五大引擎 · 智能实验设计 · 配方大模型首版(Loss 收敛 0.0073)
- **统一底座**:多模型调度 · 向量知识库 + RAG · 每用户安全沙盒 · 训练流水线 + LoRA 微调
- **业务价值**:打通"数据 → 知识 → 决策"闭环,知识沉淀为可复用资产,支撑研发提效
【呈现形式】2×2 成果卡,关键数字高亮。
---
## 展望 / 交流
- 下一阶段:配方数据集 16 → 200+ · 简化配方空间 · 建"预测–实验–反馈"闭环(达标率 ≥85%)· 持续扩展 skill 与渠道
- **感谢聆听 · 欢迎交流**
【呈现形式】左侧 34 条展望短句(带图标),右侧大字"感谢聆听 / 交流环节"。

22
main.py
View File

@ -209,10 +209,24 @@ def user_role(email: str, role: str) -> None:
help="监听端口") help="监听端口")
@click.option("--reload/--no-reload", default=False, @click.option("--reload/--no-reload", default=False,
help="dev:文件改动自动重启(uvicorn 工厂模式)") help="dev:文件改动自动重启(uvicorn 工厂模式)")
def web(host: str, port: int, reload: bool) -> None: @click.option("--ssl-certfile", default=None,
"""启动 Web 服务(JSON API + dev SPA)。Auth 需 PLATFORM_KEY / JWT_SECRET env。""" help="TLS 证书链(fullchain.pem);与 --ssl-keyfile 同时给即在本端口跑 HTTPS")
@click.option("--ssl-keyfile", default=None,
help="TLS 私钥(privkey.pem)")
def web(host: str, port: int, reload: bool,
ssl_certfile: str | None, ssl_keyfile: str | None) -> None:
"""启动 Web 服务(JSON API + dev SPA)。Auth 需 PLATFORM_KEY / JWT_SECRET env。
HTTPS:`--ssl-certfile <fullchain.pem> --ssl-keyfile <privkey.pem>`(uvicorn 原生 TLS,
无需 nginx)两者都不给 = 明文 HTTP(默认,向后兼容)
"""
import uvicorn import uvicorn
# 两者都给才算启用 TLS;只给其一报错提醒(避免半配置悄悄退回 http)
if bool(ssl_certfile) ^ bool(ssl_keyfile):
raise click.UsageError("--ssl-certfile 与 --ssl-keyfile 必须同时提供")
tls = {"ssl_certfile": ssl_certfile, "ssl_keyfile": ssl_keyfile} if ssl_certfile else {}
# timeout_graceful_shutdown=5:SIGTERM 后 uvicorn 至多等 5s 关掉在连的 HTTP 请求 # timeout_graceful_shutdown=5:SIGTERM 后 uvicorn 至多等 5s 关掉在连的 HTTP 请求
# (主要是长连 SSE GET,断开后客户端会重连,run 不受影响),再进 lifespan shutdown # (主要是长连 SSE GET,断开后客户端会重连,run 不受影响),再进 lifespan shutdown
# 跑真正的 run drain(见 web/app.py finally + config/agent.yaml `shutdown` 段)。 # 跑真正的 run drain(见 web/app.py finally + config/agent.yaml `shutdown` 段)。
@ -221,11 +235,11 @@ def web(host: str, port: int, reload: bool) -> None:
# reload 模式需要 import string + factory,uvicorn 才能监听文件 # reload 模式需要 import string + factory,uvicorn 才能监听文件
uvicorn.run("web.app:create_app", host=host, port=port, uvicorn.run("web.app:create_app", host=host, port=port,
reload=True, factory=True, log_level="info", reload=True, factory=True, log_level="info",
timeout_graceful_shutdown=5) timeout_graceful_shutdown=5, **tls)
else: else:
from web.app import create_app from web.app import create_app
uvicorn.run(create_app(), host=host, port=port, log_level="info", uvicorn.run(create_app(), host=host, port=port, log_level="info",
timeout_graceful_shutdown=5) timeout_graceful_shutdown=5, **tls)
# ─────────────── Sandbox(Stage C 部署前置对账) ─────────────── # ─────────────── Sandbox(Stage C 部署前置对账) ───────────────

View File

@ -40,6 +40,7 @@
- 工具结果带 `[Error ...]` 时,先想清楚原因再重试,不要盲目重复同一调用 - 工具结果带 `[Error ...]` 时,先想清楚原因再重试,不要盲目重复同一调用
- 不臆造 API、文献、数据 —— 不知道就 read 源码 / 让用户提供 / 明说不知道 - 不臆造 API、文献、数据 —— 不知道就 read 源码 / 让用户提供 / 明说不知道
- 少来回:多个**互相独立、不依赖中间结果**的操作(建多页产物、批量改文件、生成整份 deck/文档)合到一个脚本或一轮(并发多 tool call)里做,别一步一个 tool call —— 每轮来回都重发整段上下文,轮数是 token 体量的线性乘数;但**下一步输入要看上一步结果**时(探索性检索、按报错改、需用户确认方向)就老实分步,别硬批 - 少来回:多个**互相独立、不依赖中间结果**的操作(建多页产物、批量改文件、生成整份 deck/文档)合到一个脚本或一轮(并发多 tool call)里做,别一步一个 tool call —— 每轮来回都重发整段上下文,轮数是 token 体量的线性乘数;但**下一步输入要看上一步结果**时(探索性检索、按报错改、需用户确认方向)就老实分步,别硬批
- 大块输出别反复灌进上下文:`run_python`/`shell` 打印的大段结果(整批文献摘要、长文件全文、大 JSON)会进对话历史并**每轮重发**,同一批数据 print 两三次上下文就滚雪球。中间数据**落文件**(如 `<task_dir>/scripts/data.json`、`evidence.md`),之后**只 `read` 用得上的片段**,别为"再看一眼"把整批重新打印 —— 既烧 token 又可能撑爆窗口 / 拖到超时被掐断
## 路径 ## 路径
默认工作目录见系统消息末尾,相对路径都基于它。 默认工作目录见系统消息末尾,相对路径都基于它。

11
rendering/__init__.py Normal file
View File

@ -0,0 +1,11 @@
"""平台渲染层:把 sections/*.md(或单 .md)渲染成 docx / pdf。
不是 skill 内容,**平台能力** skill 通过 `render.py` CLI 调用,自身不再 bundle
渲染脚本( fork skill 不受影响)随镜像 bind-mount `/sandbox/rendering`
- common.py 叶子原语(字体/化学式白名单/块级正则/表格行切分/图片路径), profile 单一事实源
- docx_manuscript.py paper 投稿稿 + proposal 申报书(配置化双 profile)
- docx_brief.py brief 简报(商务红 + 引文上标超链 + callout)
- pdf.py mdHTML沙盒 chromium --print-to-pdf
- render.py 统一入口:--profile {brief,paper,proposal} --format {docx,pdf}
"""

143
rendering/common.py Normal file
View File

@ -0,0 +1,143 @@
"""平台渲染层 · 共享叶子原语(docx 三 profile + 部分 pdf 复用)。
**真正同源 profile 无关**的底层件:字体 OOXML 助手化学式下标白名单
内联/块级 markdown 正则表格行切分图片路径解析三套 docx profile
(manuscript=paper/proposalbrief) import 这里,**单一事实源**
改化学式白名单 / 字体规范只动这一处,不再三处各拷一份
历史:原先 skills/{brief,paper,proposal}/scripts/render_docx.py 各自带一份
拷贝(_CHEM_RE 三份逐字相同易漏改)2026-06 抽到平台层 rendering/
"""
from __future__ import annotations
import re
from pathlib import Path
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.shared import Cm, Pt
# ───────────────────────── 字体 OOXML 助手 ─────────────────────────
def set_run_fonts(run, *, cn_font: str = "宋体", en_font: str = "Times New Roman") -> None:
"""同时设置 run 的中文 (eastAsia) 和西文 (ascii/hAnsi) 字体。"""
rPr = run._element.get_or_add_rPr()
rFonts = rPr.find(qn("w:rFonts"))
if rFonts is None:
rFonts = OxmlElement("w:rFonts")
rPr.append(rFonts)
rFonts.set(qn("w:eastAsia"), cn_font)
rFonts.set(qn("w:ascii"), en_font)
rFonts.set(qn("w:hAnsi"), en_font)
def set_style_fonts(style, *, cn_font: str = "宋体", en_font: str = "Times New Roman") -> None:
"""直接给 style 写 rFonts, 基于该 style 的所有段落都继承字体。"""
el = style.element
rPr = el.find(qn("w:rPr"))
if rPr is None:
rPr = OxmlElement("w:rPr")
el.insert(0, rPr)
rFonts = rPr.find(qn("w:rFonts"))
if rFonts is None:
rFonts = OxmlElement("w:rFonts")
rPr.append(rFonts)
rFonts.set(qn("w:eastAsia"), cn_font)
rFonts.set(qn("w:ascii"), en_font)
rFonts.set(qn("w:hAnsi"), en_font)
def set_subscript(run) -> None:
rPr = run._element.get_or_add_rPr()
va = OxmlElement("w:vertAlign")
va.set(qn("w:val"), "subscript")
rPr.append(va)
# ───────────────────────── 内联 markdown 切分 ─────────────────────────
# 顺序敏感:**bold** 必须先于 *italic* 匹配, 否则会被 italic 抢
INLINE_RE = re.compile(
r"(?P<bold>\*\*(?P<bold_t>[^*\n]+?)\*\*)"
r"|(?P<italic>(?<![\*\w])\*(?P<italic_t>[^*\n]+?)\*(?!\*))"
r"|(?P<code>`(?P<code_t>[^`\n]+?)`)"
)
def parse_inline(text: str) -> list[tuple[str, str]]:
"""切成 (style, segment) 列表; style ∈ plain/bold/italic/code。"""
out: list[tuple[str, str]] = []
pos = 0
for m in INLINE_RE.finditer(text):
if m.start() > pos:
out.append(("plain", text[pos:m.start()]))
if m.group("bold"):
out.append(("bold", m.group("bold_t")))
elif m.group("italic"):
out.append(("italic", m.group("italic_t")))
elif m.group("code"):
out.append(("code", m.group("code_t")))
pos = m.end()
if pos < len(text):
out.append(("plain", text[pos:]))
return out or [("plain", text)]
# ── 化学式下标白名单(三 profile 共用同一份;单一事实源)──
# 长的在前,\b 防误伤 LC3 / C595 / 2026;不收 Ca2+ 这类带电荷的(那是上标,白名单不收即天然避开)
CHEM_RE = re.compile(
r"Ca\(OH\)2|Mg\(OH\)2"
r"|\b(?:Al2O3|Fe2O3|Fe3O4|Mn2O3|Cr2O3|P2O5|Na2SO4|K2SO4|CaSO4|CaCO3|MgCO3|"
r"CaCl2|MgCl2|Na2O|K2O|SiO2|TiO2|ZrO2|SO4|SO3|SO2|CO3|CO2|NO3|NO2|PO4|"
r"H2O|NH3|CH4|C4AF|C3S2|C2AS|C3S|C2S|C3A|O2|N2|H2)\b"
)
# ───────────────────────── 块级行类型正则 ─────────────────────────
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$")
TABLE_LINE_RE = re.compile(r"^\s*\|.*\|\s*$")
BLOCKQUOTE_RE = re.compile(r"^\s*>\s?")
HR_RE = re.compile(r"^\s*-{3,}\s*$|^\s*={3,}\s*$|^\s*_{3,}\s*$")
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})\s*(\S*)\s*$")
IMAGE_LINE_RE = re.compile(r"^\s*!\[(?P<cap>[^\]]*)\]\((?P<src>[^)\s]+)\)\s*$")
def is_table_line(line: str) -> bool:
return bool(TABLE_LINE_RE.match(line))
def is_heading(line: str) -> bool:
return bool(HEADING_RE.match(line))
def is_blockquote(line: str) -> bool:
return bool(BLOCKQUOTE_RE.match(line))
def is_hr(line: str) -> bool:
return bool(HR_RE.match(line))
# ───────────────────────── 表格行切分 ─────────────────────────
def split_md_row(line: str) -> list[str]:
return [c.strip() for c in line.strip().strip("|").split("|")]
def is_separator_row(cells: list[str]) -> bool:
return all(re.match(r"^[-:\s]+$", c) for c in cells if c != "")
# ───────────────────────── 图片 ─────────────────────────
MAX_IMG_WIDTH = Cm(15)
def resolve_image_path(src: str, base_dir: Path) -> Path | None:
"""图片相对路径以 base_dir (单个 .md 所在目录) 为锚。"""
p = Path(src)
if not p.is_absolute():
p = (base_dir / p).resolve()
return p if p.is_file() else None

View File

@ -1,21 +1,12 @@
"""把 sections/*.md 渲染成科研方向简报 .docx(简报体例,区别于 paper 的投稿稿)。 """brief 简报体例 docx 渲染器(商务红主题 + 引文上标超链 + callout/底纹边框)。
相对 paper/render_docx.py 的简报专属增强: brief 是三 profile 里最富的一支:书签锚点内部/外部超链接引文 [n]/[Wn] 上标回链
- **商务红配色**(主色 #C00000):标题分级染色 + 标题下细色条;TL;DR / 「判断」行做浅红底纹 callout 参考条目 DOI 超链概览信息带 / TL;DR 卡片 / 判断 callout页脚页码域这些 paper/proposal
- **引文上标 + 内部超链接**:正文 [1] / [W3] 上标红色,点击锚到重要论文列表 / 参考文献段对应条目 都没有, brief 保留自己的渲染层,只从 rendering.common 复用叶子原语(字体/化学式/块级正则/
- **论文列表 / 参考文献可点击**:标题含论文列表 / 文献列表 / 参考文献的段,行首 [n] 条目作锚点; 表格行切分/图片路径)函数体逐字移植自旧 skills/brief/scripts/render_docx.py
条目内 DOI(整条是 DOI 或末尾 "DOI: 10.xxx") https://doi.org/... 蓝色超链接;web 条目里的域名/路径 https:// 超链接
- **化学式下标(白名单)**:CO2 / C3S2 / Na2O / SO4 ... 真实下标,**白名单精确匹配**,不误伤 LC3 / EN 197-5 / 8.5 Mt / 2026
字体规范同院内其它渲染:中文宋体小四 / 英文 Times New Roman 小四 / 行距 1.5 / 首行缩进 2 字符
支持 **加粗** / *斜体* / `等宽` / 列表 / 表格 / ![caption](png) 居中插图
用法:
python render_docx.py <sections_dir> -o <out.docx>
python render_docx.py <sections_dir> --no-color -o <out.docx> # 关配色出纯黑白
""" """
from __future__ import annotations from __future__ import annotations
import argparse
import re import re
import sys import sys
from pathlib import Path from pathlib import Path
@ -27,6 +18,24 @@ from docx.oxml import OxmlElement
from docx.oxml.ns import qn from docx.oxml.ns import qn
from docx.shared import Cm, Pt, RGBColor from docx.shared import Cm, Pt, RGBColor
from .common import (
set_run_fonts as _set_run_fonts,
set_style_fonts as _set_style_fonts,
set_subscript as _set_subscript,
CHEM_RE as _CHEM_RE,
INLINE_RE as _INLINE_RE,
HEADING_RE as _HEADING_RE,
TABLE_LINE_RE as _TABLE_LINE_RE,
BLOCKQUOTE_RE as _BLOCKQUOTE_RE,
HR_RE as _HR_RE,
FENCE_RE as _FENCE_RE,
IMAGE_LINE_RE as _IMAGE_LINE_RE,
split_md_row as _split_md_row,
is_separator_row as _is_sep_row,
resolve_image_path as _resolve_image_path,
MAX_IMG_WIDTH as _MAX_IMG_WIDTH,
)
# ───────────────────────── 主题色 ───────────────────────── # ───────────────────────── 主题色 ─────────────────────────
PRIMARY = "C00000" # 商务红主色 PRIMARY = "C00000" # 商务红主色
@ -37,40 +46,7 @@ LINK_BLUE = "1155CC" # 超链接蓝
TABLE_HEAD_FILL = "C00000" TABLE_HEAD_FILL = "C00000"
# ───────────────────────── 字体 / 低层 OOXML 辅助 ───────────────────────── # ───────────────────────── 低层 OOXML 辅助 ─────────────────────────
def _set_run_fonts(run, *, cn_font="宋体", en_font="Times New Roman") -> None:
rPr = run._element.get_or_add_rPr()
rFonts = rPr.find(qn("w:rFonts"))
if rFonts is None:
rFonts = OxmlElement("w:rFonts")
rPr.append(rFonts)
rFonts.set(qn("w:eastAsia"), cn_font)
rFonts.set(qn("w:ascii"), en_font)
rFonts.set(qn("w:hAnsi"), en_font)
def _set_style_fonts(style, *, cn_font="宋体", en_font="Times New Roman") -> None:
el = style.element
rPr = el.find(qn("w:rPr"))
if rPr is None:
rPr = OxmlElement("w:rPr")
el.insert(0, rPr)
rFonts = rPr.find(qn("w:rFonts"))
if rFonts is None:
rFonts = OxmlElement("w:rFonts")
rPr.append(rFonts)
rFonts.set(qn("w:eastAsia"), cn_font)
rFonts.set(qn("w:ascii"), en_font)
rFonts.set(qn("w:hAnsi"), en_font)
def _set_subscript(run) -> None:
rPr = run._element.get_or_add_rPr()
va = OxmlElement("w:vertAlign")
va.set(qn("w:val"), "subscript")
rPr.append(va)
def _para_shading(paragraph, fill: str) -> None: def _para_shading(paragraph, fill: str) -> None:
pPr = paragraph._p.get_or_add_pPr() pPr = paragraph._p.get_or_add_pPr()
@ -191,26 +167,11 @@ def init_doc(color: bool) -> Document:
return doc return doc
# ───────────────────────── 内联:bold/italic/code 切分 ───────────────────────── # ───────────────────────── 内联:bold/italic/code + 引文 + 化学式 ─────────────────────────
_INLINE_RE = re.compile(
r"(?P<bold>\*\*(?P<bold_t>[^*\n]+?)\*\*)"
r"|(?P<italic>(?<![\*\w])\*(?P<italic_t>[^*\n]+?)\*(?!\*))"
r"|(?P<code>`(?P<code_t>[^`\n]+?)`)"
)
# 引文标记 [12] / [W3] # 引文标记 [12] / [W3]
_CITE_RE = re.compile(r"\[(W?\d+)\]") _CITE_RE = re.compile(r"\[(W?\d+)\]")
# 化学式下标白名单(统一三处渲染器共用同一份;长的在前,\b 防误伤 LC3 / C595 / 2026;
# 不含 Ca2+ 这类带电荷的——它是上标不是下标,白名单不收即天然避开)
_CHEM_RE = re.compile(
r"Ca\(OH\)2|Mg\(OH\)2"
r"|\b(?:Al2O3|Fe2O3|Fe3O4|Mn2O3|Cr2O3|P2O5|Na2SO4|K2SO4|CaSO4|CaCO3|MgCO3|"
r"CaCl2|MgCl2|Na2O|K2O|SiO2|TiO2|ZrO2|SO4|SO3|SO2|CO3|CO2|NO3|NO2|PO4|"
r"H2O|NH3|CH4|C4AF|C3S2|C2AS|C3S|C2S|C3A|O2|N2|H2)\b"
)
def _emit_chem(paragraph, text: str, *, size_pt: float, cn_font: str) -> None: def _emit_chem(paragraph, text: str, *, size_pt: float, cn_font: str) -> None:
"""把白名单化学式里的数字渲成下标,其余正常。""" """把白名单化学式里的数字渲成下标,其余正常。"""
@ -455,15 +416,7 @@ def add_reference_item(doc: Document, cid: str, value: str, bm_id: int, color: b
_emit_plain_run(p, value, size_pt=10.5, cn_font="宋体") _emit_plain_run(p, value, size_pt=10.5, cn_font="宋体")
# ───────────────────────── 行类型识别 ───────────────────────── # ───────────────────────── 行类型识别(brief 专属列表模式)─────────────────────────
_HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$")
_TABLE_LINE_RE = re.compile(r"^\s*\|.*\|\s*$")
_BLOCKQUOTE_RE = re.compile(r"^\s*>\s?")
_HR_RE = re.compile(r"^\s*-{3,}\s*$|^\s*={3,}\s*$|^\s*_{3,}\s*$")
_FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})\s*(\S*)\s*$")
_IMAGE_LINE_RE = re.compile(r"^\s*!\[(?P<cap>[^\]]*)\]\((?P<src>[^)\s]+)\)\s*$")
_MAX_IMG_WIDTH = Cm(15)
_LIST_PATTERNS = [ _LIST_PATTERNS = [
re.compile(r"^[-*+]\s"), re.compile(r"^[-*+]\s"),
@ -480,14 +433,6 @@ def is_list_item(line: str) -> bool:
# ───────────────────────── 表格 ───────────────────────── # ───────────────────────── 表格 ─────────────────────────
def _split_md_row(line: str) -> list[str]:
return [c.strip() for c in line.strip().strip("|").split("|")]
def _is_sep_row(cells: list[str]) -> bool:
return all(re.match(r"^[-:\s]+$", c) for c in cells if c != "")
def render_table(doc: Document, table_lines: list[str], color: bool) -> None: def render_table(doc: Document, table_lines: list[str], color: bool) -> None:
rows = [] rows = []
for ln in table_lines: for ln in table_lines:
@ -525,13 +470,6 @@ def render_table(doc: Document, table_lines: list[str], color: bool) -> None:
# ───────────────────────── 图片 ───────────────────────── # ───────────────────────── 图片 ─────────────────────────
def _resolve_image_path(src: str, base_dir: Path) -> Path | None:
p = Path(src)
if not p.is_absolute():
p = (base_dir / p).resolve()
return p if p.is_file() else None
def add_image(doc: Document, png_path: Path, caption: str | None, ctx: dict) -> None: def add_image(doc: Document, png_path: Path, caption: str | None, ctx: dict) -> None:
p = doc.add_paragraph() p = doc.add_paragraph()
p.alignment = WD_ALIGN_PARAGRAPH.CENTER p.alignment = WD_ALIGN_PARAGRAPH.CENTER
@ -683,6 +621,8 @@ def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
i = j i = j
# ───────────────────────── 入口 ─────────────────────────
def render_sections(sections_dir: Path, out: Path, color: bool) -> None: def render_sections(sections_dir: Path, out: Path, color: bool) -> None:
if not sections_dir.is_dir(): if not sections_dir.is_dir():
print(f"[ERR] sections dir not found: {sections_dir}", file=sys.stderr) print(f"[ERR] sections dir not found: {sections_dir}", file=sys.stderr)
@ -712,19 +652,5 @@ def render_sections(sections_dir: Path, out: Path, color: bool) -> None:
paras = sum(1 for _ in doc.paragraphs) paras = sum(1 for _ in doc.paragraphs)
chars = sum(len(p.text) for p in doc.paragraphs) chars = sum(len(p.text) for p in doc.paragraphs)
print(f"[OK] rendered {len(md_files)} sections -> {out}") print(f"[OK] rendered {len(md_files)} sections -> {out}")
print(f" paragraphs: {paras} | tables: {len(doc.tables)} | figures: {ctx['fig_no']} | chars: {chars}") print(f" profile: brief | paragraphs: {paras} | tables: {len(doc.tables)} | "
print(f" theme: {'商务红 #C00000' if color else '黑白'} | 引文上标+超链接 | 化学式下标白名单") f"figures: {ctx['fig_no']} | chars: {chars} | theme: {'商务红' if color else '黑白'}")
def main() -> None:
ap = argparse.ArgumentParser(description="渲染章节 md → 科研方向简报 docx")
ap.add_argument("sections_dir", type=Path, help="sections/*.md 目录")
ap.add_argument("--no-color", dest="color", action="store_false",
help="关配色,出纯黑白(默认商务红主题)")
ap.add_argument("-o", "--output", type=Path, required=True, help="输出 .docx 路径")
args = ap.parse_args()
render_sections(args.sections_dir, args.output, args.color)
if __name__ == "__main__":
main()

View File

@ -1,20 +1,14 @@
"""把 sections/*.md 渲染成期刊投稿稿 .docx (manuscript draft)。 """manuscript 体例 docx 渲染器(paper 投稿稿 + proposal 申报书,配置化双 profile)。
proposal/render_docx.py 同源, 差异: 两者原是近亲(~80% 逐字相同),差异收进 PROFILES:页边距 / TOC 标题 / 图题前缀 /
- fund-type; 改用 --lang {zh,en} (默认 en) 标注语言, 仅影响信息打印与首行缩进策略 列表多一条"第X条" / sections 循环(toc 是否默认 + 末段是否补分页)函数体移植自
- 目录 (TOC) 默认**不生成** (期刊投稿稿无需目录); 要草稿带目录加 --toc paper/proposal render_docx.py,叶子原语走 rendering.common
- 字体规范保持: 中文宋体小四 / 英文 Times New Roman 小四 / 行距 1.5 / 首行缩进 2 字符
(eastAsia=宋体 只对 CJK 字符生效, 纯英文论文正文走 Times New Roman, 同一套 style 通吃)
支持: **加粗** / *斜体* / `等宽`; 列表 / 表格 / ![caption](png) 居中插图 + 图题自增; profile=paper: --lang {zh,en}(图题前缀 /Fig.),--toc 可选(默认无)
```mermaid``` 块按 caption figures/fig_<caption>.png ( render_diagrams.py 预生成) profile=proposal: --fund-type ...(仅打印),始终带 TOC,每段后分页
用法:
python render_docx.py <sections_dir> --lang en -o <out.docx>
python render_docx.py <sections_dir> --lang zh --toc -o <out.docx>
""" """
from __future__ import annotations from __future__ import annotations
import argparse
import re import re
import sys import sys
from pathlib import Path from pathlib import Path
@ -25,38 +19,50 @@ from docx.oxml import OxmlElement
from docx.oxml.ns import qn from docx.oxml.ns import qn
from docx.shared import Cm, Pt, RGBColor from docx.shared import Cm, Pt, RGBColor
from . import common
# ───────────────────────── 字体辅助 ───────────────────────── from .common import set_run_fonts, set_style_fonts, set_subscript, CHEM_RE, parse_inline
def _set_run_fonts(run, *, cn_font: str = "宋体", en_font: str = "Times New Roman") -> None:
rPr = run._element.get_or_add_rPr()
rFonts = rPr.find(qn("w:rFonts"))
if rFonts is None:
rFonts = OxmlElement("w:rFonts")
rPr.append(rFonts)
rFonts.set(qn("w:eastAsia"), cn_font)
rFonts.set(qn("w:ascii"), en_font)
rFonts.set(qn("w:hAnsi"), en_font)
def _set_style_fonts(style, *, cn_font: str = "宋体", en_font: str = "Times New Roman") -> None: # ───────────────────────── profile 配置 ─────────────────────────
el = style.element
rPr = el.find(qn("w:rPr")) _BASE_LIST_PATTERNS = [
if rPr is None: re.compile(r"^\[\d+\]\s"), # [1]
rPr = OxmlElement("w:rPr") re.compile(r"^[-*+]\s"), # - / * / +
el.insert(0, rPr) re.compile(r"^\d+[\.、.]\s*"), # 1. / 1、 / 1
rFonts = rPr.find(qn("w:rFonts")) re.compile(r"^\(\d+\)\s*"), # (1)
if rFonts is None: re.compile(r"^\d+\s*"), # 1
rFonts = OxmlElement("w:rFonts") re.compile(r"^[一二三四五六七八九十百千]+[、.\.]"), # 一、
rPr.append(rFonts) re.compile(r"^[(][一二三四五六七八九十百千]+[)]"), # (一)
rFonts.set(qn("w:eastAsia"), cn_font) re.compile(r"^[①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮]"), # ①
rFonts.set(qn("w:ascii"), en_font) ]
rFonts.set(qn("w:hAnsi"), en_font)
PROFILES = {
"paper": {
"left_margin": Cm(2.5),
"right_margin": Cm(2.5),
"list_patterns": _BASE_LIST_PATTERNS,
"toc_title": "Contents",
"toc_placeholder": "[Press F9 in Word to generate the table of contents]",
"always_toc": False,
"trailing_page_break": False,
},
"proposal": {
"left_margin": Cm(3.0),
"right_margin": Cm(2.0),
"list_patterns": _BASE_LIST_PATTERNS + [
re.compile(r"^第[一二三四五六七八九十百]+[条章节]"), # 第一条
],
"toc_title": "目 录",
"toc_placeholder": "[在 Word 中按 F9 或右键此处选择 “更新域” 即可生成完整目录]",
"always_toc": True,
"trailing_page_break": True,
},
}
# ───────────────────────── 文档初始化 ───────────────────────── # ───────────────────────── 文档初始化 ─────────────────────────
def init_doc() -> Document: def init_doc(prof: dict) -> Document:
doc = Document() doc = Document()
section = doc.sections[0] section = doc.sections[0]
@ -64,13 +70,13 @@ def init_doc() -> Document:
section.page_width = Cm(21) section.page_width = Cm(21)
section.top_margin = Cm(2.5) section.top_margin = Cm(2.5)
section.bottom_margin = Cm(2.5) section.bottom_margin = Cm(2.5)
section.left_margin = Cm(2.5) section.left_margin = prof["left_margin"]
section.right_margin = Cm(2.5) section.right_margin = prof["right_margin"]
normal = doc.styles["Normal"] normal = doc.styles["Normal"]
normal.font.name = "Times New Roman" normal.font.name = "Times New Roman"
normal.font.size = Pt(12) normal.font.size = Pt(12)
_set_style_fonts(normal, cn_font="宋体") set_style_fonts(normal, cn_font="宋体")
pf = normal.paragraph_format pf = normal.paragraph_format
pf.line_spacing = 1.5 pf.line_spacing = 1.5
pf.space_before = Pt(0) pf.space_before = Pt(0)
@ -82,7 +88,7 @@ def init_doc() -> Document:
h.font.size = sz h.font.size = sz
h.font.bold = True h.font.bold = True
h.font.color.rgb = RGBColor(0, 0, 0) h.font.color.rgb = RGBColor(0, 0, 0)
_set_style_fonts(h, cn_font=cn) set_style_fonts(h, cn_font=cn)
h.paragraph_format.line_spacing = 1.5 h.paragraph_format.line_spacing = 1.5
h.paragraph_format.space_before = Pt(6) h.paragraph_format.space_before = Pt(6)
h.paragraph_format.space_after = Pt(3) h.paragraph_format.space_after = Pt(3)
@ -91,18 +97,16 @@ def init_doc() -> Document:
return doc return doc
# ───────────────────────── TOC (opt-in) ───────────────────────── def add_toc(doc: Document, prof: dict, depth: int = 3) -> None:
def add_toc(doc: Document, depth: int = 3) -> None:
p = doc.add_paragraph() p = doc.add_paragraph()
p.alignment = WD_ALIGN_PARAGRAPH.CENTER p.alignment = WD_ALIGN_PARAGRAPH.CENTER
p.paragraph_format.first_line_indent = None p.paragraph_format.first_line_indent = None
p.paragraph_format.space_before = Pt(12) p.paragraph_format.space_before = Pt(12)
p.paragraph_format.space_after = Pt(6) p.paragraph_format.space_after = Pt(6)
run = p.add_run("Contents") run = p.add_run(prof["toc_title"])
run.font.size = Pt(16) run.font.size = Pt(16)
run.font.bold = True run.font.bold = True
_set_run_fonts(run, cn_font="黑体") set_run_fonts(run, cn_font="黑体")
p = doc.add_paragraph() p = doc.add_paragraph()
p.paragraph_format.first_line_indent = None p.paragraph_format.first_line_indent = None
@ -119,7 +123,7 @@ def add_toc(doc: Document, depth: int = 3) -> None:
fldChar3.set(qn("w:fldCharType"), "end") fldChar3.set(qn("w:fldCharType"), "end")
placeholder_t = OxmlElement("w:t") placeholder_t = OxmlElement("w:t")
placeholder_t.set(qn("xml:space"), "preserve") placeholder_t.set(qn("xml:space"), "preserve")
placeholder_t.text = "[Press F9 in Word to generate the table of contents]" placeholder_t.text = prof["toc_placeholder"]
run._element.append(fldChar1) run._element.append(fldChar1)
run._element.append(instrText) run._element.append(instrText)
run._element.append(fldChar2) run._element.append(fldChar2)
@ -128,49 +132,7 @@ def add_toc(doc: Document, depth: int = 3) -> None:
doc.add_page_break() doc.add_page_break()
# ───────────────────────── 内联 markdown ───────────────────────── # ───────────────────────── 内联(化学式下标)─────────────────────────
_INLINE_RE = re.compile(
r"(?P<bold>\*\*(?P<bold_t>[^*\n]+?)\*\*)"
r"|(?P<italic>(?<![\*\w])\*(?P<italic_t>[^*\n]+?)\*(?!\*))"
r"|(?P<code>`(?P<code_t>[^`\n]+?)`)"
)
def parse_inline(text: str) -> list[tuple[str, str]]:
out: list[tuple[str, str]] = []
pos = 0
for m in _INLINE_RE.finditer(text):
if m.start() > pos:
out.append(("plain", text[pos:m.start()]))
if m.group("bold"):
out.append(("bold", m.group("bold_t")))
elif m.group("italic"):
out.append(("italic", m.group("italic_t")))
elif m.group("code"):
out.append(("code", m.group("code_t")))
pos = m.end()
if pos < len(text):
out.append(("plain", text[pos:]))
return out or [("plain", text)]
# ── 化学式下标白名单(与 proposal/brief 三处渲染器共用同一份)──
# 长的在前,\b 防误伤 LC3 / C595 / 2026;不收 Ca2+ 这类带电荷的(那是上标,白名单不收即天然避开)
_CHEM_RE = re.compile(
r"Ca\(OH\)2|Mg\(OH\)2"
r"|\b(?:Al2O3|Fe2O3|Fe3O4|Mn2O3|Cr2O3|P2O5|Na2SO4|K2SO4|CaSO4|CaCO3|MgCO3|"
r"CaCl2|MgCl2|Na2O|K2O|SiO2|TiO2|ZrO2|SO4|SO3|SO2|CO3|CO2|NO3|NO2|PO4|"
r"H2O|NH3|CH4|C4AF|C3S2|C2AS|C3S|C2S|C3A|O2|N2|H2)\b"
)
def _set_subscript(run) -> None:
rPr = run._element.get_or_add_rPr()
va = OxmlElement("w:vertAlign")
va.set(qn("w:val"), "subscript")
rPr.append(va)
def _emit_plain_with_chem(paragraph, text: str, *, size, cn_font: str) -> None: def _emit_plain_with_chem(paragraph, text: str, *, size, cn_font: str) -> None:
"""plain 段:白名单化学式里的数字渲成下标,其余正常。无命中即一条普通 run。""" """plain 段:白名单化学式里的数字渲成下标,其余正常。无命中即一条普通 run。"""
@ -179,12 +141,12 @@ def _emit_plain_with_chem(paragraph, text: str, *, size, cn_font: str) -> None:
return return
r = paragraph.add_run(seg) r = paragraph.add_run(seg)
r.font.size = size r.font.size = size
_set_run_fonts(r, cn_font=cn_font, en_font="Times New Roman") set_run_fonts(r, cn_font=cn_font, en_font="Times New Roman")
if sub: if sub:
_set_subscript(r) set_subscript(r)
pos = 0 pos = 0
for m in _CHEM_RE.finditer(text): for m in CHEM_RE.finditer(text):
_run(text[pos:m.start()]) _run(text[pos:m.start()])
buf = "" buf = ""
for ch in m.group(0): for ch in m.group(0):
@ -207,12 +169,12 @@ def add_inline(paragraph, text: str, *, size: Pt = Pt(12), cn_font: str = "宋
run.font.size = size run.font.size = size
if style == "bold": if style == "bold":
run.bold = True run.bold = True
_set_run_fonts(run, cn_font=cn_font, en_font="Times New Roman") set_run_fonts(run, cn_font=cn_font, en_font="Times New Roman")
elif style == "italic": elif style == "italic":
run.italic = True run.italic = True
_set_run_fonts(run, cn_font=cn_font, en_font="Times New Roman") set_run_fonts(run, cn_font=cn_font, en_font="Times New Roman")
elif style == "code": elif style == "code":
_set_run_fonts(run, cn_font=cn_font, en_font="Consolas") set_run_fonts(run, cn_font=cn_font, en_font="Consolas")
# ───────────────────────── 段落 / 标题 / 列表 ───────────────────────── # ───────────────────────── 段落 / 标题 / 列表 ─────────────────────────
@ -239,47 +201,9 @@ def add_body_paragraph(doc: Document, text: str, *, indent: bool = True) -> None
add_inline(p, text) add_inline(p, text)
# ───────────────────────── 行类型识别 ───────────────────────── def is_list_item(line: str, prof: dict) -> bool:
return any(p.match(line) for p in prof["list_patterns"])
_HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$")
_TABLE_LINE_RE = re.compile(r"^\s*\|.*\|\s*$")
_BLOCKQUOTE_RE = re.compile(r"^\s*>\s?")
_HR_RE = re.compile(r"^\s*-{3,}\s*$|^\s*={3,}\s*$|^\s*_{3,}\s*$")
_FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})\s*(\S*)\s*$")
_LIST_PATTERNS = [
re.compile(r"^\[\d+\]\s"),
re.compile(r"^[-*+]\s"),
re.compile(r"^\d+[\.、.]\s*"),
re.compile(r"^\(\d+\)\s*"),
re.compile(r"^\d+\s*"),
re.compile(r"^[一二三四五六七八九十百千]+[、.\.]"),
re.compile(r"^[(][一二三四五六七八九十百千]+[)]"),
re.compile(r"^[①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮]"),
]
def is_list_item(line: str) -> bool:
return any(p.match(line) for p in _LIST_PATTERNS)
def is_table_line(line: str) -> bool:
return bool(_TABLE_LINE_RE.match(line))
def is_heading(line: str) -> bool:
return bool(_HEADING_RE.match(line))
def is_blockquote(line: str) -> bool:
return bool(_BLOCKQUOTE_RE.match(line))
def is_hr(line: str) -> bool:
return bool(_HR_RE.match(line))
# ───────────────────────── 代码块 / ASCII 图 ─────────────────────────
def add_code_block(doc: Document, lines: list[str], lang: str = "") -> None: def add_code_block(doc: Document, lines: list[str], lang: str = "") -> None:
for ln in lines: for ln in lines:
@ -291,26 +215,18 @@ def add_code_block(doc: Document, lines: list[str], lang: str = "") -> None:
pf.space_after = Pt(0) pf.space_after = Pt(0)
run = p.add_run(ln if ln else " ") run = p.add_run(ln if ln else " ")
run.font.size = Pt(10.5) run.font.size = Pt(10.5)
_set_run_fonts(run, cn_font="新宋体", en_font="Consolas") set_run_fonts(run, cn_font="新宋体", en_font="Consolas")
for t in run._element.iter(qn("w:t")): for t in run._element.iter(qn("w:t")):
t.set(qn("xml:space"), "preserve") t.set(qn("xml:space"), "preserve")
# ───────────────────────── 表格 ───────────────────────── # ───────────────────────── 表格 ─────────────────────────
def _split_md_row(line: str) -> list[str]:
return [c.strip() for c in line.strip().strip("|").split("|")]
def _is_separator_row(cells: list[str]) -> bool:
return all(re.match(r"^[-:\s]+$", c) for c in cells if c != "")
def render_table(doc: Document, table_lines: list[str]) -> None: def render_table(doc: Document, table_lines: list[str]) -> None:
rows: list[list[str]] = [] rows: list[list[str]] = []
for ln in table_lines: for ln in table_lines:
cells = _split_md_row(ln) cells = common.split_md_row(ln)
if not cells or _is_separator_row(cells): if not cells or common.is_separator_row(cells):
continue continue
rows.append(cells) rows.append(cells)
if not rows: if not rows:
@ -341,10 +257,8 @@ def render_table(doc: Document, table_lines: list[str]) -> None:
# ───────────────────────── 图片 + 图题 ───────────────────────── # ───────────────────────── 图片 + 图题 ─────────────────────────
_IMAGE_LINE_RE = re.compile(r"^\s*!\[(?P<cap>[^\]]*)\]\((?P<src>[^)\s]+)\)\s*$")
_MERMAID_CAPTION_RE = re.compile(r"^\s*%%\s*caption\s*:\s*(.+?)\s*$", re.IGNORECASE) _MERMAID_CAPTION_RE = re.compile(r"^\s*%%\s*caption\s*:\s*(.+?)\s*$", re.IGNORECASE)
_FILENAME_INVALID_RE = re.compile(r"[^一-鿿A-Za-z0-9]+") _FILENAME_INVALID_RE = re.compile(r"[^一-鿿A-Za-z0-9]+")
_MAX_IMG_WIDTH = Cm(15)
def caption_to_stem(caption: str) -> str: def caption_to_stem(caption: str) -> str:
@ -362,13 +276,6 @@ def extract_mermaid_caption(source: str) -> str | None:
return None return None
def _resolve_image_path(src: str, base_dir: Path) -> Path | None:
p = Path(src)
if not p.is_absolute():
p = (base_dir / p).resolve()
return p if p.is_file() else None
def add_image(doc: Document, png_path: Path, caption: str | None, ctx: dict) -> None: def add_image(doc: Document, png_path: Path, caption: str | None, ctx: dict) -> None:
p = doc.add_paragraph() p = doc.add_paragraph()
p.alignment = WD_ALIGN_PARAGRAPH.CENTER p.alignment = WD_ALIGN_PARAGRAPH.CENTER
@ -377,7 +284,7 @@ def add_image(doc: Document, png_path: Path, caption: str | None, ctx: dict) ->
p.paragraph_format.space_after = Pt(3) p.paragraph_format.space_after = Pt(3)
run = p.add_run() run = p.add_run()
try: try:
run.add_picture(str(png_path), width=_MAX_IMG_WIDTH) run.add_picture(str(png_path), width=common.MAX_IMG_WIDTH)
except Exception as e: except Exception as e:
run.add_text(f"[image failed: {png_path.name}: {e}]") run.add_text(f"[image failed: {png_path.name}: {e}]")
return return
@ -393,12 +300,13 @@ def add_image(doc: Document, png_path: Path, caption: str | None, ctx: dict) ->
cap_run = cap_p.add_run(cap_text) cap_run = cap_p.add_run(cap_text)
cap_run.font.size = Pt(10.5) cap_run.font.size = Pt(10.5)
cap_run.bold = True cap_run.bold = True
_set_run_fonts(cap_run, cn_font="宋体", en_font="Times New Roman") set_run_fonts(cap_run, cn_font="宋体", en_font="Times New Roman")
# ───────────────────────── 主渲染 ───────────────────────── # ───────────────────────── 主渲染 ─────────────────────────
def render_md_block(doc: Document, md_text: str, ctx: dict) -> None: def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
prof = ctx["prof"]
lines = md_text.splitlines() lines = md_text.splitlines()
i = 0 i = 0
n = len(lines) n = len(lines)
@ -409,15 +317,15 @@ def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
i += 1 i += 1
continue continue
if is_hr(line): if common.is_hr(line):
i += 1 i += 1
continue continue
m_img = _IMAGE_LINE_RE.match(line) m_img = common.IMAGE_LINE_RE.match(line)
if m_img: if m_img:
src = m_img.group("src") src = m_img.group("src")
cap = m_img.group("cap").strip() or None cap = m_img.group("cap").strip() or None
png = _resolve_image_path(src, ctx["sections_dir"]) png = common.resolve_image_path(src, ctx["sections_dir"])
if png is not None: if png is not None:
add_image(doc, png, cap, ctx) add_image(doc, png, cap, ctx)
else: else:
@ -425,14 +333,14 @@ def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
i += 1 i += 1
continue continue
m_fence = _FENCE_RE.match(line) m_fence = common.FENCE_RE.match(line)
if m_fence: if m_fence:
fence = m_fence.group(1) fence = m_fence.group(1)
lang = m_fence.group(2) or "" lang = m_fence.group(2) or ""
code: list[str] = [] code: list[str] = []
i += 1 i += 1
while i < n: while i < n:
m_close = _FENCE_RE.match(lines[i]) m_close = common.FENCE_RE.match(lines[i])
if m_close and m_close.group(1)[0] == fence[0] and len(m_close.group(1)) >= len(fence): if m_close and m_close.group(1)[0] == fence[0] and len(m_close.group(1)) >= len(fence):
i += 1 i += 1
break break
@ -452,26 +360,26 @@ def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
add_code_block(doc, code, lang) add_code_block(doc, code, lang)
continue continue
if is_table_line(line): if common.is_table_line(line):
block: list[str] = [] block: list[str] = []
while i < n and is_table_line(lines[i]): while i < n and common.is_table_line(lines[i]):
block.append(lines[i]) block.append(lines[i])
i += 1 i += 1
render_table(doc, block) render_table(doc, block)
continue continue
m = _HEADING_RE.match(line) m = common.HEADING_RE.match(line)
if m: if m:
level = min(len(m.group(1)), 3) level = min(len(m.group(1)), 3)
add_heading(doc, m.group(2).strip(), level) add_heading(doc, m.group(2).strip(), level)
i += 1 i += 1
continue continue
if is_blockquote(line): if common.is_blockquote(line):
i += 1 i += 1
continue continue
if is_list_item(line): if is_list_item(line, prof):
add_body_paragraph(doc, line.strip(), indent=False) add_body_paragraph(doc, line.strip(), indent=False)
i += 1 i += 1
continue continue
@ -482,7 +390,8 @@ def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
nxt = lines[j].rstrip() nxt = lines[j].rstrip()
if not nxt.strip(): if not nxt.strip():
break break
if is_heading(nxt) or is_blockquote(nxt) or is_table_line(nxt) or is_list_item(nxt) or is_hr(nxt): if (common.is_heading(nxt) or common.is_blockquote(nxt) or common.is_table_line(nxt)
or is_list_item(nxt, prof) or common.is_hr(nxt)):
break break
buf.append(nxt.strip()) buf.append(nxt.strip())
j += 1 j += 1
@ -492,7 +401,9 @@ def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
# ───────────────────────── 入口 ───────────────────────── # ───────────────────────── 入口 ─────────────────────────
def render_sections(sections_dir: Path, out: Path, lang: str, toc: bool) -> None: def render_sections(profile: str, sections_dir: Path, out: Path, *,
lang: str = "en", toc: bool = False, fund_type: str = "") -> None:
prof = PROFILES[profile]
if not sections_dir.is_dir(): if not sections_dir.is_dir():
print(f"[ERR] sections dir not found: {sections_dir}", file=sys.stderr) print(f"[ERR] sections dir not found: {sections_dir}", file=sys.stderr)
sys.exit(2) sys.exit(2)
@ -503,19 +414,20 @@ def render_sections(sections_dir: Path, out: Path, lang: str, toc: bool) -> None
figures_dir = sections_dir.parent / "figures" figures_dir = sections_dir.parent / "figures"
ctx: dict = { ctx: dict = {
"prof": prof,
"sections_dir": sections_dir, "sections_dir": sections_dir,
"figures_dir": figures_dir, "figures_dir": figures_dir,
"fig_no": 0, "fig_no": 0,
"fig_label": "" if lang == "zh" else "Fig.", "fig_label": ("" if lang == "zh" else "Fig.") if profile == "paper" else "",
} }
doc = init_doc() doc = init_doc(prof)
if toc: if prof["always_toc"] or toc:
add_toc(doc) add_toc(doc, prof)
for idx, f in enumerate(md_files): for idx, f in enumerate(md_files):
text = f.read_text(encoding="utf-8") text = f.read_text(encoding="utf-8")
render_md_block(doc, text, ctx) render_md_block(doc, text, ctx)
if idx != len(md_files) - 1: if prof["trailing_page_break"] or idx != len(md_files) - 1:
doc.add_page_break() doc.add_page_break()
out.parent.mkdir(parents=True, exist_ok=True) out.parent.mkdir(parents=True, exist_ok=True)
@ -525,22 +437,5 @@ def render_sections(sections_dir: Path, out: Path, lang: str, toc: bool) -> None
chars = sum(len(p.text) for p in doc.paragraphs) chars = sum(len(p.text) for p in doc.paragraphs)
tbls = len(doc.tables) tbls = len(doc.tables)
print(f"[OK] rendered {len(md_files)} sections -> {out}") print(f"[OK] rendered {len(md_files)} sections -> {out}")
print(f" paragraphs: {paras} | tables: {tbls} | figures: {ctx['fig_no']} | total chars: {chars}") print(f" profile: {profile} | paragraphs: {paras} | tables: {tbls} | "
print(f" lang: {lang} | toc: {toc}") f"figures: {ctx['fig_no']} | chars: {chars}")
print(f" font: 中文宋体小四 / 英文 Times New Roman 小四 / 行距 1.5 / 首行缩进 2 字符")
def main() -> None:
ap = argparse.ArgumentParser(description="渲染章节 md → 论文投稿稿 docx")
ap.add_argument("sections_dir", type=Path, help="sections/*.md 目录")
ap.add_argument("--lang", choices=["zh", "en"], default="en",
help="论文语言 (影响图题前缀 图/Fig. 与信息打印); 默认 en")
ap.add_argument("--toc", action="store_true",
help="生成目录页 (期刊投稿稿通常不需要; 内部草稿评阅时可加)")
ap.add_argument("-o", "--output", type=Path, required=True, help="输出 .docx 路径")
args = ap.parse_args()
render_sections(args.sections_dir, args.output, args.lang, args.toc)
if __name__ == "__main__":
main()

177
rendering/pdf.py Normal file
View File

@ -0,0 +1,177 @@
"""md(sections 目录或单 .md)→ PDF,沙盒自带 chromium 渲染。
渲染链(全程沙盒内,不进 weasyprint不装额外包):
md --(python `markdown` )--> HTML --(chromium --headless --print-to-pdf)--> PDF
chromium 是镜像里已装的( mermaid ),fonts-noto-cjk 也已装;chromium 是完整浏览器
内核,CSS 保真度比 weasyprint 冒烟见 deploy/sandbox/probe_chromium_pdf.sh
视觉与 docx 一致:复用 common.CHEM_RE(化学式下标白名单,单一事实源)+ 商务红配色 +
DOI/URL 超链引文 [n] 上标回链这版按字面渲染(后续与 docx 一起 DRY 再补)
ASCII-only stdout(Windows GBK 控制台安全)
"""
from __future__ import annotations
import os
import re
import shutil
import subprocess
import tempfile
from pathlib import Path
from .common import CHEM_RE
# ───────────────────────── 主题色(与 docx 商务红一致)─────────────────────────
PRIMARY = "#C00000"
TLDR_FILL = "#FBE9E9"
LINK_BLUE = "#1155CC"
TABLE_HEAD_FILL = "#C00000"
TABLE_ZEBRA = "#F8F0F0"
# 行内 DOI 子串(HTML-safe 边界)
_DOI_INLINE_RE = re.compile(r"10\.\d{4,9}/[^\s<>\"]+")
# 裸 URL / 域名 token
_URL_TOKEN_RE = re.compile(
r"(?<![\w/@.])((?:https?://)?[a-z0-9][\w.\-]*\.[a-z]{2,}(?:/[^\s<>\"]*)?)",
re.IGNORECASE,
)
# 切分 HTML 成 [文本, 标签, ...];只对文本 token 做下标/超链替换
_TAG_SPLIT = re.compile(r"(<[^>]+>)")
_SKIP_TAGS = {"a", "code", "pre", "script", "style", "head"}
_TAG_NAME_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z0-9]+)")
def _log(msg: str) -> None:
print(f"[render_pdf] {msg}")
def _emit_chem(text: str) -> str:
def repl(m: re.Match) -> str:
return re.sub(r"(\d+)", r"<sub>\1</sub>", m.group(0))
return CHEM_RE.sub(repl, text)
def _emit_links(text: str) -> str:
def doi_repl(m: re.Match) -> str:
doi = m.group(0)
return f'<a href="https://doi.org/{doi}">{doi}</a>'
text = _DOI_INLINE_RE.sub(doi_repl, text)
out_parts = []
for piece in _TAG_SPLIT.split(text):
if piece.startswith("<"):
out_parts.append(piece)
continue
def url_repl(m: re.Match) -> str:
raw = m.group(1)
href = raw if raw.lower().startswith("http") else f"https://{raw}"
return f'<a href="{href}">{raw}</a>'
out_parts.append(_URL_TOKEN_RE.sub(url_repl, piece))
return "".join(out_parts)
def _enrich_html(html: str) -> str:
"""对 HTML 纯文本片段做化学式下标 + DOI/URL 超链;<a>/<code>/<pre> 内不动。"""
out = []
skip_depth = 0
for token in _TAG_SPLIT.split(html):
if not token:
continue
if token.startswith("<"):
m = _TAG_NAME_RE.match(token)
if m:
closing, name = m.group(1), m.group(2).lower()
if name in _SKIP_TAGS and not token.rstrip().endswith("/>"):
skip_depth += -1 if closing else 1
skip_depth = max(0, skip_depth)
out.append(token)
else:
out.append(token if skip_depth else _emit_links(_emit_chem(token)))
return "".join(out)
def _read_sections(src: Path) -> str:
if src.is_dir():
parts = [md.read_text(encoding="utf-8") for md in sorted(src.glob("*.md"))]
if not parts:
raise SystemExit(f"[render_pdf] no *.md under {src}")
return "\n\n".join(parts)
return src.read_text(encoding="utf-8")
def _css(color: bool) -> str:
primary = PRIMARY if color else "#000000"
head_fill = TABLE_HEAD_FILL if color else "#000000"
zebra = TABLE_ZEBRA if color else "#FFFFFF"
tldr = TLDR_FILL if color else "#FFFFFF"
link = LINK_BLUE if color else "#000000"
return f"""
@page {{ size: A4; margin: 2.2cm 2cm; }}
* {{ -webkit-print-color-adjust: exact; print-color-adjust: exact; }}
body {{ font-family: 'Times New Roman','Noto Serif CJK SC','Noto Sans CJK SC',serif;
font-size: 12pt; line-height: 1.6; color: #000; }}
h1 {{ font-family: 'Noto Sans CJK SC',sans-serif; font-size: 19pt; color: {primary};
border-bottom: 2px solid {primary}; padding-bottom: 4pt; margin: 22pt 0 12pt; }}
h2 {{ font-family: 'Noto Sans CJK SC',sans-serif; font-size: 15pt; color: {primary}; margin: 20pt 0 8pt; }}
h3 {{ font-family: 'Noto Sans CJK SC',sans-serif; font-size: 13pt; color: {primary}; margin: 16pt 0 6pt; }}
p {{ text-align: justify; margin: 6pt 0; }}
a {{ color: {link}; text-decoration: underline; word-break: break-all; }}
sub {{ font-size: 0.72em; }}
table {{ border-collapse: collapse; width: 100%; margin: 12pt 0; font-size: 10.5pt; }}
th {{ background: {head_fill}; color: #fff; padding: 6pt 8pt; border: 1px solid #999; text-align: center; }}
td {{ padding: 5pt 8pt; border: 1px solid #999; }}
tr:nth-child(even) td {{ background: {zebra}; }}
blockquote {{ border-left: 4px solid {primary}; background: {tldr}; margin: 12pt 0;
padding: 8pt 12pt; font-size: 11pt; }}
blockquote p {{ margin: 3pt 0; }}
code {{ font-family: Consolas,monospace; font-size: 10pt; background: #f5f5f5; padding: 1pt 3pt; }}
ul,ol {{ margin: 6pt 0; padding-left: 22pt; }}
li {{ margin: 3pt 0; }}
"""
def _find_chromium() -> str:
env = os.environ.get("CHROMIUM") or os.environ.get("CHROME")
cands = [env] if env else []
cands += ["chromium", "chromium-browser", "google-chrome",
"/usr/bin/chromium", "/usr/bin/chromium-browser"]
for c in cands:
if c and (shutil.which(c) or Path(c).exists()):
return shutil.which(c) or c
raise SystemExit("[render_pdf] chromium 不在沙盒里(镜像应已装,给 mermaid 用)。"
"确认 `which chromium` 或设 CHROMIUM 环境变量。")
def md_to_pdf(src: Path, out: Path, *, color: bool = True, profile: str = "") -> Path:
try:
import markdown
except ImportError:
raise SystemExit("[render_pdf] 缺 `markdown` 包。基础镜像应已装(requirements.txt);"
"本地补:.venv/Scripts/python.exe -m pip install markdown")
md_text = _read_sections(src)
body = markdown.markdown(
md_text, extensions=["tables", "fenced_code", "sane_lists", "attr_list"]
)
body = _enrich_html(body)
html = (f'<!DOCTYPE html><html lang="zh-CN"><head><meta charset="utf-8">'
f"<style>{_css(color)}</style></head><body>{body}</body></html>")
chromium = _find_chromium()
out.parent.mkdir(parents=True, exist_ok=True)
with tempfile.TemporaryDirectory(prefix="render-pdf-") as tmp:
html_path = Path(tmp) / "doc.html"
html_path.write_text(html, encoding="utf-8")
cmd = [
chromium, "--headless", "--disable-gpu", "--no-sandbox",
"--disable-dev-shm-usage", f"--user-data-dir={tmp}/cr",
"--no-pdf-header-footer",
f"--print-to-pdf={out}", html_path.as_uri(),
]
proc = subprocess.run(cmd, capture_output=True, timeout=120, check=False)
if proc.returncode != 0 or not out.exists() or out.stat().st_size == 0:
tail = (proc.stderr or proc.stdout or b"").decode("utf-8", "replace")[-600:]
raise SystemExit(f"[render_pdf] chromium 转 PDF 失败(rc={proc.returncode}):\n{tail}")
return out

63
rendering/render.py Normal file
View File

@ -0,0 +1,63 @@
"""平台渲染统一入口。各 skill 出 docx/pdf 都调这一个,不再自带 render 脚本。
用法(沙盒内 / host ):
python /sandbox/rendering/render.py --profile brief --format docx <sections> -o out.docx
python /sandbox/rendering/render.py --profile brief --format pdf <sections> -o out.pdf
python /sandbox/rendering/render.py --profile paper --format docx <sections> --lang zh -o out.docx
python /sandbox/rendering/render.py --profile proposal --format docx <sections> --fund-type key_rd -o out.docx
--no-color 出黑白(brief docx / 任意 pdf 生效)<sections> 可为目录(拼接其 *.md)或单个 .md
"""
from __future__ import annotations
import argparse
import os
import sys
from pathlib import Path
# bootstrap:让 `import rendering.*` 在 `python /sandbox/rendering/render.py` 直接调时也能解析。
# render.py 恒在 <root>/rendering/render.py,故 dirname(dirname(__file__)) 恒为含 rendering/ 的根
# (沙盒=/sandbox,host=repo 根),与挂载点 / 深度无关。
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from rendering import docx_brief, docx_manuscript, pdf # noqa: E402
def main(argv: list[str] | None = None) -> int:
ap = argparse.ArgumentParser(description="md(sections 目录或单 .md)→ docx / pdf")
ap.add_argument("src", type=Path, help="sections 目录(拼接其 *.md)或单个 .md")
ap.add_argument("--profile", required=True, choices=["brief", "paper", "proposal"])
ap.add_argument("--format", default="docx", choices=["docx", "pdf"])
ap.add_argument("-o", "--output", type=Path, required=True, help="输出路径")
ap.add_argument("--no-color", dest="color", action="store_false",
help="关配色出黑白(brief docx / pdf 生效)")
ap.add_argument("--lang", choices=["zh", "en"], default="en",
help="paper 图题前缀 图/Fig.;默认 en")
ap.add_argument("--toc", action="store_true", help="paper 生成目录页(proposal 始终带)")
ap.add_argument("--fund-type", default="key_rd",
help="proposal 基金类型(仅打印标注)")
args = ap.parse_args(argv)
if not args.src.exists():
print(f"[render] 输入不存在:{args.src}", file=sys.stderr)
return 1
if args.format == "pdf":
out = pdf.md_to_pdf(args.src, args.output, color=args.color, profile=args.profile)
print(f"[render] OK pdf -> {out} ({out.stat().st_size} bytes)")
return 0
# docx
if args.profile == "brief":
docx_brief.render_sections(args.src, args.output, args.color)
elif args.profile == "paper":
docx_manuscript.render_sections("paper", args.src, args.output,
lang=args.lang, toc=args.toc)
else: # proposal
docx_manuscript.render_sections("proposal", args.src, args.output,
fund_type=args.fund_type)
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@ -7,6 +7,11 @@ rich>=13.7.0
python-pptx>=0.6.21 python-pptx>=0.6.21
python-docx>=1.1.0 python-docx>=1.1.0
matplotlib>=3.8.0 matplotlib>=3.8.0
Pillow>=9.0.0 # ppt skill(SVG-first)svg_finalize:配图裁切/内嵌
# ppt skill 可选 —— 老版 Office(<2019)的 SVG→PNG 兜底;现代 PowerPoint 直接渲 SVG 无需,核心不依赖:
# svglib>=1.5.0
# reportlab>=4.0.0
markdown>=3.5 # skills/_shared/render_pdf.py: md→HTML→chromium 出 PDF(纯 Python,host/sandbox 通吃)
# 素材摄取: PDF/DOCX/PPTX/XLSX/HTML/URL → Markdown (ppt 阶段零 + proposal 阶段零) # 素材摄取: PDF/DOCX/PPTX/XLSX/HTML/URL → Markdown (ppt 阶段零 + proposal 阶段零)
markitdown[pdf,docx,pptx,xlsx]>=0.0.1 markitdown[pdf,docx,pptx,xlsx]>=0.0.1
@ -18,6 +23,10 @@ html2text>=2024.0
# 定时任务(§8.5 scheduled_jobs):cron 串 → next_run_at 计算,正确处理 dom/dow OR 语义 + 时区 # 定时任务(§8.5 scheduled_jobs):cron 串 → next_run_at 计算,正确处理 dom/dow OR 语义 + 时区
croniter>=2.0 croniter>=2.0
# 微信接入(§8.7 ClawBot):segno 渲绑定二维码;cryptography 做凭据列加密 + 文件 AES-128-ECB
segno>=1.6
cryptography>=42.0
# §7 B 阶段: Storage 落 PG # §7 B 阶段: Storage 落 PG
sqlalchemy>=2.0.0 sqlalchemy>=2.0.0
psycopg[binary]>=3.1.0 psycopg[binary]>=3.1.0

View File

@ -0,0 +1,56 @@
"""Dump the task_progress tool-call sequence for a task (by id prefix). ASCII-only."""
import json
import os
import sys
from pathlib import Path
env = Path(__file__).resolve().parent.parent / ".env"
for line in env.read_text(encoding="utf-8").splitlines():
if line.strip().startswith("ZCBOT_DB_URL="):
os.environ["ZCBOT_DB_URL"] = line.split("=", 1)[1].strip()
from sqlalchemy import create_engine, text # noqa: E402
engine = create_engine(os.environ["ZCBOT_DB_URL"])
prefix = sys.argv[1] if len(sys.argv) > 1 else "d1285247"
with engine.connect() as conn:
row = conn.execute(
text("select task_id,name,status,run_status from tasks where task_id::text like :p"),
{"p": prefix + "%"},
).fetchone()
if not row:
print("[NO TASK]", prefix)
sys.exit(1)
tid = row[0]
print(f"[TASK] {tid} name={row[1]!r} status={row[2]} run={row[3]}")
msgs = conn.execute(
text("select idx,payload from messages where task_id=:t order by idx"),
{"t": tid},
).fetchall()
print(f"[MESSAGES] {len(msgs)}")
n = 0
for idx, p in msgs:
for tc in p.get("tool_calls") or []:
fn = tc.get("function") or {}
if fn.get("name") != "task_progress":
continue
n += 1
try:
args = json.loads(fn.get("arguments") or "{}")
except Exception as e:
print(f" [{idx}] PARSE-ERR: {e} raw={fn.get('arguments')!r}")
continue
act = args.get("action")
if act == "set_plan":
steps = args.get("steps") or []
print(f" [{idx}] set_plan ({len(steps)} steps):")
for st in steps:
print(f" {st.get('id')!r:8} {st.get('status'):11} {st.get('title')!r}")
elif act == "update_step":
st = args.get("step") or {}
print(f" [{idx}] update_step id={st.get('id')!r} status={st.get('status')!r} title={st.get('title')!r}")
else:
print(f" [{idx}] {act} {json.dumps(args, ensure_ascii=False)}")
print(f"[task_progress calls] {n}")

View File

@ -0,0 +1,93 @@
"""diag: 查 scheduled-e621c8a6 这个 job 为何执行到一半没推送(ASCII only, GBK safe)."""
import os
import sys
from pathlib import Path
env = Path(__file__).resolve().parent.parent / ".env"
for line in env.read_text(encoding="utf-8").splitlines():
if line.strip().startswith("ZCBOT_DB_URL="):
os.environ["ZCBOT_DB_URL"] = line.split("=", 1)[1].strip()
from sqlalchemy import create_engine, text # noqa: E402
import builtins # noqa: E402
_out = open(Path(__file__).resolve().parent / "_sched_e621.txt", "w", encoding="utf-8")
def print(*a, **k): # noqa: A001
builtins.print(*a, **k, file=_out)
PREFIX = sys.argv[1] if len(sys.argv) > 1 else "e621c8a6"
engine = create_engine(os.environ["ZCBOT_DB_URL"])
def s(x, n=2000):
t = str(x if x is not None else "")
return t if len(t) <= n else t[:n] + f"...[+{len(t)-n}]"
with engine.connect() as conn:
job = conn.execute(text(
"select job_id,user_id,name,mode,cron,tz,enabled,notify,timeout_seconds,"
"next_run_at,last_run_at,last_status,last_error,last_task_id,"
"consecutive_failures,run_count,bound_task_id,created_at,deleted_at "
"from scheduled_jobs where cast(job_id as text) like :p"),
{"p": PREFIX + "%"}).fetchall()
print(f"[JOBS matched '{PREFIX}'] {len(job)}")
for j in job:
print("-" * 60)
print(f"job_id={j[0]} name={j[2]!r}")
print(f" mode={j[3]} cron={j[4]!r} tz={j[5]} enabled={j[6]} timeout={j[8]}")
print(f" notify={j[7]}")
print(f" next_run_at={j[9]} last_run_at={j[10]}")
print(f" last_status={j[11]} consecutive_failures={j[14]} run_count={j[15]}")
print(f" last_task_id={j[13]} bound_task_id={j[16]}")
print(f" deleted_at={j[18]} created_at={j[17]}")
if j[12]:
print(f" last_error: {s(j[12], 1500)}")
if not job:
sys.exit(0)
j = job[0]
uid = j[1]
last_tid = j[13]
# 找该 job 关联的所有 task(scheduled_job_id 回填 + last_task_id)
tasks = conn.execute(text(
"select task_id,name,status,run_status,run_error,tokens_prompt,tokens_completion,"
"created_at,updated_at,scheduled_job_id from tasks "
"where scheduled_job_id = :jid order by created_at"),
{"jid": str(j[0])}).fetchall()
print("\n" + "=" * 60)
print(f"[TASKS with scheduled_job_id={str(j[0])[:8]}] {len(tasks)}")
for t in tasks:
print(f" task={t[0]} name={t[1]!r} status={t[2]} run={t[3]} "
f"tok={t[5]}/{t[6]} created={t[7]} updated={t[8]}")
if t[4]:
print(f" run_error: {s(t[4], 1500)}")
# dump last_task_id 的消息(执行到哪一步)
tid = last_tid or (tasks[-1][0] if tasks else None)
if tid is None:
print("\n[no task to dump]")
sys.exit(0)
print("\n" + "=" * 60)
print(f"[DUMP messages of task {tid}]")
msgs = conn.execute(text(
"select idx,payload,tokens_in,tokens_out,created_at from messages "
"where task_id=:t order by idx"), {"t": str(tid)}).fetchall()
print(f"messages: {len(msgs)}\n")
for idx, p, ti, to, cat in msgs:
role = p.get("role")
head = f"[{idx}] {role} tok={ti}/{to} at={cat}"
print(head)
content = p.get("content")
if content:
print(" content:", s(content, 1500))
for tc in p.get("tool_calls") or []:
fn = tc.get("function") or {}
print(f" CALL {fn.get('name')}({s(fn.get('arguments'), 800)})")
if role == "tool":
print(f" TOOL[{p.get('name')}]:", s(content, 1200))
print()

View File

@ -0,0 +1,87 @@
"""诊断微信对话里 wechat_push 发文件失败:dump 绑定状态 + 微信 task 里 wechat_push 工具调用与返回。
ASCII 标签(Windows GBK 安全)用法:.venv/Scripts/python.exe scripts/diag_wechat_push.py [email]
"""
import json
import os
import sys
from datetime import datetime, timezone
from pathlib import Path
env = Path(__file__).resolve().parent.parent / ".env"
for line in env.read_text(encoding="utf-8").splitlines():
if line.strip().startswith("ZCBOT_DB_URL="):
os.environ["ZCBOT_DB_URL"] = line.split("=", 1)[1].strip()
from sqlalchemy import create_engine, text # noqa: E402
import builtins # noqa: E402
_out = open(Path(__file__).resolve().parent / "_wechat_push_dump.txt", "w", encoding="utf-8")
def print(*a, **k): # noqa: A001
builtins.print(*a, **k, file=_out)
engine = create_engine(os.environ["ZCBOT_DB_URL"])
email = sys.argv[1] if len(sys.argv) > 1 else "caoqianming@foxmail.com"
def s(x, n=2000):
t = str(x or "")
return t if len(t) <= n else t[:n] + f"...[+{len(t)-n}]"
with engine.connect() as conn:
row = conn.execute(text("select user_id from users where email=:e"), {"e": email}).fetchone()
if not row:
print("[NO USER]", email); sys.exit(1)
uid = row[0]
print("[USER]", uid)
b = conn.execute(text(
"select user_im_id, base_url, status, context_token_at, "
"(latest_context_token is not null) as has_ctx, chat_task_id "
"from wechat_bot_bindings where user_id=:u"), {"u": uid}).fetchone()
if not b:
print("[NO BINDING]"); sys.exit(1)
print("[BINDING] status=%s user_im_id=%s has_ctx=%s ctx_at=%s base=%s" % (
b.status, b.user_im_id, b.has_ctx, b.context_token_at, b.base_url))
print("[BINDING] chat_task_id=%s" % b.chat_task_id)
if b.context_token_at:
at = b.context_token_at
if at.tzinfo is None:
at = at.replace(tzinfo=timezone.utc)
age = datetime.now(timezone.utc) - at
print("[BINDING] ctx age = %s (fresh if <24h)" % age)
tid = b.chat_task_id
if not tid:
print("[NO CHAT TASK]"); sys.exit(0)
# dump messages, focus on wechat_push tool calls/results
rows = conn.execute(text(
"select idx, payload from messages where task_id=:t order by idx desc limit 60"),
{"t": tid}).fetchall()
print("\n[MESSAGES] last %d (newest first):" % len(rows))
for idx, payload in rows:
if isinstance(payload, str):
try:
payload = json.loads(payload)
except Exception:
pass
if not isinstance(payload, dict):
continue
role = payload.get("role")
# assistant tool_calls
tcs = payload.get("tool_calls") or []
for tc in tcs:
fn = (tc.get("function") or {})
if fn.get("name") == "wechat_push":
print(" #%s [CALL wechat_push] args=%s" % (idx, s(fn.get("arguments"), 800)))
# tool result
if role == "tool":
name = payload.get("name", "")
content = payload.get("content")
if name == "wechat_push" or "微信" in s(content, 200) or "wechat" in s(name):
print(" #%s [TOOL RESULT %s] %s" % (idx, name, s(content, 800)))

81
scripts/diag_wecom.py Normal file
View File

@ -0,0 +1,81 @@
"""企业微信推送诊断:分步查 gettoken / message_send 的确切 errcode/errmsg。
用法(服务器上,.env 同目录):
.venv/Scripts/python.exe scripts/diag_wecom.py <userid>
.env WECOM_CORPID/AGENTID/SECRETASCII 输出,secret 不打印
常见 errcode:
gettoken: 40013=corpid / 40001|42001=secret / 41002= corpid
send: 60011=无权限(应用可见范围没包含该成员)/ 81013=UserID 不存在
40056=agentid / 60020=IP 不在可信IP / 81014=该成员未关注/未激活
"""
import os
import sys
# 仓库根加入 sys.path(脚本在 scripts/ 下,直跑时 core 在上一级)
_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if _ROOT not in sys.path:
sys.path.insert(0, _ROOT)
def _load_env(path: str) -> None:
"""加载 .env:优先 python-dotenv,没装则手动解析(只填未设置的 key)。"""
try:
from dotenv import load_dotenv
load_dotenv(path)
return
except Exception:
pass
try:
with open(path, encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line or line.startswith("#") or "=" not in line:
continue
k, v = line.split("=", 1)
os.environ.setdefault(k.strip(), v.strip().strip('"').strip("'"))
except FileNotFoundError:
pass
_load_env(os.path.join(_ROOT, ".env"))
from core.wechat import wecom
def main() -> int:
uid = sys.argv[1] if len(sys.argv) > 1 else None
print("[cfg] configured:", wecom.wecom_configured())
print("[cfg] corpid:", (os.getenv("WECOM_CORPID", "") or "")[:8] + "...",
"| agentid:", os.getenv("WECOM_AGENTID", ""))
if not wecom.wecom_configured():
print("[FAIL] WECOM_CORPID/AGENTID/SECRET 没读到(确认 .env 在当前目录、值已填)")
return 1
print("[step1] gettoken ...")
try:
tok = wecom.get_access_token(force=True)
print(f"[step1] OK (token len {len(tok)})")
except Exception as e:
print(f"[step1] FAIL: {e}")
print(" → corpid 或 secret 不对(secret 必须是这个自建应用的,不是通讯录密钥)")
return 2
if not uid:
print("[step2] 跳过(没给 userid 参数);用法: diag_wecom.py <userid>")
return 0
print(f"[step2] message/send 到 userid={uid} ...")
try:
wecom.send_text(uid, "zcbot 企业微信诊断测试消息")
print(f"[step2] OK → 去企业微信查收。链路通了!")
except Exception as e:
print(f"[step2] FAIL: {e}")
print(" → 看 errcode:60011=应用可见范围没含该成员 / 81013=userid 写错"
"(大小写要和通讯录「账号」完全一致)/ 40056=agentid 错")
return 3
return 0
if __name__ == "__main__":
sys.exit(main())

150
scripts/probe_clawbot.py Normal file
View File

@ -0,0 +1,150 @@
"""一次性探测:微信 ClawBot 灰度是否覆盖某个微信号。
只做两件事(不碰 zcbot 主体不落库):
1. GET get_bot_qrcode 拿二维码 -> qr.png 并自动打开
2. 轮询 get_qrcode_status 等扫码确认 -> 报告 status
判读:
- 接口连不通 / 200 -> 本机到 ilinkai 网络不通,换网或在有网机器跑
- 出码成功手机扫得动确认 -> 该微信号在灰度内,ClawBot 可用
- 出码成功扫了报"不支持" -> 版本不够或未灰度到该号
ASCII-only 输出(Windows GBK 控制台)
"""
from __future__ import annotations
import base64
import os
import random
import sys
import time
import webbrowser
import httpx
BASE = "https://ilinkai.weixin.qq.com"
QR_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "clawbot_qr.png")
def _uin_header() -> str:
# X-WECHAT-UIN: base64(String(randomUint32()))
return base64.b64encode(str(random.randint(0, 2**32 - 1)).encode()).decode()
def _headers() -> dict:
return {
"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token",
"X-WECHAT-UIN": _uin_header(),
}
def _save_qr(img_content: str, qrcode_id: str) -> bool:
"""实测:qrcode_img_content 是微信深链(https://liteapp.weixin.qq.com/q/...),
需把该 URL **编码成二维码** 让微信扫,而非当图片下载
兜底:若哪天返回的是真图片字节(data-uri / base64 PNG)则直接存
"""
try:
if not img_content:
print(f"[hint] no img content; encode this id manually: {qrcode_id}")
return False
# 情况 A:真图片字节
if img_content.startswith("data:image"):
data = base64.b64decode(img_content.split(",", 1)[1])
with open(QR_PATH, "wb") as f:
f.write(data)
print(f"[ok] QR (image) saved -> {QR_PATH}")
return True
# 情况 B(实测):深链 / 任意字符串 -> 自己渲染成二维码
import segno
print(f"[info] encoding deep-link into QR: {img_content}")
segno.make(img_content, error="m").save(QR_PATH, scale=8, border=3)
print(f"[ok] QR (rendered from deep-link) saved -> {QR_PATH}")
return True
except Exception as e:
print(f"[warn] could not build QR: {type(e).__name__}: {e}")
print(f"[hint] deep-link to scan manually: {img_content}")
return False
def main() -> int:
print("[step1] GET get_bot_qrcode ...")
try:
with httpx.Client(timeout=20) as c:
r = c.get(
f"{BASE}/ilink/bot/get_bot_qrcode",
params={"bot_type": "3"},
headers=_headers(),
)
except Exception as e:
print(f"[FAIL] network error to {BASE}: {type(e).__name__}: {e}")
print("[judge] host cannot reach ilinkai.weixin.qq.com -> try another network.")
return 2
print(f"[http] status={r.status_code}")
body_preview = r.text[:600]
print(f"[body] {body_preview}")
if r.status_code != 200:
print("[judge] non-200 from get_bot_qrcode -> endpoint/params may be wrong or blocked.")
return 3
try:
data = r.json()
except Exception:
print("[FAIL] response not JSON; see body above.")
return 3
qrcode_id = data.get("qrcode") or data.get("qrcode_id") or ""
img = data.get("qrcode_img_content") or data.get("qrcode_img") or ""
if not qrcode_id:
print("[FAIL] no 'qrcode' field in response; field names differ -> inspect body above.")
return 3
if _save_qr(img, qrcode_id):
try:
webbrowser.open("file://" + QR_PATH.replace("\\", "/"))
except Exception:
pass
print("[action] QR opened. Scan it with your phone WeChat NOW.")
else:
print("[action] QR image unavailable; cannot open. See hint above.")
poll_secs = int(sys.argv[1]) if len(sys.argv) > 1 else 100
print(f"[step2] polling get_qrcode_status (up to ~{poll_secs}s; Ctrl-C to stop)...")
deadline = time.time() + poll_secs
last = ""
with httpx.Client(timeout=40) as c:
while time.time() < deadline:
try:
r = c.get(
f"{BASE}/ilink/bot/get_qrcode_status",
params={"qrcode": qrcode_id},
headers=_headers(),
)
st = ""
try:
st = (r.json() or {}).get("status", "")
except Exception:
st = f"(non-json http {r.status_code})"
if st != last:
print(f"[poll] status={st!r}")
last = st
if st == "confirmed":
j = r.json()
tok = j.get("bot_token", "")
base_url = j.get("baseurl") or j.get("base_url") or ""
masked = (tok[:6] + "..." + tok[-4:]) if len(tok) > 12 else "(short)"
print("[SUCCESS] scan confirmed -> this WeChat account IS in the ClawBot rollout.")
print(f"[SUCCESS] bot_token={masked} baseurl={base_url}")
print("[note] token masked on purpose; it is a per-user credential.")
return 0
except Exception as e:
print(f"[poll] error: {type(e).__name__}: {e}")
time.sleep(2)
print("[timeout] no confirmation within window. Either not scanned in time, or")
print("[timeout] your WeChat lacks the ClawBot entry (version <8.0.70 or not gray-rolled).")
return 1
if __name__ == "__main__":
sys.exit(main())

View File

@ -0,0 +1,181 @@
"""探测二:微信 ClawBot 的【对话】与【主动推送】能力(命门验证)。
流程(都在一次运行里,不落库):
1. 扫码绑定拿 bot_token(同探测一)
2. getupdates 长轮询,等你给微信 ClawBot联系人发一条消息
3. 收到后,依次测三种发送,逐一报 ret:
A. context_token 回复 -> 被动回复是否通
B. 25s ,同一个context_token 再发 -> 开口一次后能否延迟主动推
C. context_token 置空再发 -> 冷推( token)是否被拒
判读:
A = 双向对话成立
B = 用户开口一次后可后续推送(简报可走"先开口、后定时推"的弱化版)
C = 可冷推(几乎不可能,但要验)
B/C 都不通 = ClawBot 纯被动回复,定时主动推送这条路不成立
ASCII-only 输出bot_token 不打印
"""
from __future__ import annotations
import base64
import os
import random
import sys
import time
import httpx
import segno
BASE = "https://ilinkai.weixin.qq.com"
QR_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "clawbot_qr.png")
CHANNEL_VER = "1.0.2"
def _uin() -> str:
return base64.b64encode(str(random.randint(0, 2**32 - 1)).encode()).decode()
def _headers(token: str | None = None) -> dict:
h = {
"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token",
"X-WECHAT-UIN": _uin(),
}
if token:
h["Authorization"] = f"Bearer {token}"
return h
def bind() -> tuple[str, str] | None:
print("[bind] GET get_bot_qrcode ...")
with httpx.Client(timeout=20) as c:
r = c.get(f"{BASE}/ilink/bot/get_bot_qrcode",
params={"bot_type": "3"}, headers=_headers())
if r.status_code != 200:
print(f"[FAIL] get_bot_qrcode http {r.status_code}: {r.text[:300]}")
return None
d = r.json()
qid = d.get("qrcode", "")
link = d.get("qrcode_img_content", "")
segno.make(link, error="m").save(QR_PATH, scale=8, border=3)
try:
import webbrowser
webbrowser.open("file://" + QR_PATH.replace("\\", "/"))
except Exception:
pass
print(f"[bind] QR opened -> {QR_PATH} SCAN IT NOW with phone WeChat.")
deadline = time.time() + 180
with httpx.Client(timeout=40) as c:
last = ""
while time.time() < deadline:
try:
r = c.get(f"{BASE}/ilink/bot/get_qrcode_status",
params={"qrcode": qid}, headers=_headers())
j = r.json()
st = j.get("status", "")
if st != last:
print(f"[bind] status={st!r}")
last = st
if st == "confirmed":
print("[bind] confirmed.")
return j.get("bot_token", ""), (j.get("baseurl") or BASE)
if st == "expired":
print("[bind] QR expired before scan.")
return None
except Exception as e:
print(f"[bind] poll err: {type(e).__name__}: {e}")
time.sleep(2)
print("[bind] timeout waiting for scan.")
return None
def _send(client: httpx.Client, token: str, to_user: str, text: str,
context_token: str) -> dict:
body = {
"msg": {
"to_user_id": to_user,
"message_type": 2,
"message_state": 2,
"context_token": context_token,
"item_list": [{"type": 1, "text_item": {"text": text}}],
}
}
r = client.post(f"{BASE}/ilink/bot/sendmessage",
json=body, headers=_headers(token))
try:
return {"http": r.status_code, "json": r.json()}
except Exception:
return {"http": r.status_code, "text": r.text[:300]}
def main() -> int:
b = bind()
if not b:
return 2
token, base_url = b
global BASE
BASE = base_url or BASE
print("[chat] now SEND a message (e.g. 'hi') to the WeChat ClawBot contact on your phone.")
print("[chat] waiting via getupdates (up to ~150s)...")
buf = ""
deadline = time.time() + 150
got = None
with httpx.Client(timeout=40) as c:
while time.time() < deadline and got is None:
try:
r = c.post(f"{BASE}/ilink/bot/getupdates",
json={"get_updates_buf": buf,
"base_info": {"channel_version": CHANNEL_VER}},
headers=_headers(token))
j = r.json()
buf = j.get("get_updates_buf", buf)
for m in j.get("msgs", []) or []:
txt = ""
for it in m.get("item_list", []) or []:
txt += (it.get("text_item", {}) or {}).get("text", "")
print(f"[chat] <- from={m.get('from_user_id')} text={txt!r}")
got = m
break
except Exception as e:
print(f"[chat] getupdates err: {type(e).__name__}: {e}")
time.sleep(2)
if got is None:
print("[chat] no message received in window. Re-run and send promptly after scan.")
return 1
to_user = got.get("from_user_id", "")
ctx = got.get("context_token", "")
print(f"[chat] captured to_user={to_user} context_token_len={len(ctx)}")
with httpx.Client(timeout=30) as c:
print("\n[testA] reply WITH context_token ...")
ra = _send(c, token, to_user, "[zcbot 测试A] 收到你的消息,这是带 token 的回复。", ctx)
print(f"[testA] result={ra}")
print("\n[testB] wait 25s, then push again with the SAME context_token (delayed proactive)...")
time.sleep(25)
rb = _send(c, token, to_user, "[zcbot 测试B] 这是25秒后用同一token的延迟主动推送。", ctx)
print(f"[testB] result={rb}")
print("\n[testC] push with EMPTY context_token (cold push) ...")
rc = _send(c, token, to_user, "[zcbot 测试C] 这是空token的冷推送。", "")
print(f"[testC] result={rc}")
def ok(r):
j = r.get("json") or {}
return r.get("http") == 200 and j.get("ret", -1) == 0
print("\n========== VERDICT ==========")
print(f"A reply(with token) : {'OK' if ok(ra) else 'FAIL'}")
print(f"B delayed push(same token) : {'OK' if ok(rb) else 'FAIL'}")
print(f"C cold push(empty token) : {'OK' if ok(rc) else 'FAIL'}")
print("Interpretation:")
print(" - A only -> reply-only; scheduled PROACTIVE push NOT possible.")
print(" - A+B -> after user opens chat once, delayed push works (weak push OK).")
print(" - C -> true cold push works (unlikely).")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@ -0,0 +1,158 @@
"""探测五(决定性):补上 client_id(每条唯一)+ base_info,重验两件事。
A. 流式多条:同一 context_token 连发 3 (client_id 各异,state 1/1/2,间隔300ms)
-> 三块都到 = 多条/长简报可行
B. finish 后复用:发完 FINISH,等30s,同一 context_token+ client_id 再发一条(state=2)
-> = context_token 24h 内可复用 -> "用户开口一次后可主动推" 成立(简报推送复活)
之前失败的最大嫌疑: client_id(后续块无法路由被丢)需要你发一条消息触发
ASCII-only,bot_token 不打印
"""
from __future__ import annotations
import base64
import os
import random
import sys
import time
import uuid
import httpx
import segno
BASE = "https://ilinkai.weixin.qq.com"
QR_DIR = os.path.dirname(os.path.abspath(__file__))
CHANNEL_VER = "1.0.2"
def _uin() -> str:
return base64.b64encode(str(random.randint(0, 2**32 - 1)).encode()).decode()
def _headers(token=None) -> dict:
h = {"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token", "X-WECHAT-UIN": _uin()}
if token:
h["Authorization"] = f"Bearer {token}"
return h
def _new_qr():
with httpx.Client(timeout=20) as c:
r = c.get(f"{BASE}/ilink/bot/get_bot_qrcode",
params={"bot_type": "3"}, headers=_headers())
if r.status_code != 200:
print(f"[FAIL] http {r.status_code}"); return None
d = r.json()
uniq = os.path.join(QR_DIR, f"clawbot_qr_{int(time.time())}.png")
segno.make(d.get("qrcode_img_content", ""), error="m").save(uniq, scale=8, border=3)
try:
os.startfile(uniq)
except Exception:
pass
print(f"[bind] FRESH QR -> {uniq}")
return d.get("qrcode", "")
def bind():
print("[bind] auto-refresh on expiry; scan whenever ready.")
qid = _new_qr()
if not qid:
return None
deadline = time.time() + 300
with httpx.Client(timeout=40) as c:
last = ""
while time.time() < deadline:
try:
j = c.get(f"{BASE}/ilink/bot/get_qrcode_status",
params={"qrcode": qid}, headers=_headers()).json()
st = j.get("status", "")
if st != last:
print(f"[bind] status={st!r}"); last = st
if st == "confirmed":
return j.get("bot_token", ""), (j.get("baseurl") or BASE)
if st == "expired":
nq = _new_qr()
if not nq:
return None
qid, last = nq, ""
except Exception as e:
print(f"[bind] err {e}")
time.sleep(2)
return None
def send(c, token, to_user, text, ctx, state, tag):
cid = uuid.uuid4().hex
body = {
"msg": {
"to_user_id": to_user,
"client_id": cid,
"message_type": 2,
"message_state": state,
"context_token": ctx,
"item_list": [{"type": 1, "text_item": {"text": text}}],
},
"base_info": {"channel_version": CHANNEL_VER},
}
r = c.post(f"{BASE}/ilink/bot/sendmessage", json=body, headers=_headers(token))
try:
j = r.json()
except Exception:
j = r.text[:160]
print(f"[send {tag}] state={state} client_id={cid[:8]} -> http={r.status_code} body={j}")
def wait_msg(c, token):
deadline = time.time() + 150
buf = ""
while time.time() < deadline:
try:
j = c.post(f"{BASE}/ilink/bot/getupdates",
json={"get_updates_buf": buf,
"base_info": {"channel_version": CHANNEL_VER}},
headers=_headers(token)).json()
buf = j.get("get_updates_buf", buf)
for m in j.get("msgs", []) or []:
txt = "".join((it.get("text_item", {}) or {}).get("text", "")
for it in m.get("item_list", []) or [])
print(f"[recv] <- {txt!r}")
return m
except Exception as e:
print(f"[recv] err {e}"); time.sleep(2)
return None
def main() -> int:
b = bind()
if not b:
return 2
token, base_url = b
global BASE
BASE = base_url or BASE
print("[bind] confirmed.\n[A] SEND one message now (e.g. 'go') ...")
with httpx.Client(timeout=30) as c:
m = wait_msg(c, token)
if not m:
print("no msg; abort."); return 1
to_user, ctx = m.get("from_user_id", ""), m.get("context_token", "")
print("[A] streaming 3 chunks WITH client_id (state 1,1,2, 300ms apart)...")
send(c, token, to_user, "[A1] client_id+流式第一段(state=1)", ctx, 1, "A1")
time.sleep(0.3)
send(c, token, to_user, "[A2] client_id+流式第二段(state=1)", ctx, 1, "A2")
time.sleep(0.3)
send(c, token, to_user, "[A3] client_id+末段(state=2 FINISH)", ctx, 2, "A3")
print("\n[B] wait 30s, then reuse SAME context_token + new client_id (state=2)...")
time.sleep(30)
send(c, token, to_user, "[B] finish后30秒,复用同token主动推(若到=24h可复用)", ctx, 2, "B")
print("\n========== CHECK YOUR PHONE ==========")
print("Report which arrived:")
print(" [A1]/[A2]/[A3] -> all three = multi-message/streaming OK (need client_id)")
print(" [B] -> arrived = token reusable after finish => PROACTIVE PUSH revives")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@ -0,0 +1,223 @@
"""探测六:验证 ClawBot 能否发【文件附件】(照官方 @tencent-weixin/openclaw-weixin 协议复刻)。
流程(全诊断,每步打印):
绑定 -> 等你发一条消息( to_user + context_token) -> 造个小 txt ->
md5/随机aeskey(16B)/随机filekey(16B hex) -> AES-128-ECB+PKCS7 加密 ->
POST /ilink/bot/getuploadurl(打印完整返回,字段名不对可据此改) ->
POST 密文到 CDN header x-encrypted-param ->
sendmessage file_item(type=4) 引用 -> 看手机是否收到文件
字段依据(源码):MessageItemType.FILE=4 / UploadMediaType.FILE=3 / MessageState.FINISH=2,
aes_key = base64(aeskey.hex() ascii 字节)ASCII-only,bot_token 不打印
"""
from __future__ import annotations
import base64
import hashlib
import os
import random
import sys
import time
import uuid
from urllib.parse import quote
import httpx
import segno
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
BASE = "https://ilinkai.weixin.qq.com"
CDN_BASE_DEFAULT = "https://novac2c.cdn.weixin.qq.com/c2c"
QR_DIR = os.path.dirname(os.path.abspath(__file__))
CHANNEL_VER = "1.0.2"
def _uin() -> str:
return base64.b64encode(str(random.randint(0, 2**32 - 1)).encode()).decode()
def _headers(token=None) -> dict:
h = {"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token", "X-WECHAT-UIN": _uin()}
if token:
h["Authorization"] = f"Bearer {token}"
return h
def _new_qr():
with httpx.Client(timeout=20) as c:
r = c.get(f"{BASE}/ilink/bot/get_bot_qrcode",
params={"bot_type": "3"}, headers=_headers())
if r.status_code != 200:
print(f"[FAIL] http {r.status_code}"); return None
d = r.json()
uniq = os.path.join(QR_DIR, f"clawbot_qr_{int(time.time())}.png")
segno.make(d.get("qrcode_img_content", ""), error="m").save(uniq, scale=8, border=3)
try:
os.startfile(uniq)
except Exception:
pass
print(f"[bind] FRESH QR -> {uniq}")
return d.get("qrcode", "")
def bind():
print("[bind] auto-refresh on expiry; scan whenever ready.")
qid = _new_qr()
if not qid:
return None
deadline = time.time() + 300
with httpx.Client(timeout=40) as c:
last = ""
while time.time() < deadline:
try:
j = c.get(f"{BASE}/ilink/bot/get_qrcode_status",
params={"qrcode": qid}, headers=_headers()).json()
st = j.get("status", "")
if st != last:
print(f"[bind] status={st!r}"); last = st
if st == "confirmed":
return j.get("bot_token", ""), (j.get("baseurl") or BASE)
if st == "expired":
nq = _new_qr()
if not nq:
return None
qid, last = nq, ""
except Exception as e:
print(f"[bind] err {e}")
time.sleep(2)
return None
def wait_msg(c, token):
deadline = time.time() + 150
buf = ""
while time.time() < deadline:
try:
j = c.post(f"{BASE}/ilink/bot/getupdates",
json={"get_updates_buf": buf,
"base_info": {"channel_version": CHANNEL_VER}},
headers=_headers(token)).json()
buf = j.get("get_updates_buf", buf)
for m in j.get("msgs", []) or []:
txt = "".join((it.get("text_item", {}) or {}).get("text", "")
for it in m.get("item_list", []) or [])
print(f"[recv] <- {txt!r}")
return m
except Exception as e:
print(f"[recv] err {e}"); time.sleep(2)
return None
def aes_ecb_pkcs7(plain: bytes, key: bytes) -> bytes:
padder = padding.PKCS7(128).padder()
padded = padder.update(plain) + padder.finalize()
enc = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
return enc.update(padded) + enc.finalize()
def main() -> int:
b = bind()
if not b:
return 2
token, base_url = b
global BASE
BASE = base_url or BASE
print("[bind] confirmed.\n[file] SEND one message now (e.g. 'file') ...")
with httpx.Client(timeout=30) as c:
m = wait_msg(c, token)
if not m:
print("no msg; abort."); return 1
to_user, ctx = m.get("from_user_id", ""), m.get("context_token", "")
# 1) 造测试文件
fpath = os.path.join(QR_DIR, "zcbot_filetest.txt")
with open(fpath, "w", encoding="utf-8") as f:
f.write("zcbot 文件发送测试\nClawBot file attachment probe\n" + "x" * 200)
data = open(fpath, "rb").read()
fname = "zcbot_filetest.txt"
rawsize = len(data)
rawmd5 = hashlib.md5(data).hexdigest()
aeskey = random.randbytes(16)
filekey = random.randbytes(16).hex()
cipher = aes_ecb_pkcs7(data, aeskey)
filesize = len(cipher)
print(f"[file] {fname} rawsize={rawsize} md5={rawmd5} filesize(enc)={filesize}")
# 2) getuploadurl
up_body = {
"filekey": filekey, "media_type": 3, "to_user_id": to_user,
"rawsize": rawsize, "rawfilemd5": rawmd5, "filesize": filesize,
"no_need_thumb": True, "aeskey": aeskey.hex(),
"base_info": {"channel_version": CHANNEL_VER},
}
ru = c.post(f"{BASE}/ilink/bot/getuploadurl", json=up_body, headers=_headers(token))
print(f"[getuploadurl] http={ru.status_code}")
try:
uj = ru.json()
except Exception:
print(f"[getuploadurl] non-json: {ru.text[:300]}"); return 3
print(f"[getuploadurl] resp={uj}")
# 3) 解析上传 URL(字段名不确定,多名兜底)
full = (uj.get("upload_full_url") or uj.get("uploadFullUrl")
or uj.get("full_url") or uj.get("url"))
param = (uj.get("upload_param") or uj.get("uploadParam") or uj.get("param"))
cdn_base = uj.get("cdn_base_url") or uj.get("cdnBaseUrl") or CDN_BASE_DEFAULT
if full:
cdn_url = full
elif param:
# 源码模板:?encrypted_query_param=<urlencode(uploadParam)>&filekey=<urlencode(filekey)>
cdn_url = (f"{cdn_base}/upload?encrypted_query_param={quote(param)}"
f"&filekey={quote(filekey)}")
else:
print("[FAIL] no upload url/param in resp; inspect resp above to fix field names.")
return 4
print(f"[upload] POST ciphertext -> {cdn_url[:120]}...")
# 4) 上传密文到 CDN
rc = c.post(cdn_url, content=cipher,
headers={"Content-Type": "application/octet-stream"})
download_param = rc.headers.get("x-encrypted-param")
print(f"[upload] http={rc.status_code} x-encrypted-param={download_param!r}")
if not download_param:
print(f"[upload] resp headers={dict(rc.headers)} body={rc.text[:200]}")
print("[FAIL] no x-encrypted-param returned; upload likely rejected.")
return 5
# 5) sendmessage 带 file_item
msg_body = {
"msg": {
"from_user_id": "", "to_user_id": to_user,
"client_id": f"openclaw-weixin-{uuid.uuid4().hex}",
"message_type": 2, "message_state": 2, "context_token": ctx,
"item_list": [{
"type": 4,
"file_item": {
"media": {
"encrypt_query_param": download_param,
"aes_key": base64.b64encode(aeskey.hex().encode()).decode(),
"encrypt_type": 1,
},
"file_name": fname,
"len": str(rawsize),
},
}],
},
"base_info": {"channel_version": CHANNEL_VER},
}
rs = c.post(f"{BASE}/ilink/bot/sendmessage", json=msg_body, headers=_headers(token))
try:
sj = rs.json()
except Exception:
sj = rs.text[:200]
print(f"[sendmessage file] http={rs.status_code} body={sj}")
print("\n========== CHECK YOUR PHONE ==========")
print(f"Did a file '{fname}' arrive in the WeChat ClawBot chat (openable)?")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@ -0,0 +1,147 @@
"""探测四:验证 ClawBot 流式/多条回复(message_state 非 FINISH 是关键)。
上轮发现:message_state=2 = FINISH,"封口"本轮,故第二条被丢
本轮:同一 context_token 连发三段前两段 state=1(未结束),末段 state=2(FINISH),
看手机收到的形态:
- 三条独立气泡 AAA / BBB / CCC -> 支持多条独立消息
- 一条气泡里 AAABBBCCC(增长) -> 流式增量(delta),拼成一条
- 只剩 CCC -> 流式覆盖(cumulative,末值胜)
据此定长简报的发法需要你发一条消息触发bot_token 不打印ASCII-only
"""
from __future__ import annotations
import base64
import os
import random
import sys
import time
import httpx
import segno
BASE = "https://ilinkai.weixin.qq.com"
QR_DIR = os.path.dirname(os.path.abspath(__file__))
CHANNEL_VER = "1.0.2"
def _uin() -> str:
return base64.b64encode(str(random.randint(0, 2**32 - 1)).encode()).decode()
def _headers(token: str | None = None) -> dict:
h = {"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token", "X-WECHAT-UIN": _uin()}
if token:
h["Authorization"] = f"Bearer {token}"
return h
def _new_qr() -> str | None:
with httpx.Client(timeout=20) as c:
r = c.get(f"{BASE}/ilink/bot/get_bot_qrcode",
params={"bot_type": "3"}, headers=_headers())
if r.status_code != 200:
print(f"[FAIL] http {r.status_code}: {r.text[:200]}"); return None
d = r.json()
uniq = os.path.join(QR_DIR, f"clawbot_qr_{int(time.time())}.png")
segno.make(d.get("qrcode_img_content", ""), error="m").save(uniq, scale=8, border=3)
try:
os.startfile(uniq)
except Exception:
pass
print(f"[bind] FRESH QR -> {uniq}")
return d.get("qrcode", "")
def bind() -> tuple[str, str] | None:
print("[bind] auto-refresh on expiry; scan whenever ready.")
qid = _new_qr()
if not qid:
return None
deadline = time.time() + 300
with httpx.Client(timeout=40) as c:
last = ""
while time.time() < deadline:
try:
j = c.get(f"{BASE}/ilink/bot/get_qrcode_status",
params={"qrcode": qid}, headers=_headers()).json()
st = j.get("status", "")
if st != last:
print(f"[bind] status={st!r}"); last = st
if st == "confirmed":
return j.get("bot_token", ""), (j.get("baseurl") or BASE)
if st == "expired":
print("[bind] expired -> new QR");
nq = _new_qr()
if not nq:
return None
qid, last = nq, ""
continue
except Exception as e:
print(f"[bind] err {type(e).__name__}: {e}")
time.sleep(2)
return None
def send(c, token, to_user, text, ctx, state):
body = {"msg": {"to_user_id": to_user, "message_type": 2, "message_state": state,
"context_token": ctx,
"item_list": [{"type": 1, "text_item": {"text": text}}]}}
r = c.post(f"{BASE}/ilink/bot/sendmessage", json=body, headers=_headers(token))
try:
j = r.json()
except Exception:
j = r.text[:200]
print(f"[send] state={state} text={text!r} -> http={r.status_code} body={j}")
def wait_msg(c, token):
deadline = time.time() + 150
buf = ""
while time.time() < deadline:
try:
j = c.post(f"{BASE}/ilink/bot/getupdates",
json={"get_updates_buf": buf,
"base_info": {"channel_version": CHANNEL_VER}},
headers=_headers(token)).json()
buf = j.get("get_updates_buf", buf)
for m in j.get("msgs", []) or []:
txt = "".join((it.get("text_item", {}) or {}).get("text", "")
for it in m.get("item_list", []) or [])
print(f"[recv] <- {txt!r}")
return m
except Exception as e:
print(f"[recv] err {type(e).__name__}: {e}"); time.sleep(2)
return None
def main() -> int:
b = bind()
if not b:
return 2
token, base_url = b
global BASE
BASE = base_url or BASE
print("[bind] confirmed.\n[stream] SEND one message now (e.g. 'go') ...")
with httpx.Client(timeout=30) as c:
m = wait_msg(c, token)
if not m:
print("[stream] no msg; abort."); return 1
to_user, ctx = m.get("from_user_id", ""), m.get("context_token", "")
print("[stream] sending 3 parts with same token (state 1,1,2)...")
send(c, token, to_user, "AAA-第一段(state=1)", ctx, 1)
time.sleep(1)
send(c, token, to_user, "BBB-第二段(state=1)", ctx, 1)
time.sleep(1)
send(c, token, to_user, "CCC-第三段(state=2,FINISH)", ctx, 2)
print("\n========== CHECK YOUR PHONE ==========")
print("Which form did you get?")
print(" (a) three separate bubbles: AAA / BBB / CCC -> multi-message OK")
print(" (b) one bubble growing: AAABBBCCC -> streaming delta-append")
print(" (c) one bubble only: CCC -> streaming cumulative(last wins)")
print(" (d) only AAA / nothing else -> still single")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@ -0,0 +1,168 @@
"""探测三:钉死 ClawBot 的 context_token 语义(决定拉取式简报 + 长回复可行性)。
要回答两个问题:
T1 多发:一条用户消息收到后,同一个新鲜 token连发两条回复
-> 第二条到不到 = 能否分段/多条回复(长简报关键)
T2 延迟:第二条用户消息收到后,先不回, 25s,再用那条没用过的token 回一次
-> 到不到 = token 是否限时(能否把回复推迟一会儿)
需要你先后发两条消息微信 ClawBot(比如先发 1,再发 2)
结果以手机实收为准(接口返空 body 不可信)bot_token 不打印ASCII-only
"""
from __future__ import annotations
import base64
import os
import random
import sys
import time
import httpx
import segno
BASE = "https://ilinkai.weixin.qq.com"
QR_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "clawbot_qr.png")
CHANNEL_VER = "1.0.2"
def _uin() -> str:
return base64.b64encode(str(random.randint(0, 2**32 - 1)).encode()).decode()
def _headers(token: str | None = None) -> dict:
h = {"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token",
"X-WECHAT-UIN": _uin()}
if token:
h["Authorization"] = f"Bearer {token}"
return h
def _new_qr() -> str | None:
"""拉一张新二维码、弹窗,返回 qrcode id;失败返回 None。"""
with httpx.Client(timeout=20) as c:
r = c.get(f"{BASE}/ilink/bot/get_bot_qrcode",
params={"bot_type": "3"}, headers=_headers())
if r.status_code != 200:
print(f"[FAIL] get_bot_qrcode http {r.status_code}: {r.text[:200]}")
return None
d = r.json()
qid = d.get("qrcode", "")
uniq = os.path.join(os.path.dirname(QR_PATH), f"clawbot_qr_{int(time.time())}.png")
segno.make(d.get("qrcode_img_content", ""), error="m").save(uniq, scale=8, border=3)
try:
os.startfile(uniq)
except Exception:
try:
import webbrowser
webbrowser.open("file://" + uniq.replace("\\", "/"))
except Exception:
pass
print(f"[bind] FRESH QR -> {uniq} (older windows are stale, ignore them)")
return qid
def bind() -> tuple[str, str] | None:
"""过期自动换新码,直到扫成功或总超时(5min)。消除扫码时间竞争。"""
print("[bind] GET get_bot_qrcode ... (auto-refresh on expiry; scan whenever ready)")
qid = _new_qr()
if not qid:
return None
deadline = time.time() + 300
with httpx.Client(timeout=40) as c:
last = ""
while time.time() < deadline:
try:
j = c.get(f"{BASE}/ilink/bot/get_qrcode_status",
params={"qrcode": qid}, headers=_headers()).json()
st = j.get("status", "")
if st != last:
print(f"[bind] status={st!r}"); last = st
if st == "confirmed":
return j.get("bot_token", ""), (j.get("baseurl") or BASE)
if st == "expired":
print("[bind] QR expired -> generating a new one ...")
nq = _new_qr()
if not nq:
return None
qid, last = nq, ""
continue
except Exception as e:
print(f"[bind] err {type(e).__name__}: {e}")
time.sleep(2)
print("[bind] overall timeout (5min)."); return None
def send(c, token, to_user, text, ctx):
body = {"msg": {"to_user_id": to_user, "message_type": 2, "message_state": 2,
"context_token": ctx,
"item_list": [{"type": 1, "text_item": {"text": text}}]}}
r = c.post(f"{BASE}/ilink/bot/sendmessage", json=body, headers=_headers(token))
try:
return {"http": r.status_code, "json": r.json()}
except Exception:
return {"http": r.status_code, "text": r.text[:200]}
def wait_msg(c, token, buf):
"""阻塞等下一条用户消息,返回 (msg, new_buf)。"""
deadline = time.time() + 150
while time.time() < deadline:
try:
j = c.post(f"{BASE}/ilink/bot/getupdates",
json={"get_updates_buf": buf,
"base_info": {"channel_version": CHANNEL_VER}},
headers=_headers(token)).json()
buf = j.get("get_updates_buf", buf)
for m in j.get("msgs", []) or []:
txt = "".join((it.get("text_item", {}) or {}).get("text", "")
for it in m.get("item_list", []) or [])
print(f"[recv] <- {txt!r}")
return m, buf
except Exception as e:
print(f"[recv] err {type(e).__name__}: {e}"); time.sleep(2)
return None, buf
def main() -> int:
b = bind()
if not b:
return 2
token, base_url = b
global BASE
BASE = base_url or BASE
print("[bind] confirmed.\n")
with httpx.Client(timeout=40) as c:
# ---- T1: 同一 token 连发两条 ----
print("[T1] SEND your 1st message now (e.g. '1') ...")
m, buf = wait_msg(c, token, "")
if not m:
print("[T1] no msg; abort."); return 1
to_user, ctx = m.get("from_user_id", ""), m.get("context_token", "")
r1a = send(c, token, to_user, "[T1-a] 同token第一条(立即)", ctx)
r1b = send(c, token, to_user, "[T1-b] 同token第二条(紧接)", ctx)
print(f"[T1] sent two with same token. http: a={r1a.get('http')} b={r1b.get('http')}")
# ---- T2: 收到后不回,延迟 25s 再用未用过的 token 回一次 ----
print("\n[T2] SEND your 2nd message now (e.g. '2') ...")
m2, buf = wait_msg(c, token, buf)
if not m2:
print("[T2] no msg; skip.");
else:
to_user2, ctx2 = m2.get("from_user_id", ""), m2.get("context_token", "")
print("[T2] received; NOT replying; waiting 25s...")
time.sleep(25)
r2 = send(c, token, to_user2, "[T2] 延迟25秒,未用过的token回复", ctx2)
print(f"[T2] sent after delay. http={r2.get('http')}")
print("\n========== CHECK YOUR PHONE ==========")
print("Report which of these arrived in the WeChat ClawBot chat:")
print(" [T1-a] 同token第一条(立即)")
print(" [T1-b] 同token第二条(紧接) <- if arrives: multi-message per turn OK")
print(" [T2] 延迟25秒,未用过的token回复 <- if arrives: token is time-windowed, deferred reply OK")
return 0
if __name__ == "__main__":
sys.exit(main())

View File

@ -23,9 +23,9 @@ description: 生成科研方向简报(research direction briefing / 重要文献
## 资源(路径相对 `load_skill` 头里的 `dir=<绝对路径>`) ## 资源(路径相对 `load_skill` 头里的 `dir=<绝对路径>`)
- `references/journals.md` —— 各建材子领域主流期刊清单(Elsevier 数据库优先)+ 精确 `publication_name` + 0 命中降级法。**阶段二必读**。 - `references/journals.md` —— 各建材子领域主流期刊清单(Elsevier 数据库优先)+ 精确 `publication_name` + 0 命中降级法。**阶段二必读**。
- `scripts/render_docx.py` —— md→docx,商务红主题 + 列表 `[n]` 锚点 + 正文 `[n]`/`[Wn]` 引文上标回链 + DOI/URL 可点超链 + 化学式下标白名单(CO2/C3S/Na2O...,不误伤 LC3/C595/Ca2+)。用 `.venv/Scripts/python.exe` 跑。 - **平台渲染层 `/sandbox/rendering/render.py`**(各 skill 通用,不再自带 render 脚本)—— `--profile brief --format docx|pdf`。docx:商务红主题 + 列表 `[n]` 锚点 + 正文 `[n]`/`[Wn]` 引文上标回链 + DOI/URL 超链 + 化学式下标白名单(CO2/C3S/Na2O...,不误伤 LC3/C595/Ca2+);pdf:沙盒自带 chromium 渲染(`md→HTML→chromium`),同套主题 + DOI/URL 超链 + 化学式下标。**渲染一律调它,禁止自己手搓 HTML / pip 装 weasyprint。**
产物默认 `.md`;要 docx 用 render_docx.py;要 deck 转 `ppt` skill。 产物默认 `.md`;要 docx/pdf 调 `render.py --profile brief`;要 deck 转 `ppt` skill。
## 阶段一:定题对齐(BLOCKING) ## 阶段一:定题对齐(BLOCKING)
@ -62,7 +62,9 @@ for jname in ["Cement and Concrete Research", "Cement and Concrete Composites",
- 汇成证据表 `<task_dir>/evidence.md`:期刊 | 标题 | 第一作者(机构)| 年-月 | 摘要概述 | DOI | 来源(research/documents/web)。 - 汇成证据表 `<task_dir>/evidence.md`:期刊 | 标题 | 第一作者(机构)| 年-月 | 摘要概述 | DOI | 来源(research/documents/web)。
- 跨源去重:同 DOI 一条(documents 全文优先,DOI 记自 research);web 不与论文去重、单列。 - 跨源去重:同 DOI 一条(documents 全文优先,DOI 记自 research);web 不与论文去重、单列。
> **库时效(必交代)**:research(OpenAlex)约 3 个月索引滞后,"最新"= 库内最新。窗口内 0 篇 → 如实告知库未收录该窗口,可用 web 补更近的非论文动向,**不脑补文献**。 > **context 纪律(省时省钱,务必遵守)**:检索结果(尤其全文 abstract)**落进 `evidence.md` / `selected_papers.json` 文件**,**不要在对话里反复 `run_python`/`print` 把整批 abstract 灌进上下文**。工具输出会永久留在 context 并每轮重发——同一批摘要 dump 三次,context 就滚成雪球(实测一次简报因此累计烧 2.5M 输入 token、跑满超时被掐断)。需要看某几篇时按需 `read` 文件片段,看完即弃,别整批重打。
> **窗口内 0 篇**:如实告知库内该窗口暂无收录(可能该刊本窗口尚未发文),可用 web 补更近的非论文动向,**不脑补文献**。
## 阶段三:列清单 + 内容总结(写 `<task_dir>/sections/*.md`) ## 阶段三:列清单 + 内容总结(写 `<task_dir>/sections/*.md`)
@ -78,6 +80,8 @@ for jname in ["Cement and Concrete Research", "Cement and Concrete Composites",
<简介/摘要概述:24 ,讲研究对象方法/表征主要发现与关键数据 基于 abstract 或全文,不夸张不评判> <简介/摘要概述:24 ,讲研究对象方法/表征主要发现与关键数据 基于 abstract 或全文,不夸张不评判>
``` ```
`publication_date` 倒序,最新在前。每篇都要有摘要概述,不能只留标题。 `publication_date` 倒序,最新在前。每篇都要有摘要概述,不能只留标题。
> **一次成稿,别重复 dump**:中文概述基于 `evidence.md` / `selected_papers.json` **一遍生成写入**,生成后**不要再把英文 abstract 重新 `print` 进上下文**(它已在文件里)。论文多时按期刊**分批写**(每个 `###` 期刊段一次 `write`/`edit`),避免单次超长输出拖慢——而不是先把全批 abstract 全打印出来再憋一个巨型 write。
- **`02_summary.md` 内容总结**:对这批论文**客观归纳**——主题分布、常涉材料体系、常用方法/表征、共同关注点;引具体论文挂 `[n]` 上标(回链到 01)。**只描述"这批论文在讲什么",不给"应当/建议/可切入"**。 - **`02_summary.md` 内容总结**:对这批论文**客观归纳**——主题分布、常涉材料体系、常用方法/表征、共同关注点;引具体论文挂 `[n]` 上标(回链到 01)。**只描述"这批论文在讲什么",不给"应当/建议/可切入"**。
- **`03_web.md` 其他动向(仅 spec 开 web 时)**:政策/标准/会议/产业,`[W1]` 标来源 + 日期,单列。 - **`03_web.md` 其他动向(仅 spec 开 web 时)**:政策/标准/会议/产业,`[W1]` 标来源 + 日期,单列。
@ -95,7 +99,8 @@ for jname in ["Cement and Concrete Research", "Cement and Concrete Composites",
## 阶段五:渲染验收 ## 阶段五:渲染验收
- 用户要 docx → `.venv/Scripts/python.exe <dir>/scripts/render_docx.py <sections_dir> -o <方向>-简报.docx`(`--no-color` 出黑白);要 deck → 转 ppt。 - 用户要 docx → `python /sandbox/rendering/render.py --profile brief --format docx <sections_dir> -o <方向>-简报.docx`(`--no-color` 出黑白);要 deck → 转 ppt。
- 用户要 pdf → `python /sandbox/rendering/render.py --profile brief --format pdf <sections_dir> -o <方向>-简报.pdf`(沙盒内 chromium 渲染,同样 `--no-color` 出黑白)。**别现搓 weasyprint / 现 pip 装包** —— 直接调 render.py。
- 渲染前自查:`[CITE-]`/`<TODO>` 占位是否清干净、正文 `[n]` 与列表 `[n]` 是否对得上(无 orphan)、有没有混进"建议/启示/本院应当"措辞。 - 渲染前自查:`[CITE-]`/`<TODO>` 占位是否清干净、正文 `[n]` 与列表 `[n]` 是否对得上(无 orphan)、有没有混进"建议/启示/本院应当"措辞。
- 交付一句话说清:覆盖了哪些期刊、收了多少篇、时间窗、哪些刊本窗口库内无收录。 - 交付一句话说清:覆盖了哪些期刊、收了多少篇、时间窗、哪些刊本窗口库内无收录。
@ -106,3 +111,4 @@ for jname in ["Cement and Concrete Research", "Cement and Concrete Composites",
- ❌ 跳过定题直接检索 / 用中文 keyword 搜英文库 / 期刊名不精确 —— 先定题、转英文术语、用精确 `publication_name` - ❌ 跳过定题直接检索 / 用中文 keyword 搜英文库 / 期刊名不精确 —— 先定题、转英文术语、用精确 `publication_name`
- ❌ web 资讯混进论文列表/总结 —— 单列"其他动向" - ❌ web 资讯混进论文列表/总结 —— 单列"其他动向"
- ❌ 编造 DOI / "据报道"无源句 —— 查不到就如实说 - ❌ 编造 DOI / "据报道"无源句 —— 查不到就如实说
- ❌ 反复 `run_python`/`print` 把整批全文 abstract 灌进上下文 —— 落文件、按需读;同批摘要 dump 多次会让 context 滚雪球(实测一次简报累计烧 2.5M token、跑满超时被掐断没推送出去)

View File

@ -41,7 +41,7 @@ description: 撰写学术期刊投稿论文(中文核心 / 英文 SCI;原创研
**脚本**(`.venv/Scripts/python.exe <skill_dir>/scripts/...`): **脚本**(`.venv/Scripts/python.exe <skill_dir>/scripts/...`):
- `scripts/render_diagrams.py` —— sections/*.md 的 ```mermaid``` 块 → `figures/fig_<caption>.png`(caption 必填+唯一) - `scripts/render_diagrams.py` —— sections/*.md 的 ```mermaid``` 块 → `figures/fig_<caption>.png`(caption 必填+唯一)
- `scripts/render_docx.py` —— md→docx,`--lang {zh,en}`(图题 图/Fig.),`--toc`(默认不出目录),自动 `**bold**`/列表/表格/`![](png)` 居中插图 + 图题自增 - **平台渲染层 `/sandbox/rendering/render.py --profile paper`**(不再自带 render_docx)—— md→docx,`--lang {zh,en}`(图题 图/Fig.),`--toc`(默认不出目录),自动 `**bold**`/列表/表格/`![](png)` 居中插图 + 图题自增;要 pdf 加 `--format pdf`。**渲染一律调它,别自己手搓。**
- `scripts/word_count.py` —— `--type --lang`,章节篇幅 vs 预算 - `scripts/word_count.py` —— `--type --lang`,章节篇幅 vs 预算
- `scripts/quality_check.py` —— `--type`,结构/占位符/过度宣称/插图 + **引文交叉核对**(orphan/uncited/编号连续) - `scripts/quality_check.py` —— `--type`,结构/占位符/过度宣称/插图 + **引文交叉核对**(orphan/uncited/编号连续)
@ -149,7 +149,7 @@ spec 定下「类型 + 语言」后,**按 §资源 条件加载**对应的 cite_
python <skill_dir>/scripts/word_count.py <task_dir>/sections/ --type original --lang en python <skill_dir>/scripts/word_count.py <task_dir>/sections/ --type original --lang en
python <skill_dir>/scripts/quality_check.py <task_dir>/sections/ --type original python <skill_dir>/scripts/quality_check.py <task_dir>/sections/ --type original
python <skill_dir>/scripts/render_diagrams.py <task_dir>/sections/ # 有 ```mermaid 块就跑 python <skill_dir>/scripts/render_diagrams.py <task_dir>/sections/ # 有 ```mermaid 块就跑
python <skill_dir>/scripts/render_docx.py <task_dir>/sections/ --lang en -o <task_dir>/<topic>.docx python /sandbox/rendering/render.py --profile paper --format docx <task_dir>/sections/ --lang en -o <task_dir>/<topic>.docx
``` ```
- `quality_check` 的 orphan/uncited/占位符不通过 → 回头改章节或补阶段五核验,再跑 - `quality_check` 的 orphan/uncited/占位符不通过 → 回头改章节或补阶段五核验,再跑

View File

@ -17,7 +17,7 @@ description: 撰写中国发明专利技术交底书 (供专利代理师转写
- `<skill_dir>/references/self_check.md` —— 渲染前自查清单(参数/公式一致、逻辑闭环、脱敏、附图) - `<skill_dir>/references/self_check.md` —— 渲染前自查清单(参数/公式一致、逻辑闭环、脱敏、附图)
- `<skill_dir>/templates/spec.md` —— task 级"宪法"模板(案件名 / 技术领域 / 创新点清单 / 检索结论 / 脱敏边界 / 附图清单) - `<skill_dir>/templates/spec.md` —— task 级"宪法"模板(案件名 / 技术领域 / 创新点清单 / 检索结论 / 脱敏边界 / 附图清单)
- `<skill_dir>/templates/disclosure.md` —— 交底书 7 章 Markdown 模板,阶段四照抄 - `<skill_dir>/templates/disclosure.md` —— 交底书 7 章 Markdown 模板,阶段四照抄
- **渲染脚本复用 proposal skill**:`skills/proposal/scripts/render_diagrams.py` + `render_docx.py` —— 跟交底书 md 兼容(同样的 markdown + ```mermaid``` + `%% caption:` 约定),不另写 - **渲染复用平台层 + proposal 图脚本**:docx 调 `rendering/render.py --profile proposal`(见下);mermaid 图仍用 `skills/proposal/scripts/render_diagrams.py` 预渲染 `figures/fig_<caption>.png` —— 同样的 markdown + ```mermaid``` + `%% caption:` 约定,不另写
## 阶段零: 摄取素材 (有 PDF/DOCX/PPTX/XLSX/URL 时才走) ## 阶段零: 摄取素材 (有 PDF/DOCX/PPTX/XLSX/URL 时才走)
@ -130,8 +130,8 @@ read <skill_dir>/references/self_check.md
# 2. mermaid 附图预渲染 (章节有 ```mermaid``` 块就跑) # 2. mermaid 附图预渲染 (章节有 ```mermaid``` 块就跑)
python <skill_dir>/../proposal/scripts/render_diagrams.py <task_dir>/sections/ python <skill_dir>/../proposal/scripts/render_diagrams.py <task_dir>/sections/
# 3. 渲染 .docx (复用 proposal skill 的脚本,patent 不另写) # 3. 渲染 .docx (调平台渲染层,复用 proposal profile)
python <skill_dir>/../proposal/scripts/render_docx.py <task_dir>/sections/ --fund-type key_rd -o <task_dir>/<案件名>_技术交底书.docx python /sandbox/rendering/render.py --profile proposal --format docx <task_dir>/sections/ --fund-type key_rd -o <task_dir>/<案件名>_技术交底书.docx
``` ```
> `render_docx.py``--fund-type` 只影响目录页表头文案与封面,不影响章节解析 —— 交底书复用 `key_rd` 排版规范(国标黑体/宋体/1.5 倍行距)。封面页用户拿到后手动改成"技术交底书"标题,或在 sections/00_封面.md 自定义。 > `render_docx.py``--fund-type` 只影响目录页表头文案与封面,不影响章节解析 —— 交底书复用 `key_rd` 排版规范(国标黑体/宋体/1.5 倍行距)。封面页用户拿到后手动改成"技术交底书"标题,或在 sections/00_封面.md 自定义。

28
skills/ppt/ATTRIBUTION.md Normal file
View File

@ -0,0 +1,28 @@
# 第三方来源与许可 (Attribution)
本 skill 的 SVG→PPTX 引擎、设计知识 references、模板与图标库**移植自开源项目 ppt-master**,并适配 zcbot 的 task_dir / 聊天确认 / imagegen 工作流。
## ppt-master
- 仓库:https://github.com/hugohe3/ppt-master
- 许可:MIT License
- 作者:Hugo He
- 移植范围(范围 B):
- **引擎**:`scripts/svg_to_pptx/`、`scripts/svg_finalize/`、`svg_quality_checker.py`、`finalize_svg.py`、`svg_to_pptx.py`、`total_md_split.py`、`update_spec.py`、`project_utils.py`、`error_helper.py`
- **设计知识**:`references/`(shared-standards / executor-base / strategist / image-layout-* / canvas-formats / modes / visual-styles / animations)
- **模板库**:`templates/`(layouts / decks / brands / charts / icons + spec 骨架)
- **未移植**:浏览器 Confirm UI、live preview server、TTS 配音子系统、AI 配图/网图子系统(zcbot 走自己的 imagegen skill)。
- zcbot 侧改动:`SKILL.md` 重写为两阶段聊天确认流;新增 `svg_preview.py`(无头 Chrome 渲 SVG→PNG 验收);入口脚本加 Windows GBK 控制台兼容 shim。
## 图标库 (templates/icons/)
各图标集沿用其上游许可,商用前以上游为准:
| 库 | 上游 | 许可 |
|---|---|---|
| tabler-outline / tabler-filled | Tabler Icons | MIT |
| phosphor-duotone | Phosphor Icons | MIT |
| simple-icons | Simple Icons | CC0 1.0(品牌标识版权归各品牌方,仅按其品牌规范使用) |
| chunk-filled | 见 templates/icons/README.md | 见上游 |
详见 `templates/icons/README.md`

View File

@ -3,228 +3,227 @@ name: ppt
description: 生成 PowerPoint 演示文稿 (.pptx) 文件。✅ 触发:用户明确点名 PPT / 幻灯片 / 演示文稿 / .pptx / slide / deck 之一。⛔ 不触发:用户明确说要"报告 / 文档 / 纪要"等指向纯文档形式的产物。⚠️ 歧义先反问:用户说"汇报 / 方案 / 材料"等产物形态不明的词、且没说成品形式时,不要直接 load 本 skill 也不要假定走文档,先反问一句"这份要做成 PPT 演示稿,还是 Word/Markdown 文档?" 用户确认 PPT 后再 load。 description: 生成 PowerPoint 演示文稿 (.pptx) 文件。✅ 触发:用户明确点名 PPT / 幻灯片 / 演示文稿 / .pptx / slide / deck 之一。⛔ 不触发:用户明确说要"报告 / 文档 / 纪要"等指向纯文档形式的产物。⚠️ 歧义先反问:用户说"汇报 / 方案 / 材料"等产物形态不明的词、且没说成品形式时,不要直接 load 本 skill 也不要假定走文档,先反问一句"这份要做成 PPT 演示稿,还是 Word/Markdown 文档?" 用户确认 PPT 后再 load。
--- ---
# PPT # PPT(SVG-first)
把材料变成可演示的 .pptx。**先定调(spec + 逐页大纲),再出稿(一个脚本建整 deck),再验收(quality_check)** —— 方向在大纲阶段对齐,不在逐页阶段反复来回。 把材料变成**可演示、可编辑**的 .pptx。
进度展示建议:多页 deck 任务用 `task_progress` 标记「摄取素材 / 八条对齐 + 逐页大纲 / 图标预取 / 脚本建 deck / 质量检查 / 交付」等关键阶段;不要把每一页的内部写入都作为进度步骤。 **核心管线**:`素材 → 策略(spec)→ [配图] → 执行(逐页手写 SVG)→ SVG 质检 → 后处理 → 渲图验收 → 导出 PPTX`(验收在导出**之前**;导出边界有硬门,没验收过的 deck 拒绝产出 pptx)
> **为什么是 SVG**:不再用 python-pptx 拼固定版式件(那是版面单调/AI 味的天花板)。AI 把每页当**矢量设计稿手写成 SVG**(设计自由度 = 浏览器级),再由纯 Python 转换器逐元素译成**原生可编辑的 DrawingML**(形状/文本/渐变都能在 PowerPoint 里选中改)。SVG 与 DrawingML 是同一套"绝对坐标 2D 矢量"世界观的两种方言,转换是翻译而非格式硬凑。详见 `references/shared-standards.md`
> 进度展示:多页 deck 用 `task_progress` 标记「摄取素材 / 八条对齐 + 逐页大纲 / [配图] / 逐页 SVG / 质检 / 渲图验收 / 导出」等关键阶段;不要把每页内部写入都当进度步骤。
## 资源 ## 资源
- `scripts/pptx_helpers.py` —— **卡片式视觉工具箱模块**:配色/字体常量 + 派生明暗色阶(`PRIMARY_WASH/SOFT/DARK`)+ 语义色 `GOOD/BAD` + `new_presentation`/`set_palette` + **组合版式件**(一个函数摆一整块):`add_card_grid`(均衡网格)/`add_timeline`(时间轴)/`add_cycle`(流程闭环)/`add_toc`(目录)/`add_kpi`(数字卡,带 baseline+delta)/`add_takeaway`(结论框)/`add_source`(数据来源)+ 质感件 `add_card`(圆角卡,**默认平卡**)/`add_gradient_rect`/`add_icon_tile`/`add_pill`/`add_eyebrow`/`add_picture_bg`(混合背景)+ `add_notes`(演讲者备注)+ 基础件 `add_textbox`/`page_title`/`apply_brand`。`import pptx_helpers as P` 调用,**不默写源码**。⚠️ helper 的 `name=` 会写进形状名,quality_check 靠它判标签/bullet
- `references/design_principles.md` —— **§信息设计纪律(论断标题/Takeaway/数据语境化/page_rhythm)** + 画布/字号/配色/投影克制/字数预算等硬规则。**先读这节**
- `references/layouts.md` —— 13+ 版式与组合件调用示例 + helper API 速查 + 安全区保护
- `references/icons.md` —— 业务图标两层:Iconify (在线/本地缓存) / unicode 字形兜底
- `assets/icons/` —— **只读**种子图标库 (商务红 tabler 集,见 `INDEX.md`;新拉的图标写 `<task_dir>/assets/icons/`)
- 素材摄取: 用 `markitdown` CLI 把 PDF/DOCX/PPTX/XLSX/HTML/URL 转干净 Markdown,落到 `<task_dir>/source/<name>.md`
- `scripts/fetch_icon.py` —— 从 Iconify CDN 拉 SVG/PNG (染主题色;**PNG 转换需 cairosvg/svglib,没装会只出 SVG** —— 优先用种子库现成 PNG)
- `scripts/render_icon.py` —— unicode 字形 → 透明 PNG (Iconify 没有时兜底)
- `scripts/render_bg.py` —— 无头 Chrome 把主题化 HTML 渲成**杂志级背景 PNG**(混合方案:封面/章节背景图 + 其上原生可编辑文字)
- `scripts/pptx_preview.py` —— **把 .pptx 渲成 PNG 预览**(无头 Chrome),交付前**肉眼验收版面**(quality_check 查结构,预览查观感;能抓到多行不上色这类渲染 bug)
- `scripts/quality_check.py` —— 产物 .pptx 结构验收 (越界 / 文本溢出 / 按列 bullet / 按色系三色制 / 重叠)
## 默认主题 — 商务红 (硬约束) **脚本**(host 上用 `.venv/Scripts/python.exe <skill_dir>/scripts/xxx.py ...` 跑;`<skill_dir>` = 本 skill 绝对路径):
- `svg_quality_checker.py` —— **SVG 结构质检**(禁用特性 / viewBox / spec_lock 漂移 / 配色越界等)。引擎,自包含
- `finalize_svg.py` —— **SVG 后处理**(图标内嵌 / 配图裁切内嵌 / tspan 展平 / 圆角矩形转 path)→ 产出 `.build/svg_final/`(隐藏、可再生)
- `svg_to_pptx.py` —— **SVG → 原生 PPTX**(逐元素译 DrawingML;默认嵌演讲者备注 + Office 兼容 PNG 兜底)
- `total_md_split.py` —— 把 `notes/total.md` 拆成逐页备注(导出前跑)
- `update_spec.py` —— 改 `spec_lock.md` 的颜色/字体后,**一键传播到所有已生成 SVG**(改稿用)
- `svg_preview.py` —— **无头 Chrome 把 SVG 渲成 PNG** 供肉眼/vision 验收(SVG 是视觉真相;**替代**了浏览器 live preview);渲 project 目录时同步登记 `.build/acceptance.json` 验收记录(每页源 sha1 + verdict)
- `accept_pages.py` —— 看完 PNG 后**标记每页验收结论**(`--pass`/`--pass-all`/`--fail --reason`);标 pass 要求"渲过图 + 渲后源没改",导出 gate 只认 pass 页
- `project_utils.py` / `error_helper.py` —— 引擎辅助(canvas 校验 / 友好报错),被上面脚本 import,不直接调
**主色 `#C00000` / 辅色 `#E15554` / 强调色 `#FFC107`。** **设计知识(references/,先读相关的,不默写)**:
- `shared-standards.md` —— **SVG→PPT 硬约束(禁用特性清单 / XML 良构陷阱 / 字体栈纪律)**,执行前**必读**
- `executor-base.md` —— 执行通则(模板继承 / 逐页 spec_lock 重读 / 字号纪律 / 内容→版式)
- `strategist.md` —— 策略通则(八条对齐内容 / 配色派生 / 字号阶 §g / 配图意图 §h / spec 产出);**注:其中"Confirm UI 浏览器确认页"机制在 zcbot 里用聊天确认替代,只取其设计判断**
- `image-layout-patterns.md` / `image-layout-spec.md` / `svg-image-embedding.md` —— 图文版式 72 式 + 并排尺寸算法 + 配图嵌入规范
- `canvas-formats.md` —— 画布格式(viewBox / 安全区)
- `modes/`(5 种叙事骨架:pyramid/narrative/instructional/showcase/briefing)+ `visual-styles/`(**19 种视觉风格**:editorial/swiss-minimal/glassmorphism/dark-tech/data-journalism/…)—— **去 AI 味的关键**,执行时按 spec 锁定的那一个读
- `animations.md` —— 导出动画(可选,默认只翻页淡入、无逐元素动画)
**不允许擅自换色**。除非满足以下任一条件,否则 spec 必须填这套红色: **模板库(templates/,opt-in,默认自由设计不读)**:
- 用户在请求里**明确**点名其它配色 (例:"做成蓝色"、"用我们公司的紫色") - `layouts/`(版式模板)/ `decks/`(整套替换:中汽研/招商银行/重庆大学等)/ `brands/`(品牌身份)/ `charts/`(71 个图表/信息图 SVG)—— 索引见各自 `*_index.json`
- 用户提供素材里有明确的 brand guideline / 配色卡 - `icons/` —— **5 套图标库**(tabler-outline/tabler-filled/chunk-filled/phosphor-duotone/simple-icons,共 1.1w+)。executor 写 `<use data-icon="<lib>/<name>">`,finalize 自动从这里内嵌(默认目录,无需预取);锁 inventory 前用 `ls templates/icons/<lib>/ | grep <关键词>` 验名
- `design_spec_reference.md` / `spec_lock_reference.md` —— **spec 产出骨架**,策略阶段写 spec 前必读
**禁止的自我合理化**(都属违规): **素材摄取**:用 `markitdown` CLI 把 PDF/DOCX/PPTX/XLSX/HTML/URL 转 Markdown,落 `<project_dir>/sources/<name>.md`
- "这个场景蓝色更专业" / "学术汇报红色不合适" / "财务用蓝更稳重"
- "我觉得 XX 主题更适合"
要换色,**先问用户**,不要在 spec 里塞自己的偏好。其它备选见 `design_principles.md §2`
## 两阶段工作流
### 阶段一: 策略 (Strategist) — 八条对齐
产物:**task 级 spec 文件** —— 整个 deck 的"宪法",阶段二每页前都要重读。文件路径按 system prompt 的《task 级「宪法」文件命名约定》:
<task_dir>/<today>-<task_short_id>-<task_name>.spec.md
`<today>` / `<task_short_id>` / `<task_name>` 用 system prompt 注入的实际值替换。
**0. 先检测已有 spec**:
```
glob <task_dir>/*-<task_short_id>-*.spec.md → 按文件名字典序排,取最大者作 current
```
(按 short_id 主锚,name 部分不参与匹配 — 用户改过 task name 时旧文件仍能定位)
- 有 current(当前 task 已有 spec) → 展示给用户,问「**沿用进阶段二** / **重定调**(以 today 写新版,旧版保留)」,⛔ BLOCKING 等用户决定
- 仅有其它 task 的(`*-<别的 short_id>-*.spec.md`)→ 不当 current 用,继续走下面流程
- 完全没有 → 直接走下面流程
按下表**一次性给出推荐方案**,然后 ⛔ **BLOCKING:等用户确认/修改后才能进阶段二**。不要一条一条问。
| # | 项 | 默认值 |
|---|----|-------|
| 1 | 画布 | **16:9** (13.33×7.5 in) |
| 2 | 页数 | **封面 + 5-8 页正文 + 尾页(Q&A)** = 共 7-10 页。**封面 / 尾页强制必有**,不在 5-8 页预算里 |
| 3 | 受众 | 看材料推断:领导汇报 / 同行评审 / 客户 pitch |
| 4 | 风格 | **现代简约** (白底 + 细线 + 留白) |
| 5 | 配色 | **商务红** `#C00000` `#E15554` `#FFC107` (见上"默认主题") |
| 6 | 字体 | **微软雅黑 + Arial** |
| 7 | 图标 | **Iconify `tabler` 集** (描边商务图标,主色染色;`fetch_icon.py` 拉到 `<task_dir>/assets/icons/`;业务概念页用 `add_icon_tile` 配图标底块) |
| 8 | 图表 / 配图 | 数据 ≥ 3 个点 → matplotlib 图(或 ≤4 个数字直接上 KPI 卡 L10);**真实配图 opt-in**:封面/章节/图片页可走 imagegen 生图(**每张 ¥0.22**,默认不开,要用在大纲里标 `[img]` 并经用户确认) |
把这 8 项写进上面那个 task 级 spec 文件,以表格形式给用户预览,问一句"按这个开干?"。**spec 写定后不再改**(要改就走 §0 的「重定调」分支,以 today 为前缀写新版,旧版保留)。
**8 项之外,spec 还要含一张「逐页大纲」表** —— 阶段二一个脚本建整 deck 的输入,也是替代"逐页确认"的前置 checkpoint。**标题写论断、每页标节奏**(见 design_principles §信息设计纪律):
| 页 | 节奏 | 版式 | **论断式标题** | 核心信息 / Takeaway | 图标 / 图表 / 配图 |
|---|---|---|---|---|---|
| 1 | anchor | L1 封面 | <主标题> | <副标题 / 定位> | 可选 `[img]` 主图 |
| 2 | anchor | 目录 | 目录 | <5 + 各一句副标> | — |
| 3 | dense | 卡片网格 | "大模型靠规模涌现出通用智能" | <3-5 概念 + 一句 takeaway> | `brain`/`cpu`/… |
| 4 | dense | 时间轴 | "六年能力指数跃迁" | <里程碑 + takeaway + 来源> | — |
| 5 | **breathing** | 大字页 | "2 个月,月活破亿" | <单个大数字 + 一句语境对比> | — |
| … | … | … | … | … | … |
| N | anchor | 尾页 | 致谢 / Q&A | <联系方式> | — |
> **三条硬纪律(大纲阶段就定死)**:
> - **论断标题**:标题列写"结论"不写"主题"("渗透率破 60%" 不是 "行业背景");
> - **节奏不雷同**:相邻内容页不同版式;**每隔 2-3 页插一个 `breathing` 页**(大字/金句/整图,禁卡片网格)打破"全卡 = AI 味";**卡片网格全 deck ≤2 次**;
> - **内容→版式映射**:历程→时间轴、循环→闭环、2-4 数字→KPI 卡(带对比基准)、并列概念→均衡网格、单震撼数字→breathing 大字。
>
> 内容页正文优先压成一句 **Takeaway 结论**;含数据的页要有**对比基准 + 来源**。版式见 layouts.md §选版式速查。配图页标 `[img]` + 一句画面。
大纲连同 8 项一起给用户预览,**BLOCKING 等用户确认整份结构**(页数、每页讲什么、节奏、版式)后再进阶段二。用户在这一步推翻方向 = 改表格文字,零 slide 返工。
### 阶段二: 执行 (Executor) — 一个脚本建整 deck
方向已在阶段一的「逐页大纲」里跟用户对齐过,执行阶段就是把大纲机械落成 slide。**不逐页 run_python**(每页一轮来回烧轮数/token);整 deck 在一个脚本、一个进程内构建,坐标天然一致(`pptx_helpers` 已把画布常量统一,漂移问题已解决)。
流程:
1. **读 current spec**(按 §0 的 glob 规则拿字典序最大那份),含 8 项 + 逐页大纲;只用里面定的颜色/字体/图标/页结构,**不凭记忆发挥**。
2. **图标批量预取(全 deck 一次,不逐页)**: 把大纲里所有页需要的图标概念汇总,`glob` 两处看现成 —— 种子库 `<skill_dir>/assets/icons/`(只读)+ 本 task `<task_dir>/assets/icons/`;缺的在**一个 `run_python` 里批量** `fetch_icon.py <name> --set tabler --color C00000 --size 128 -o <task_dir>/assets/icons/...` 拉齐。**几何形状(圆点/徽章/装饰线)不算图标,走 layouts.md helper**。
3. **真实配图(opt-in,仅当大纲标了 `[img]`)**: 把标 `[img]` 的页(封面/章节/图片页)汇总,**load `imagegen` skill 走它自己的确认流程**逐张生成(每张 ¥0.22,有强制确认门,不要绕过),产物落 `<task_dir>/figures/`;build_deck 里 `add_picture(<figures 路径>)` 引用。**没标 `[img]` 的 deck 跳过这步**,图标/卡片/渐变已足够撑视觉。
4. **混合背景(opt-in)**:封面/章节想要杂志级背景时,`run_python` 调 `render_bg.py --out <task_dir>/figures/cover_bg.png --kind cover --primary <主色>`(+ section),build_deck 里 `P.add_picture_bg(slide, bg)` 铺底再叠**白色**文字。**背景图不可编辑、文字可编辑**——这是 editable 前提下的最高观感。
5. **写 `build_deck.py` 到 `<task_dir>`,一次建整 deck**: 顶部 `import pptx_helpers as P``P.new_presentation``P.set_palette(spec_path=...)`**按大纲循环每页**(每页一个小函数)→ 末尾 `prs.save`。落实**信息内功**(见 design_principles §信息设计纪律):
- **论断式标题**(写结论)+ 内容页 `P.add_takeaway(slide, "<一句话结论>")`;
- 含数据用 `P.add_kpi(..., baseline=, delta=)` + `P.add_source`;**数字别孤立**;
- **节奏**:按大纲的 anchor/dense/breathing 落版式,breathing 页走大字/金句/整图(**禁卡片网格**);
- **投影克制**:平铺网格卡用 `add_card`(默认平卡),投影只给悬浮/被挑出的卡,每页 ≤2-3 个;
- 每页 `P.add_notes` 写 2-4 句**结论先行的口语**演讲稿。
helper 一律 `P.xxx` 不默写源码;版式见 layouts.md。先 `write` 脚本再 `run_python(script_path=...)`
6. **quality_check + 预览双验收**(见阶段三)→ 按报告**改 `build_deck.py` 重跑**(不逐页 edit 成品)。
7. 报整份 deck:页数、各页版式/节奏、用到的图标/配图;问用户要不要改。
8. 用户确认了**实质改动**后,追加一行到 `<task_dir>/REVISIONS.md` —— 见 §修订日志。
**风格探针(可选,降视觉返工险)**: 用户对观感没底、或这是全新风格时,可先只建**封面 + 1 内页**给用户看一眼,确认后把 `build_deck.py` 的页范围放开重跑补齐其余页 —— 仍是改一个脚本,不退回逐页。用户要快("直接全做")就跳过探针,整 deck 一把出。
**为什么不再逐页?** 逐页的两个理由都已消解:① 防坐标漂移 → `pptx_helpers` 模块化已解决;② 早发现方向问题 → 前移到阶段一「逐页大纲」确认(改文字比改 slide 便宜),视觉观感由可选探针 + 整 deck 后批改兜底。代价是放弃"逐页即时纠错",换来 N 页从 ~2N 轮降到 ~3-4 轮。
### 阶段三: 验收 (结构 + 观感 双验)
**① 结构验收** `quality_check.py`(越界/溢出/三色/重叠):
```bash
python <skill_dir>/scripts/quality_check.py <task_dir>/<output.pptx> --spec <task_dir>/<today>-<task_short_id>-<task_name>.spec.md
```
**② 观感验收** `pptx_preview.py`(渲成 PNG **肉眼看版面**)—— quality_check 查不出"好不好看 / 文字层级 / 留白 / 多行文本掉色"这类问题,**交付前必须渲几页关键页用 `read` 亲眼过**:
```bash
python <skill_dir>/scripts/pptx_preview.py <task_dir>/<output.pptx> -o <task_dir>/preview --pages 1,3,5
```
看封面、一个内容页、breathing 页是否如预期(标题层级、卡片是否过挤/过空、文字是否都正常上色、节奏是否单调)。
两项不通过的,**改 `build_deck.py` 重跑**(改源脚本可复现;不要直接 edit 成品 .pptx)。
## 设计原则 (硬规则速查)
- **每页一个核心信息**: 一页讲一件事,塞两件就拆页
- **内容装进卡片**: 内容页主力容器是 `add_card`(圆角+柔和投影),白底之上靠卡片浮起分层,别让元素裸贴白纸
- **概念配图标底块**: 业务概念(能力/模块/策略)用 L11 卡片网格 + `add_icon_tile`,**别只摆圆点 bullet**(视觉太单薄)
- **数字上 KPI 卡**: 2-4 个关键数字用 L10 `add_kpi`,优先于硬画柱状图;单个震撼数字用 L13
- **bullet ≤ 5 条/列**: 单列超过就拆页或改卡片网格;双栏对比左右各 ≤5
- **正文不写完整段落**: 列要点;长句留给演讲者口述(写进 `add_notes`)
- **数据 ≥ 3 个点应有图表**: matplotlib 生成 .png 嵌入(或转 KPI 卡)
- **中文标题 ≤ 30 字**
- **配色三色封顶 + 派生阶**: 主 + 辅 + 强调三色系,浅底/卡片底走 `set_palette` 自动派生的 `PRIMARY_WASH/SOFT`,不算新色
- **渐变只用在大色块**: 封面/章节用 `apply_brand` 内置渐变;渐变深底上文字一律用白/`ACCENT_SOFT`
- **每页演讲者备注**: `add_notes` 写 2-4 句口述要点(正式产物标配)
- **Shape 不能越界**: helper 内置 `assert_inside` 生成时即报错
- **字数按预算来**: 写 bullet 前查 `design_principles.md §4.1` 字数预算表;卡片内按"卡宽 - 0.8"算框宽
- 详细规则见 `references/design_principles.md`
## 工作目录约定 ## 工作目录约定
下文 `<task_dir>` = system prompt 里「task_dir」给的**绝对路径**(host 下形如 `…/workspace/users/<uid>/<wd>/`,docker 沙盒里是 `/workspace/<wd>/`)。**所有产物都写到 task_dir 下**,不要写到 cwd / `skills/` / repo 根;图标分两处:skill 自带的**只读种子库**走 `<skill_dir>/assets/icons/`(docker 沙盒里 skills 只读,只读不写),`fetch_icon.py` 新拉的图标写 `<task_dir>/assets/icons/`(详见 references/icons.md §A)。 `<task_dir>` = system prompt 注入的绝对路径。**每份 deck 用一个独立 project 目录** `<project_dir> = <task_dir>/<deck_slug>/`(`deck_slug` 按主题取,多 deck 不撞)。引擎契约文件(`design_spec.md`/`spec_lock.md`)和各产物子目录都在 `<project_dir>` 下:
``` ```
<task_dir>/ <project_dir>/
├── source/ # markitdown 转出的素材(同 working_dir 多 task 共享;用 markitdown -o <task_dir>/source/<name>.md) ├── sources/ # markitdown 转出的素材
├── <today>-<task_short_id>-<task_name>.spec.md # 八条对齐落定,task 级宪法;命名见 system prompt 约定;按 short_id 主锚,重定调时写新日期,旧版保留 ├── design_spec.md # 人读:设计叙事(受众/风格/配色理由/逐页大纲)——引擎契约之一
├── slides/ # 各页 matplotlib 图表 (chart_p3.png 等),多 task 时文件名前缀区分 ├── spec_lock.md # 机读:执行锁(HEX/字体栈/图标/图片清单/page_rhythm/page_layouts)——executor 每页重读
├── figures/ # imagegen 生成的真实配图 (opt-in;封面/章节主图),由 imagegen skill 落盘 ├── images/ # 配图(imagegen 生成 / 用户提供 / 公式 PNG);SVG 里用 ../images/ 引用
├── assets/icons/ # fetch_icon.py 新拉的主题色图标(种子库在 skill 只读侧) ├── templates/ # 仅当用户给了模板路径才有(模板 SVG + 其 design_spec)
├── build_deck.py # 整 deck 构建脚本(一次建完所有页);改稿/修 quality_check 项都改它重跑 ├── icons/ # 可选:项目本地图标(没有则 finalize 回退到 skill 的 templates/icons/)
├── REVISIONS.md # 修订日志:每次卡点用户确认的实质改动,见 §修订日志 ├── svg_output/*.svg # ★ executor 逐页手写的 SVG(视觉真相、改稿对象)—— 唯一可见的 svg 目录
└── <topic>.pptx # 最终产物 (按主题命名,多 task 时主题必须不同) ├── notes/total.md # 演讲者备注(逐页),total_md_split 拆分后导出嵌入
├── exports/<slug>_<ts>.pptx # ★ 最终产物(原生 DrawingML,可编辑)
├── REVISIONS.md # 修订日志(见 §修订日志)
└── .build/ # 可再生构建产物(dotfile 隐藏、随时可删;用户文件列表看不到)
├── svg_final/ # finalize 产出(图标/配图已内嵌,自包含;供 legacy 导出 + 忠实预览)
├── preview/ # svg_preview 渲的验收 PNG
├── acceptance.json # 渲图验收记录(每页源 sha1 + verdict;导出 gate 依据)
└── backup/latest/svg_output/ # SVG 源快照(只留最新一份,可不跑模型重新导出)
``` ```
## 修订日志 (REVISIONS.md) **所有产物写 `<project_dir>` 下**,不写 cwd / `skills/` / repo 根。**可见面 = 源 + 交付物**(sources/images/svg_output/notes/exports + 两个 spec + REVISIONS);派生的中间物(svg_final/preview/backup)一律进 `.build/`,由脚本自动落位,**不要手动在根目录建 svg_final/preview/backup**。
`<task_dir>/REVISIONS.md` 是产物迭代过程的紧凑可读 changelog。**spec 是宪法(定调一次),REVISIONS 是实施日志(每次卡点累加)** —— 两份独立但互参,后期 review / 复盘 / 跨周回看"上周这页为啥改成这样"靠这份。 ## 默认主题 — 自由设计(content-driven)
### 何时记 / 何时不记 **默认不锁死配色**:策略阶段根据**内容 + 受众 + 选定的 visual_style** 派生一套协调配色与版式(在 spec 阶段给用户 ≥3 个配色/风格候选挑)。模板是地板也是天花板 —— 默认自由设计让版面跟着内容走,而非被固定语汇框死。
- 商务红 `#C00000` / 中建材等品牌色,作为**候选之一**;**中文政企/集团/科研商务汇报默认就把商务红列进 ≥3 配色候选**(见 strategist.md §e)。用户点名("做成蓝色 / 用我们公司紫色")或素材里有 brand guideline → 按其锁定。
- 现成一款 **`business-red` 商务红品牌预设**(`templates/brands/business-red/`:#C00000 全色表 + 宋体标题 + 实心图标);用户说"红色 / 商务红 / 中建材风"→ 指给他按路径 opt-in,或直接锁其配色。其它品牌/模板同理:**用户给 `templates/` 下明确路径才触发**(见 strategist.md 模板分发),不主动猜、不模糊匹配。
- **例外(主动提示):中国建材总院系汇报** —— 受众/素材/用户机构指向 **中国建筑材料科学研究总院 · 中国建材(CNBM)系**(工作汇报/立项/项目评审/**职称评审**/品牌宣讲)时,策略阶段**主动**把整套品牌模板 `templates/layouts/zongyuan_red/`(八边形 logo + 品牌红 `#D7000E` + 总部大楼实景铺底,5 页齐)作为候选点名给用户,用户点头再按明确路径套入(见 strategist.md §e "中国建材总院" 提示)。这是唯一鼓励主动提模板的场景;其余仍等明确路径。
---
## 阶段一:策略(Strategist)—— 八条对齐 + 逐页大纲,产出 spec
**先读** `references/strategist.md`(取其设计判断)+ `templates/design_spec_reference.md` + `templates/spec_lock_reference.md`(产出骨架)。
**0. 先检测已有 spec**:`glob <task_dir>/*/spec_lock.md`。
- 当前 task 已有 project → 展示给用户,问「**沿用进阶段二** / **重定调**(新建 project 目录,旧的保留)」,⛔ BLOCKING 等决定。
- 没有 → 走下面。
**八条对齐(ah)**——按下表**一次性给推荐方案**(默认自由设计),然后 ⛔ **BLOCKING:等用户确认/修改**。不要一条条问。zcbot 走**聊天确认**(不开浏览器 Confirm UI),内容与 strategist.md 的 ah 一致:
| # | 项 | 默认 |
|---|----|------|
| a | 画布 | **16:9**(viewBox `0 0 1280 720`)。其它见 canvas-formats.md |
| b | 页数 | **独立拍板项(见下方「页数 gate」)**:按内容量 × 投递目的推**一个具体数字**(如「建议 10 页」),不甩「常 815」这种区间就想过;**封面 + 正文 + 尾页** |
| c | 受众 + 核心信息 + 投递目的 | 看材料推断受众;投递目的 `text`(读)/`balanced`(商务,默认)/`presentation`(演讲)定正文字号与密度 |
| d | mode + visual_style | mode 选 5 骨架之一;**visual_style 给 ≥3 个候选**(safe/shifted/bold)让用户挑 —— 这是观感主轴 |
| e | 配色 | 按 visual_style + 内容**派生 ≥3 套候选**(每套含 bg/primary/accent/text…);自由设计默认 |
| f | 图标 | 选 1 个库(tabler-outline 等),stroke 库要定 stroke_width;**锁 inventory 前 `ls templates/icons/<lib>/|grep` 验名** |
| g | 字体 + 字号 | CJK+Latin 字体栈(栈尾必须是预装字体,见 shared-standards §字体);正文字号按投递目的一个定值;公式策略 mixed/render-all/text-only |
| h | 配图 | `none`/`ai`(走 imagegen skill)/`provided`/`placeholder`;ai 要定 image_rendering + image_palette(deck 级锁)。**用户没给图时别默认整本 none**:封面/分节/概念/氛围页主动把 `ai` 配图作为候选提给用户(数据/列表/流程页仍走图表→§VII,不配装饰图);提议免费,只有用户确认后 imagegen 才花钱(成本门见阶段二)。见 strategist.md §h |
> 🔒 **页数 gate(不可默认放行)**:页数是**唯一必须拿到用户明确数字**才能往下走的项。给完 ah 推荐后,若用户只回笼统的「可以 / OK / 你定」而**没给出、也没逐字认可一个具体张数**,⛔ **必须单独再追问一句「这份就定 N 页,可以吗?」** —— 拿到明确整数(用户报的数,或对你推荐数的显式点头)后,才用这个数去写逐页大纲。**禁止**把区间中位数(如 ~12)当默认值自行敲定、绕过用户。**唯一例外**:用户明确说「页数你随意 / 不重要 / 你定就行」时,按你的推荐数走、不再追问(但仍要在预览里写出这个数,让用户有机会否掉)。逐页大纲的页数 = 已确认的这个数,一页不多一页不少(封面 + 正文 + 尾页含在内)。
**逐页大纲**(写进 design_spec.md §IX,也是 spec_lock 的 page_rhythm/page_layouts 依据):**论断式标题 + 每页标节奏**(`anchor`/`dense`/`breathing`)。三条硬纪律(大纲阶段定死):
- **论断标题**:写结论不写主题("渗透率破 60%" 不是 "行业背景");
- **节奏不雷同(整本 ≤2 次)**:相邻内容页不同版式,且**同一版式原型全 deck 最多 2 页**(图标卡网格 / 全宽横条列表 / **两栏裸文字列表**(图标小标题+下划线+文字堆 ×2、零图形 —— 一次真实交付里出现了 4 页)尤其;5 页"2×3 图标卡"哪怕文案不同也读作同一张片重复,真实翻车过);第 3 页起换形态(时间轴/分层/象限/流程/hub-spoke/图表)。narrative 真正停顿处插 `breathing`(单概念/金句/大图,**禁多卡网格**);不要为凑节奏造填充页;素材含 ≥3 组可比数值(规模/占比/趋势/阶段目标)→ **全本至少 1-2 页真数据图表**(bar/line/donut/进度条),大字 KPI 是强调不算图表,零数据图表要在 spec 写明理由;
- **内容→版式映射(必须落到 spec,不能整本留空)**:历程→时间轴、循环→闭环、2-4 数字→KPI、并列→网格、单震撼数字→breathing 大字、≥3 数据点→图表(charts/ 模板或自绘);对比→象限/分栏、流程→process_flow、占比→donut、架构→分层、关系→hub_spoke。**标题语义必须被图形兑现**:标题写"架构"就画层块堆叠(不是等宽横条列表)、写"矩阵"就画真象限(不是卡片网格)、写"流程/层级"就有方向/层次 —— "五层架构"画成五条一样的横条是典型名不副实。每个能结构化的内容页都要在 spec_lock 的 `page_charts`/`page_layouts` 落一个视觉处理 —— **内容 deck 不许 page_charts + page_layouts 同时空着**(=啥图都没分配,执行层必堆文字方块)。视觉下限见 strategist.md「GATE — visual floor」;质检会硬卡"全是文字方块"的扁平 deck(见阶段四)。
大纲连同 ah **一起给用户预览,⛔ BLOCKING 等确认整份结构**后再进阶段二(改文字比改 slide 便宜)。
**确认后产出两份引擎契约**(按骨架填,**只填实际用到的行**):
- `<project_dir>/design_spec.md` —— 人读叙事(IXI 节,见 design_spec_reference.md)
- `<project_dir>/spec_lock.md` —— 机读执行锁(canvas/**layout_grid**/mode/visual_style/colors/typography/icons/images/page_rhythm/page_layouts/page_charts/forbidden,见 spec_lock_reference.md)。**executor 每页重读它**,是长 deck 抗漂移的命门。`layout_grid`(margin_x/content_top/footer_y/gutter)是跨页对齐的锚 —— 手写绝对坐标没有锁定基线必漂,质检会硬卡偏离网格 215px 的"想对齐没对齐"。
> 公式策略 mixed/render-all 且有公式 → 写 `images/formula_manifest.json` 后渲染(ppt-master 的 latex_render 未搬;zcbot 可用现有公式渲染或转图后按 `images` 行登记)。
## 阶段二:配图(条件触发)
**仅当 spec §VIII 有 `ai` 行**:把要 AI 生成的配图汇总,**load `imagegen` skill 走它自己的成本确认流**逐张生成(有强制确认门,不要绕过),产物落 `<project_dir>/images/`。`web`/`provided`/`placeholder`/`none` → 跳过本阶段。
> ppt-master 自带的 image_gen.py / image_search.py 配图子系统**未搬**;zcbot 统一走 imagegen skill。spec 的 §VIII 图片清单格式照用,只是获取机制不同。
## 阶段三:执行(Executor)—— 逐页手写 SVG
**先读**(按本 deck spec_lock 锁定值):
```
references/executor-base.md # 执行通则
references/shared-standards.md # SVG/PPT 硬约束
references/modes/<locked-mode>.md # 锁定的叙事骨架
references/visual-styles/<locked-style>.md # 锁定的视觉风格
```
只读锁定的那一个 mode + 一个 visual-style,别 glob 整个目录。
**纪律(来自 SKILL 全局 + executor-base,务必遵守)**:
1. **逐页串行手写,不批量、不脚本生成**:每页由当前主 agent 在同一上下文里手写 SVG;**禁止写循环脚本批量产 SVG**(跨页视觉一致性靠逐页带上游上下文,生成器做不到),也不要 5 页一组。
2. **每页前重读 `spec_lock.md`**:颜色/字体/图标/图片只能来自它;查本页 `page_rhythm`/`page_layouts`/`page_charts`;坐标吸附 `layout_grid`(左缘=margin_x、正文顶=content_top、并排卡片同 top 同高等 gutter,打破网格要 ≥16px 干净地打破,不许差几 px 的"差不多" —— 对齐纪律详见 executor-base §3)。抗上下文压缩漂移。
3. **模板供结构不供皮**(非 mirror):继承几何/标签位置/编码逻辑,**重新上 visual_style + spec_lock.colors 的皮**;字号按 spec_lock 角色锁定值,不继承模板占位字号。
4. **图标(锁了就必须用,非可选装饰)**:spec_lock 有 `icons.library` + 非空 `inventory` 时,**每个内容页必须放 13 个 inventory 内的图标**(KPI/列表/流程/对比/特性网格版式尤其要,常一卡一图标)——自由设计没有模板可继承图标,只能逐页手写 `<use data-icon>` 才有图标。封面/纯排版分节页/单数字·金句 breathing 页/尾页可不放。写法:`<use data-icon="<lib>/<name>" x= y= width= height= fill= [stroke-width=]>`,name 必须在 inventory 内、文件在 `templates/icons/<lib>/`。**质检会硬卡**:锁了 inventory 但全 deck 0 图标 → error 退非零(见阶段四)。
5. **配图**:`<image href="../images/<file>">`,croppable 用 `preserveAspectRatio="xMidYMid slice"`,`| no-crop` 行用 `meet`;意图与版式见 image-layout-*。
逐页写到 `<project_dir>/svg_output/<NN>_<page>.svg`。**演讲者备注**写 `<project_dir>/notes/total.md`(每页 24 句结论先行口语)。
## 阶段四:SVG 质检(强制门)
```
.venv/Scripts/python.exe <skill_dir>/scripts/svg_quality_checker.py <project_dir>
```
- **任何 `error`(禁用特性 / viewBox 不符 / spec_lock 漂移 / **图标压在文字上、文字基线超出画布、CJK 文字互相叠压**(Geometry 检测,几何精确)/ **兄弟卡片错位 212px、偏离 layout_grid 网格、正文越过 content_bottom 侵入页脚区、spec 指派了 page_charts 该页却零图形(图表被退化成文字)**(Alignment 检测,几何精确)/ **锁了图标 inventory 却全 deck 0 图标** / **内容 deck 全是文字方块(≥6 页且零 `<path>`/`<polygon>`/`<polyline>`/`<image>`)** / **≥4 页同版式指纹(单调门,含两栏裸文字列表)** 等)必须改:回阶段三重写该页再跑**,不放过。
- `warning`(低分辨率图 / 非 PPT 安全字体等):能顺手改就改,否则知会后放行。**例外:`Geometry:` 开头的文字重叠 warning 不许无视** —— 它给了精确坐标,是"大字压说明 / 同行文字互侵"的高嫌疑点(估宽无法区分擦边与压字,所以只报 warn),阶段五渲图时**必须对着该页该坐标专门看**,压了就返工。
- 跑 `svg_output/`(不要在 finalize 后跑 —— finalize 改写 SVG 会掩盖源级违规)。
- ⚠️ **别用 `| head` / `| tail` 截断质检输出**:管道会把脚本的非零退出码换成 `head` 的 0(门形同虚设),`head` 还会截掉打在**最后**的 deck 级门结论(如零图标 `[ERROR]`)。原样跑,读完整输出、认它的退出码。
- 跳过本阶段没有意义:导出边界会**自动复跑同一套逐页硬错误检查**(见阶段六质检门),error 到那里一样拒绝导出 —— 在这里主动跑并连警告一起读,能更早返工。
## 阶段五:后处理 + 渲图验收(强制门)—— 全量,不抽查
⚠️ 三步**一步步来**,别合并成一条命令:
```
# 5.1 SVG 后处理(图标/配图内嵌 / 文本展平 / 圆角转 path)
.venv/Scripts/python.exe <skill_dir>/scripts/finalize_svg.py <project_dir>
# 5.2 全量渲图(渲 .build/svg_final,同步登记 .build/acceptance.json 验收记录)
.venv/Scripts/python.exe <skill_dir>/scripts/svg_preview.py <project_dir>
# 5.3 read/look_at_image 逐页过目后,标记验收结论
.venv/Scripts/python.exe <skill_dir>/scripts/accept_pages.py <project_dir> --pass-all
# (有问题的页:--fail <页名> --reason "…";只标部分页:--pass <页名>;看状态:--status
```
- **默认渲整本,不带 `--pages`**。抽查 3 页只能覆盖 3 页,错位/文字溢出/元素重叠恰恰藏在没看的那些页里 —— 逐页手写绝对坐标,每页都可能翻车,所以**每页都要过目**。(页数多时可分批渲,但目标是 100% 覆盖,不是采样。)
- `read` / `look_at_image` **逐页**亲眼过:标题层级、卡片过挤/过空、**文字是否溢出卡片/被裁**、**元素是否重叠错位**、**并排元素顶/底是否对齐、与上一页对比左缘/内容顶线是否一致**(跨页一致性只有连续翻看才看得出)、图标在不在(位置对不对)、节奏是否单调(连续几页同为卡片墙就该返工换形态)、配图位置。**看完才许标 pass** —— `--pass-all` 是"每页都看过且都合格"的宣告,不是跳过看的快捷键。
- 🚧 **差评即阻断 + 返工回路**:任一页有排版/溢出/重叠/半成品问题(哪怕只是封面)→ **改那一页 svg_output 的 SVG → 重跑 finalize → `svg_preview.py <project_dir> --pages <N>` 重渲该页 → 复看 → 再标 pass**。机制会强制这个回路:标 pass 和导出 gate 都校验"渲图之后源文件没再改过"(sha1),改了不重渲重看,gate 过不去。不许"看了一页差评、跳去看下一页好评就收尾"——那正是错位交付的来路。
- ❌ **禁止盲改**:修错位/补图标不许写脚本批量 regex 插元素、改完不看渲染结果(真实事故来源:质检提示缺图标后 regex 批量盲插,图标全压在文字上交付)。每处修改都要走上面的返工回路落到"复看"。
> svg_preview 渲的是 SVG(视觉真相,与导出的 pptx 1:1),比渲最终 pptx 更早更准暴露观感问题。需要校验"SVG→DrawingML 转换是否保真",再开导出的 pptx 在 PowerPoint 里看。
## 阶段六:导出
```
# 6.1 拆备注
.venv/Scripts/python.exe <skill_dir>/scripts/total_md_split.py <project_dir>
# 6.2 导出原生 PPTX(默认嵌备注 + Office 兼容 PNG 兜底)
.venv/Scripts/python.exe <skill_dir>/scripts/svg_to_pptx.py <project_dir>
# 产物:exports/<slug>_<ts>.pptx(原生,读 svg_output/)+ .build/backup/latest/svg_output/(源快照,只留最新)
```
- 🚧 **导出边界质检门(硬,无豁免参数)**:导出前自动复跑阶段四质检的逐页硬错误(禁用特性 / 坏 XML / 图片文件缺失 / 图标压字·出画布几何错误等),**有 error 直接拒绝导出**。没有任何 `--allow-*` 能绕过 —— 这些是真缺陷,回 svg_output 修完再来。
- 🚧 **导出边界验收门(硬)**:spec_lock 存在时,**每页都必须渲过图(svg_preview)、且渲图后源未再改动、且 verdict=pass**。分两层:**"从没渲过 / 渲后又改 / finalize 前渲的"没有任何 CLI 逃生口**(渲图很便宜,没有理由交付一页没人看过的东西);`--allow-unreviewed` 只豁免"渲过但还没标 pass"这一层,**不是跳过验收的捷径**。被拒就回阶段五补验收/走返工回路。
- 🚧 **导出边界图标门(硬)**:spec_lock 锁了 `icons.library` + 非空 `inventory``svg_output/` 全 deck 零 `<use data-icon>` → 同样 `[ERROR]` 退非零(检测永远对 svg_output 源,与 `-s` 无关)。正确做法是回阶段三给内容页补图标重跑;只有 lock 确实过期 / 有意做无图标 deck 才加 `--allow-iconless` 放行。
- ❌ **别加 `-s final`**:native 导出默认读 `svg_output/`(转换器自己处理图标占位与 `../images/` 相对路径),`-s final` 只会引出图片路径错位这类连锁问题;真实事故里模型为绕它把 svg_output 源里的 href 改坏了。
- 🛑 **导出唯一入口 = 官方 `svg_to_pptx.py`,严禁自写导出器**:它**默认产出原生可编辑 DrawingML**(形状/文本/渐变都能在 PowerPoint 里选中改),是**纯 Python、不依赖任何外部渲染器**(cairosvg / inkscape / rsvg-convert 一个都不需要)。所以**"某某渲染器没装"永远不是理由**——别 `pip install cairosvg` 也别手搓"SVG→PNG→整页贴图"的 `export_pptx.py`。自搓光栅导出器 = 整份变成一叠不可编辑的贴图(每页一张整页 PNG、零原生文本),**skill 核心价值直接归零、判废**。官方脚本跑不动就读它的报错按流程修 / 反馈,不要另起平行管线。
- ❌ 别用 `cp` 代替 finalize_svg(它做了多步关键处理);❌ 别加 `--only` / 强制 `-s output`
- 动画可选:`-t fade`(翻页,默认)/ `-a auto`(逐元素入场,**默认 none**,用户要才开)。全表见 animations.md。
- 改稿:只改 `spec_lock.md` 的颜色/字体 → `update_spec.py <project_dir>` 传播到所有 SVG(所有页源都变了 → **重跑阶段五全量重渲重标**,顺手把全本再过一遍眼);改版式/内容 → 重写对应页 SVG 再走阶段五返工回路 + 6.2,**不要直接 edit 成品 .pptx**。
完成后:用 `update_spec` / 重写页迭代;用户确认**实质改动**后追加一行到 `REVISIONS.md`
## 修订日志(REVISIONS.md)
`<project_dir>/REVISIONS.md` 是迭代 changelog。**spec 是宪法(定调一次),REVISIONS 是实施日志(每次卡点累加)**。
| 情形 | 记? | | 情形 | 记? |
|---|---| |---|---|
| 用户确认改**版式 / 主色 / 字体方向** | ✅ 必记 | | 用户确认改**版式/主色/字体/mode/visual_style 方向** | ✅ |
| 用户确认换 / 增 / 删**页 / 关键图标 / 数据图表** | ✅ 必记 | | 用户确认换/增/删**页/关键图标/数据图表** | ✅ |
| 用户确认改**文案要点 / 核心信息 / 受众定位** | ✅ 必记 | | 用户确认改**文案要点/核心信息/受众定位** | ✅ |
| 自查阶段发现版式越界 / 颜色不一致后的修正 | ✅ 必记(说明触发 quality_check 项) | | 自查发现越界/不一致后的修正 | ✅(注明触发的 quality_check 项) |
| 页首次起草(从 0 加出来) | ❌ 不记(初稿不是改动) | | 页首次起草 / 字号间距微调 / 模型自己改撤未经确认 | ❌ |
| 字号 / 间距 / 对齐微调 | ❌ 不记 |
| 模型自己改改撤撤、用户没明确确认 | ❌ 不记 |
> 拿不准 → 倾向不记。`REVISIONS.md` 是"用户与 LLM 共同沉淀的实质决策",不是流水账(那是对话历史的事)。
### 格式
文件首次创建时写头(只写一次):
```markdown
# 修订日志
> 产物迭代过程中每次用户确认的实质改动,按时间倒序追加(最新在上)。spec 是宪法定调,本文件是实施日志。
```
每次记一笔追加在头注释之后、最新一笔的顶部(一行 = 一次改动):
格式(倒序,最新在上,插在头注释之后):
``` ```
- `<YYYY-MM-DD HH:MM>` | < N / spec §X> | <一句话改了什么><为什么> - `<YYYY-MM-DD HH:MM>` | < N / spec §X> | <一句话改了什么><为什么>
``` ```
### 实例
```
- `2026-03-12 16:20` | 第 5 页 | 版式从 layouts.md "两栏文+图"改为"单栏图占主体" — 用户反馈原版式右侧文字太挤,核心数据需放大
- `2026-03-12 14:05` | 第 3 页 | 删 chart 图,换成 3 个 KPI 数字块 — 数据点只有 3 个,bar chart 浪费版面
- `2026-03-11 10:30` | spec §5 配色 | 主色 `#C00000``#1F4E79` — 用户给的品牌指南要求蓝色,商务红默认被覆盖
```
### 操作
每次卡点用户确认后,用 `edit` 在头注释之后插入新一行(不要 append 到文件末尾 —— 倒序读才能秒看最新)。文件不存在就 `write` 创建带头注释的新文件。
## 反模式 ## 反模式
- 用户没给材料就开始硬编内容 - 用户没给材料就硬编内容(没材料只给主题 → 先补素材/反问,别凭空发挥)
- 八条没对齐就跑 python-pptx - 八条没对齐、没产出 spec_lock 就开始写 SVG
- **基于"场景判断"自行换配色**(见上"默认主题"违规清单) - **写脚本批量生成 SVG**(破坏跨页一致性,禁;逐页手写)
- **缺封面 / 缺尾页(Q&A)** —— 两端都是强制项,不算在正文页数预算内 - **绕开官方管线、自搓 SVG→PPTX 导出器**(`pip install cairosvg`/`inkscape` + 手写 `export_pptx.py` 把每页渲成 PNG 整页贴进幻灯片)—— 产物变一叠**不可编辑的整页贴图**(零原生文本/形状、还发虚、外链配图丢失),skill 全部价值作废。官方 `svg_to_pptx.py` 默认就是原生可编辑、纯 Python 无需外部渲染器,**"渲染器没装"不是造轮子的借口**;导出/后处理/质检/验收**只走 §16 资源里那几个官方脚本**,缺一步就补一步,别另起平行流程
- **裸白纸版式** —— 所有版式起手都必须 `apply_brand(slide, kind)`,见 layouts.md - **执行时不每页重读 spec_lock**(长 deck 必漂色/漂字号)
- **业务概念页只用几何形状 / 裸圆点 bullet** —— "战略目标 / 三大能力"这类页摆光圆点没图标没卡片,视觉太单薄;用 L11 卡片网格 + `add_icon_tile`,图标按 §阶段二第 2 步先拉 - **同 deck 混用多个图标库** / 用 inventory 外的图标名
- **数字页硬画柱图** —— 只有 2-4 个数字却画 bar chart 浪费版面,用 L10 KPI 卡 - 用了 `<style>`/`class`/`<mask>`/`<symbol>+<use>`/`@font-face`/`rgba()`/HTML 命名实体 等 **shared-standards 禁用特性**(导出会丢元素或报错)
- **元素裸贴白纸不进卡片** —— 内容页一坨文字/图标直接铺白底,显扁平;装进 `add_card`(自带投影)分层 - 字体栈尾不是预装字体(PPTX 无运行时回退,会变默认字体)
- **演讲者备注全空** —— 正式产物每页应有口述要点,`add_notes` 顺手写,别交白板 - **breathing 页堆多卡网格**(违节奏,显 AI 味)
- **逐页 run_python 建 deck**(每页一轮来回烧轮数;改用一个 `build_deck.py` 整建,方向风险靠阶段一大纲 + 可选探针兜) - 模板照搬不重上皮(直接用模板默认渐变/阴影/字号)
- **没经阶段一大纲对齐就直接整建** —— 大纲是替代逐页确认的 checkpoint,跳过它整建才会"改方向全推翻" - 质检没过就交付 / 直接 edit 成品 .pptx 改稿
- 跑完不做 `quality_check.py` 就交付 - **只渲/只看几页就收尾**(错位藏在没看的页里);**看到差评却不返工**(封面 vision 说"半成品/挤左侧"还继续导出交付);**没看 PNG 就 `accept_pages --pass-all`**(把验收门当橡皮图章 —— gate 只能强制"渲过、源没改",看没看只有你自己知道,糊弄的结果就是错位 deck 交到用户手上)
- 起名 `output.pptx` / `untitled.pptx` —— 务必按主题给文件名 - **质检/渲图后为消警告写脚本批量盲插元素**(regex 批量加图标、改坐标,改完不复看渲染 —— 真实事故:25 页 deck 图标全压在文字上交付)
- **用 `| head` 截断质检或导出输出**(吞非零退出码 + 截掉最后的门结论,门形同虚设)
- 起名 `output.pptx` —— 按主题命名
## 输出 ## 输出
完成后告诉用户:文件路径、页数、用到的版式列表、是否有未满足的 spec 项。问一句要不要再改。 完成后告诉用户:文件路径、页数、用到的 mode + visual_style + 版式列表、是否有未满足的 spec 项。问一句要不要再改。
---
> 本 skill 的 SVG→PPTX 引擎、references 设计知识、templates 模板/图标库移植自开源项目 **ppt-master**(github.com/hugohe3/ppt-master,MIT License),适配 zcbot 的 task_dir / 聊天确认 / imagegen 工作流;浏览器 Confirm UI、live preview server、TTS 配音等桌面交互件未移植。

View File

@ -1,66 +0,0 @@
# 本地图标库
> 这里是 skill 自带的**只读种子图标库**,**已入库一组商务红 tabler 种子集**(target / brain / chart-bar / users / trophy / alert-triangle / cpu / building-factory / cloud-network / database 等),覆盖大部分商务汇报场景 —— 直接 `glob` 读用即可。docker 沙盒里 `skills/` 是只读挂载,**不能往这儿写**。新场景按需 `fetch_icon.py` 拉,落点是 `<task_dir>/assets/icons/`(可写),本 task 内再用直接读不发请求。
## 缓存命名规约
```
<set>_<name>_<colorhex>_<sizepx>.png
<set>_<name>_<colorhex>.svg
```
例: `tabler_rocket_C00000_128.png` / `lucide_target_FFC107_96.svg`
## 推荐图标清单 (按业务主题)
种子集已含下列大部分;若某个本 task 缺,按下面命令拉到 `<task_dir>/assets/icons/`(种子库只读,新图标进 task 目录):
```bash
ICONS_DIR=<task_dir>/assets/icons # 可写落点;<skill_dir>/scripts 来自 load_skill 头(只读可执行)
# 战略 / 目标 / 启动
for n in target rocket flag bulb; do
python <skill_dir>/scripts/fetch_icon.py $n --set tabler --color C00000 --size 128 \
-o "$ICONS_DIR/tabler_${n}_C00000_128.png"
done
# 数据 / 趋势 / 报表
for n in chart-bar chart-line trending-up calculator; do
python <skill_dir>/scripts/fetch_icon.py $n --set tabler --color C00000 --size 128 \
-o "$ICONS_DIR/tabler_${n}_C00000_128.png"
done
# 团队 / 流程 / 时间
for n in users settings calendar clock check shield-check arrow-right alert-triangle currency-yuan circle-check; do
python <skill_dir>/scripts/fetch_icon.py $n --set tabler --color C00000 --size 128 \
-o "$ICONS_DIR/tabler_${n}_C00000_128.png"
done
```
## 图标集对照
| 集名 | 风格 | 数量 | License |
|-----|-----|-----|---------|
| **tabler** ⭐ 推荐 | 描边、商务、克制 | 4500+ | MIT |
| lucide | 描边、克制 | 1500+ | ISC |
| heroicons | Tailwind 风、双重粗细 | 300+ | MIT |
| material-symbols | Google Material 描边/填充 | 3000+ | Apache 2.0 |
| carbon | IBM、克制专业 | 2000+ | Apache 2.0 |
| fluent | Microsoft、温和现代 | 4000+ | MIT |
| mdi | Material Design Icons 社区 | 7000+ | Apache 2.0 |
## 浏览找名字
打开 https://icon-sets.iconify.design/ 搜中英文关键词,复制图标名 (如 `tabler:rocket`),回来用 `--set tabler rocket` 拉。
## 主题色变体
同一图标按主色/辅色/强调色/灰各拉一份,文件名只在 `<colorhex>` 段不同:
- `tabler_target_C00000_128.png` (主红)
- `tabler_target_E15554_128.png` (辅红)
- `tabler_target_FFC107_128.png` (强调金)
- `tabler_target_595959_128.png` (灰)
## 用图标的硬规则
`references/icons.md §C` —— 风格统一、颜色限定、大小克制、不替表意、避 emoji。

View File

@ -1 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="128" height="128" viewBox="0 0 24 24"><path fill="none" stroke="#C00000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2" d="M12 9v4m-1.637-9.409L2.257 17.125a1.914 1.914 0 0 0 1.636 2.871h16.214a1.914 1.914 0 0 0 1.636-2.87L13.637 3.59a1.914 1.914 0 0 0-3.274 0M12 16h.01"/></svg>

Before

Width:  |  Height:  |  Size: 343 B

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.6 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.1 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.4 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 4.7 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.2 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.3 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 4.3 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 1.9 KiB

View File

@ -1 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="128" height="128" viewBox="0 0 24 24"><g fill="none" stroke="#C00000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"><path d="M5 6a1 1 0 0 1 1-1h12a1 1 0 0 1 1 1v12a1 1 0 0 1-1 1H6a1 1 0 0 1-1-1z"/><path d="M9 9h6v6H9zm-6 1h2m-2 4h2m5-11v2m4-2v2m7 5h-2m2 4h-2m-5 7v-2m-4 2v-2"/></g></svg>

Before

Width:  |  Height:  |  Size: 352 B

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.1 KiB

View File

@ -1 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="128" height="128" viewBox="0 0 24 24"><g fill="none" stroke="#C00000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"><path d="M4 6a8 3 0 1 0 16 0A8 3 0 1 0 4 6"/><path d="M4 6v6a8 3 0 0 0 16 0V6"/><path d="M4 12v6a8 3 0 0 0 16 0v-6"/></g></svg>

Before

Width:  |  Height:  |  Size: 308 B

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.8 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.7 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.3 KiB

View File

@ -1 +0,0 @@
<svg xmlns="http://www.w3.org/2000/svg" width="128" height="128" viewBox="0 0 24 24"><g fill="none" stroke="#C00000" stroke-linecap="round" stroke-linejoin="round" stroke-width="2"><path d="M11 12a1 1 0 1 0 2 0a1 1 0 1 0-2 0"/><path d="M7 12a5 5 0 1 0 10 0a5 5 0 1 0-10 0"/><path d="M3 12a9 9 0 1 0 18 0a9 9 0 1 0-18 0"/></g></svg>

Before

Width:  |  Height:  |  Size: 331 B

Binary file not shown.

Before

Width:  |  Height:  |  Size: 5.8 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 2.9 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 3.6 KiB

View File

@ -0,0 +1,163 @@
# Page Transitions & Per-Element Animations
PPT Master's exported PPTX supports **page transitions** (slide-to-slide) and **per-element entrance animations** (within a slide). Both are controlled by `svg_to_pptx.py` CLI flags and ship as real OOXML — they animate inside PowerPoint and Keynote, no embedded video.
## Defaults
| Layer | Default | Why |
|---|---|---|
| Page transition | `fade`, 0.4s | Calm baseline that suits most decks |
| Per-element animation | **`none` (off)** | A page appears as a whole. Auto-firing element builds are an unsolicited "AI deck" tell, so element entrance is opt-in. Turn it on with `-a auto` (or another effect): effects map from group id (chart→wipe, card-/step-/pillar-→fly, title/takeaway→fade); image-like ids (`hero` / `figure-` / `image` / `img-` / `kpi`) cycle a richer visual pool (zoom / dissolve / circle / box / diamond / wheel) so multiple images vary across the deck; unmatched ids cycle a small fade/wipe/fly/zoom pool |
To regenerate a deck with different settings, rerun `svg_to_pptx.py` against the same `svg_output/` (or `svg_final/`) — no need to rerun the LLM. To turn per-element animation on for the whole deck, pass `-a auto`.
## Custom Object-Level Animation
Per-element animation is off by default. To enable it deck-wide, pass `-a auto` at export (no config needed). When a deck instead needs specific object timing — for example title first, chart second, annotation last — use the optional `animations.json` sidecar. The SVG remains static visual source; the sidecar only controls PPTX export behavior.
Run the standalone [`customize-animations`](../workflows/customize-animations.md) workflow when the user asks to tune animation order, effects, timing, or object-level reveals.
```bash
# Build an editable scaffold from real top-level <g id> anchors
python3 skills/ppt/scripts/animation_config.py scaffold <project>
# Validate references before export
python3 skills/ppt/scripts/animation_config.py validate <project>
# Export reads <project>/animations.json automatically when present
python3 skills/ppt/scripts/svg_to_pptx.py <project>
```
Minimal sidecar:
```json
{
"version": 1,
"slides": {
"03_market": {
"groups": {
"title": { "effect": "fade", "order": 1 },
"chart": { "effect": "wipe", "order": 2, "duration": 0.6 },
"insight": { "effect": "fly", "order": 3, "delay": 0.2 },
"footer": { "effect": "none" }
}
}
}
}
```
Rules:
- `slides` keys match SVG stems (`03_market.svg` → `03_market`).
- `groups` keys match top-level `<g id="...">` anchors.
- `effect: none` removes that group from the entrance sequence.
- `order` changes animation order only; it does not change slide layering.
- `delay` is seconds before that group starts in `after-previous` mode.
- `duration` overrides the per-group entrance duration.
- `--animation none` overrides the sidecar and disables all per-element animation.
## Page Transitions
```bash
# Pick a different effect
python3 skills/ppt/scripts/svg_to_pptx.py <project> -t push --transition-duration 0.6
# Disable
python3 skills/ppt/scripts/svg_to_pptx.py <project> -t none
# Auto-advance every 5 seconds (kiosk-style playback)
python3 skills/ppt/scripts/svg_to_pptx.py <project> --auto-advance 5
```
Available effects: `fade`, `push`, `wipe`, `split`, `strips`, `cover`, `random`.
Flags:
- `-t/--transition` — effect name, or `none` to disable. Default: `fade`.
- `--transition-duration` — seconds, default `0.4`.
- `--auto-advance` — seconds; omit for presenter-controlled advance.
## Per-Element Animations
Off by default — enable deck-wide with `-a auto` (or another effect). Once enabled, three Start modes are available — these mirror PowerPoint's animation-pane "Start" dropdown:
- **`on-click`** — entering a slide → first click reveals the first semantic group; each subsequent click reveals the next group in z-order. Suits live presentations where the speaker paces reveals. Forbidden with `--recorded-narration` because video-ready exports need click-free playback.
- **`with-previous`** — all groups start together on slide entry, playing their entrance animation in parallel. Stagger ignored.
- **`after-previous`** (default) — first group fires on slide entry, subsequent groups cascade after the previous one finishes, with `--animation-stagger` extra spacing. Suits kiosk playback, recorded walkthroughs, or anyone who wants visual flow without clicking.
```bash
# Default behavior (no flags): page transitions only, no per-element builds
python3 skills/ppt/scripts/svg_to_pptx.py <project>
# Enable per-element animation deck-wide (auto effect + after-previous cascade)
python3 skills/ppt/scripts/svg_to_pptx.py <project> -a auto
# Enable with a single effect (cascades via the after-previous trigger)
python3 skills/ppt/scripts/svg_to_pptx.py <project> --animation fade
# Enable and switch to on-click for live presentations (presenter controls pacing)
python3 skills/ppt/scripts/svg_to_pptx.py <project> -a auto --animation-trigger on-click
# Custom pacing
python3 skills/ppt/scripts/svg_to_pptx.py <project> --animation mixed \
--animation-stagger 0.7 --animation-duration 0.5
# All groups animate in unison on slide entry
python3 skills/ppt/scripts/svg_to_pptx.py <project> --animation-trigger with-previous
```
22 single effects: `appear`, `fade`, `fly`, `cut`, `zoom`, `wipe`, `split`, `blinds`, `checkerboard`, `dissolve`, `random_bars`, `peek`, `wheel`, `box`, `circle`, `diamond`, `plus`, `strips`, `wedge`, `stretch`, `expand`, `swivel`. Plus three auto-vary modes:
- `auto` (recommended when enabling) — map effect from the group's SVG id. Information-dense elements get a single stable effect: `chart` / `table` / `legend` / `timeline` / `track``wipe`; `card-*` / `pillar-*` / `item-*` / `step-*` / `stage-*` / `tier-*` / `principle-*``fly`; `title` / `chapter-*` / `section-*` / `cover-*` / `tagline` / `subtitle``fade`; `takeaway` / `callout` / `quote` / `source` / `conclusion` / `note``fade`. Image-like ids `hero` / `figure-*` / `image` / `img-*` / `kpi` instead cycle a richer visual pool (`zoom` / `dissolve` / `circle` / `box` / `diamond` / `wheel`) so multiple images vary across the deck. Unmatched ids cycle through `fade` / `wipe` / `fly` / `zoom`.
- `mixed` (legacy) — deterministic. The first animated group on each slide uses `fade`; later groups cycle through a 16-effect pool (`blinds` / `checkerboard` / `dissolve` / `fly` / `cut` / `random_bars` / `box` / `split` / `strips` / `wedge` / `wheel` / `wipe` / `expand` / `fade` / `swivel` / `zoom`) across the deck. Kept for backward compatibility.
- `random` — samples from the legacy 16-effect pool.
`appear` is excluded from every variation pool because it has no visible motion.
Flags:
- `-a/--animation` — effect name, `auto`, `mixed`, `random`, or `none`. Default: `none` (per-element animation off; pass `auto` to enable).
- `--animation-trigger` — Start mode (matches PowerPoint): `on-click`, `with-previous`, or `after-previous` (default).
- `--animation-duration` — per-element entrance seconds, default `0.4`.
- `--animation-stagger` — gap between elements in `after-previous` mode (seconds, default `0.5`). Ignored otherwise.
- `--animation-config` — sidecar path. Default: `<project>/animations.json` when present.
> Note: `--recorded-narration` rejects `on-click`; use `after-previous` or `with-previous` for video-ready narrated decks.
## Anchor Logic — Top-Level `<g id="...">`
Per-element animations are anchored on **top-level `<g id="...">` content groups** in the SVG (e.g. `<g id="cover-title">`, `<g id="card-1">`). One group = one click reveal.
Aim for **38 content groups per slide**. This is also the granularity PowerPoint uses for group-select / group-move, so it improves editing ergonomics regardless of animation.
**Chrome groups skip the cascade automatically.** Top-level groups that look like page chrome (background, header/footer, decorations, watermark, page number, nav, logo, dividing rule) are excluded from the click sequence and appear together with the slide. Detection is done on the `id`: after splitting on `-` and `_`, if any token matches `background` / `bg` / `decoration` / `decorations` / `decor` / `header` / `footer` / `chrome` / `watermark` / `pagenumber` / `pagenum` / `nav` / `logo` / `rule`, the group is treated as chrome. Examples that auto-skip: `<g id="background">`, `<g id="bg-texture">`, `<g id="cover-footer">`, `<g id="p03-header">`, `<g id="bottom-decor">`, `<g id="watermark">`, `<g id="nav">`, `<g id="logo-area">`, `<g id="column-rule">`. Examples that still animate: `<g id="card-1">`, `<g id="cover-title">`, `<g id="step-discover">`, `<g id="timeline-track">`. Don't strip the `<g>` wrapper to avoid animation — keep it (PowerPoint group-select needs it) and just name it appropriately.
**Fallback for flat SVGs** (no top-level `<g>` wrappers, only raw `<rect>` / `<text>` / `<path>` at the root):
- ≤ 8 visible top-level primitives → each becomes one anchor (capped to avoid 70+ atom cascades on dense pages).
- > 8 → animation is skipped on that slide. The slide still renders, just without entrance animation.
Executors should wrap logical sections in `<g id>` regardless of whether you plan to animate. The Executor reference (`skills/ppt/references/shared-standards.md`) requires it.
## Limitations
- **Native shapes mode only.** Per-element animation needs editable shape anchors. `--only legacy` produces one image per slide and has no element granularity to animate; that mode is unaffected by `-a/--animation` and only honors `-t/--transition`.
- **Office version drift on element animations.** Effects use the `<p:animEffect filter=...>` path (vs. `presetID` lookup tables) to stay stable across Office versions. Most filters render identically in PowerPoint 2016+; older Office may downgrade some filters to plain Appear.
- **PNG fallback (compat mode) is for visual rendering only.** Transitions and animations live in the slide XML, not in the PNG, so disabling compat mode does not affect either layer.
## Quick Reference
| Goal | Command |
|---|---|
| Disable transitions | `-t none` |
| Change transition effect | `-t push` (or any from the list above) |
| Slower transition | `--transition-duration 0.8` |
| Auto-play | `--auto-advance 5` |
| Disable element animation | `-a none` |
| Switch to on-click trigger | `--animation-trigger on-click` |
| Use a single effect instead of auto | `--animation fade` |
| All groups animate together | `--animation-trigger with-previous` |
| Slower per-element reveal | `--animation-duration 0.5` |
| Wider gap in after-previous | `--animation-stagger 0.7` |
See also: [`scripts/docs/svg-pipeline.md`](../scripts/docs/svg-pipeline.md) for the full `svg_to_pptx.py` reference.

View File

@ -0,0 +1,75 @@
# Canvas Format Specification
> See shared-standards.md for SVG basic rules.
## Format Quick Reference
| Format | viewBox | Ratio | Use Case |
|--------|---------|-------|----------|
| PPT 16:9 | `0 0 1280 720` | 16:9 | Business presentations, meetings |
| PPT 4:3 | `0 0 1024 768` | 4:3 | Traditional projectors, academic talks |
| Xiaohongshu (RED) | `0 0 1242 1660` | 3:4 | Image-text sharing, knowledge posts |
| WeChat Moments / IG | `0 0 1080 1080` | 1:1 | Square posters, brand showcases |
| Story / TikTok | `0 0 1080 1920` | 9:16 | Vertical stories, short video covers |
| WeChat Article Header | `0 0 900 383` | 2.35:1 | WeChat article cover images |
| Landscape Banner | `0 0 1920 1080` | 16:9 | Web banners, digital screens |
| Portrait Poster | `0 0 1080 1920` | 9:16 | Phone screens, elevator ads |
| A4 Print | `0 0 1240 1754` | 1:sqrt(2) | Print posters, flyers |
## Format Selection Decision Tree
```
Content purpose?
├── Presentation
│ ├── Modern devices → PPT 16:9 (1280x720)
│ └── Traditional devices → PPT 4:3 (1024x768)
├── Social sharing
│ ├── Xiaohongshu (RED) → 1242x1660
│ ├── WeChat Moments / IG → 1080x1080
│ └── Story / TikTok → 1080x1920
└── Marketing materials
├── WeChat Article Header → 900x383
├── Banner → 1920x1080
└── Print → 1240x1754
```
## Layout Principles
### Landscape (16:9, 4:3, 2.35:1)
- Visual flow: Z-pattern, left to right
- Margins: 40-80px
- Layouts: multi-column, left-right split, grid
- Card dimensions (16:9): single-row 530-600px, double-row 265-295px
### Portrait (3:4, 9:16)
- Visual flow: top to bottom
- Margins: 60-120px
- Layouts: single-column, top-bottom split, card stacking
- Card dimensions (3:4): height 400-600px, gap 40-60px
### Square (1:1)
- Visual flow: center-radiating
- Margins: 60-100px
- Core area: ~800x800px
## Format-specific Design
| Format | Title Area | Content Area | Special Notes |
|--------|-----------|--------------|---------------|
| PPT | 80-100px | Full width utilization | Page number bottom-right |
| Xiaohongshu (RED) | 180-240px (bold) | Generous top/bottom whitespace | Brand area at bottom 120-160px |
| WeChat Moments | 200-280px | Center 500-600px | QR code area at bottom 150-200px |
| Story | — | Middle 1500px | Top safe zone 120px, bottom 180px |
| WeChat Article Header | Center/left-aligned 48-72px | — | Image on right or as background |
> **Body font baseline scales with canvas and delivery purpose** — a PPT 16:9 baseline confirmed for read-close / business / projection cannot be carried onto tall canvases (Xiaohongshu / Story / A4). Pick the baseline from the confirmed canvas, not the recommended one; see the per-canvas px anchors in [`strategist.md`](strategist.md) §g "Font Size Ramp" (the system is px-only — all sizes are unitless px on every canvas).
## ViewBox Examples
```xml
<svg width="1280" height="720" viewBox="0 0 1280 720"> <!-- PPT 16:9 -->
<svg width="1242" height="1660" viewBox="0 0 1242 1660"> <!-- Xiaohongshu -->
<svg width="1080" height="1080" viewBox="0 0 1080 1080"> <!-- WeChat Moments -->
<svg width="1080" height="1920" viewBox="0 0 1080 1920"> <!-- Story -->
<svg width="900" height="383" viewBox="0 0 900 383"> <!-- WeChat Article Header -->
```

View File

@ -1,224 +0,0 @@
# PPT 设计硬规则
> 出稿前过一遍。**这些不是建议,是工程约束** —— 模型生成 PPT 最常见的失败模式都是违反这些规则。
## 信息设计纪律 (比视觉更重要 —— 先把这条吃透)
> "好看"七成靠**信息设计**、三成靠视觉。同样的红色卡片,标题写"行业背景"还是"渗透率破 60%,行业进入深水区",观感差一个档次。模型最容易堆视觉、忘内功 —— 这一节是把 deck 从"AI 味模板"拉到"咨询级"的关键。
### 1. 论断式标题 (Assertion title) —— 标题写结论,不写主题
每页标题是**一句可带走的结论**,不是话题名。
| 类型 | ❌ 主题式(避免) | ✅ 论断式(推荐) |
|---|---|---|
| 背景 | "行业背景" | "数字渗透率破 60%,行业进入深水区" |
| 现状 | "什么是大模型" | "大模型靠规模涌现出通用智能" |
| 历程 | "发展历程" | "六年从 GPT-1 到推理模型,能力指数跃迁" |
| 竞争 | "竞品分析" | "三家主要对手在渠道覆盖上明显薄弱" |
### 2. Takeaway 结论框 —— 每页标题下一句话结论
内容页标题下加 `P.add_takeaway(slide, "<一句话结论>")`(浅主色底 + 左主色条)。把"这页要讲什么"压成一句。**金字塔原则**:结论先行,再展开 3 条论据。
### 3. 数据语境化 —— 数字不要孤立出现
每个关键数字配三件:**数值本身(大)+ 对比基准(行业均值/上期/竞品)+ 含义("所以呢")**。
`P.add_kpi(..., baseline="行业均值 82%", delta="+11pt")`(升=绿/降=红,业界约定);含数据的页用 `P.add_source(slide, "<来源>")` 标来源。
> 例:"97.3%" 下面跟 "行业均值 82% | 领先 15 个点",而不是光一个 "97.3%"。
### 4. page_rhythm 节奏 —— 相邻页不许同版式
逐页大纲给每页标密度,**breathing 页强制打破卡片网格**(否则每页都退化成卡片网格 = AI 味):
| 标签 | 版式纪律 |
|---|---|
| `anchor` | 结构页(封面/章节/目录/尾页),走固定品牌版式 |
| `dense` | 信息密集(默认):卡片网格 / KPI / 图表 / 时间轴 / 表格都行 |
| `breathing` | 低密度冲击页:**禁止多卡网格**,用大字 + 留白 + 整图 + 金句。典型:单个大数字 + 一句语境、整图 + 浮层标题、金句 |
内容→版式映射:历程→时间轴(`add_timeline`)、循环→闭环/流程(`add_cycle`)、2-4 数字→KPI 卡(`add_kpi`)、并列概念→均衡网格(`add_card_grid`,全 deck ≤2 次)、单个震撼数字→breathing 大字页。
## 0. 画布 (默认 16:9)
| 用途 | 比例 | 宽×高 (英寸) | python-pptx |
|-----|------|------------|------------|
| **现代商务汇报** ⭐ 默认 | 16:9 | 13.33 × 7.5 | `Inches(13.33), Inches(7.5)` |
| 老投影 / 教学 | 4:3 | 10 × 7.5 | `Inches(10), Inches(7.5)` |
| 手机 / 视频号 | 9:16 | 7.5 × 13.33 | `Inches(7.5), Inches(13.33)` |
| 小红书 | 3:4 | 7.5 × 10 | `Inches(7.5), Inches(10)` |
| A4 横 / 竖 | √2:1 | 11.69 × 8.27 / 反 | 同左 |
不知道选哪个 → **16:9**。安全边距统一:左右 0.7 in,上下 0.5 in。**画布定了不要中途改**,后续坐标全按这个尺寸算。画布超 16:9 默认尺寸时所有字号 × `(实际宽 / 13.33)`
## 1. 字号 (16:9 标准)
| 元素 | 字号 (Pt) | 备注 |
|-----|----------|------|
| 主标题 (封面) | 44-54 | 单行不换行 |
| 标题 (内页) | 28-36 | 中文常用 32 |
| 副标题 / 章节小标题 | 20-24 | |
| 正文 / bullet | 18-22 | 低于 18 投影看不清 |
| 注释 / 数据来源 | 12-14 | 灰色,弱化 |
| 页脚页码 | 10-12 | 弱化处理 |
**底线**: 投影到 100 寸大屏,后排看得清最小字号是 18pt。**绝不能小于 14pt**,除非是数据来源等弱化信息。
## 2. 配色
### 三色制
- **主色 (Primary)** —— 标题、强调、关键数据。占视觉权重 60%
- **辅色 (Secondary)** —— 副标题、次要图形元素。占 30%
- **强调色 (Accent)** —— 关键数据点、CTA、警告。占 10%,不要泛滥
- 其他全部用灰阶 (#1F1F1F / #555 / #888 / #CCC / #F5F5F5)
### 推荐配色对照 (红色主题为默认)
| 风格 | 主色 | 辅色 | 强调色 | 备注 |
|-----|------|------|-------|------|
| **商务红** ⭐ 默认 | #C00000 | #E15554 | #FFC107 | 党政/年终/路演通用 |
| 中国红 | #8B0000 | #B22222 | #FFD700 | 民族/国货/红色文化主题 |
| 现代红 | #B91C1C | #DC2626 | #F59E0B | 新消费/科技产品发布 |
| 暖朱红 | #C73E1D | #E76F51 | #F4A261 | 学术汇报/行业会议 |
| 商务蓝 | #1F4E79 | #2E75B6 | #FFC000 | 金融/保险/政企 |
| 学术灰 | #2F2F2F | #595959 | #C00000 | 严肃论文/答辩 |
| 现代简约 | #2D3748 | #4A5568 | #38B2AC | 互联网/SaaS |
| 科技深色 | #0A192F | #112240 | #64FFDA | 黑客松/技术大会 |
### 派生色阶(卡片式视觉的层次来源)
`set_palette` 从主/辅/强调自动派生明暗阶,**这些不算"新色"**(quality_check 按色相归桶,同色系深浅收敛成一个):
- `PRIMARY_WASH`(主色兑 92% 白)—— 整页/大区域浅底(尾页、L13 论据卡)
- `PRIMARY_SOFT`(兑 80% 白)—— 卡片/图标底块/标签浅底
- `PRIMARY_DARK`(主色压暗)—— 封面/章节渐变深端
- `ACCENT_SOFT`(强调兑 78% 白)—— 渐变深底上的弱化文字
> 白底之上靠卡片(`add_card` 圆角+投影)+ 浅色阶分层,才有"现代咨询风"的层次;纯白底裸贴元素 = 扁平办公模板。
### 语义状态色 (例外)
趋势/状态用业界约定:**绿 `P.GOOD` = 增长/正向,红 `P.BAD` = 下降/风险,灰 = 持平**。这套语义色**不计入三色制**(quality_check 把绿色当语义色豁免)。只用在 KPI 趋势、表格升降这类语义场景,别拿来当装饰。
### 禁忌
- 红配绿、紫配黄等高对比互补色不要直接用(语义升降色除外)
- **渐变只用在大色块**(封面右块 / 章节整页,`apply_brand` 已内置);正文/标题/小图形不要渐变
- 一份 deck 主色不要换。封面是 A 色、内页变 B 色 —— 这是大忌
- 渐变深底上文字一律用**白 / `ACCENT_SOFT`**,别用深灰 `INK`(看不清)
## 视觉深度:投影是克制,不是默认
> 抄自 pptmaster shared-standards §6 —— "设计感来自'没有',不是'到处都有'"。模型最爱给每张卡都加投影,这恰恰是模板味的来源。
- **平卡是常态**:`add_card` 默认平卡(白底描发丝边)。**平铺网格里的对等卡一律平**,不投影。
- **投影只给真悬浮的**:照片/色块上的卡、被挑出的"推荐"项、浮层/标注。`add_card(..., shadow=True)` 手动开。
- **每页 ≤2-3 个投影元素**。够第 4 个了,先撤一个。
- **一个容器只用一种视觉手段**:投影 / 描边 / 渐变底 / 强主色底 —— **四选一,不叠加**(叠加 = 瞬间模板味)。
- **单一光源**:同页所有投影同方向(默认光从上方,`dy>0`)。
- 渐变深底上投影会消失,改用 1px 低透明白描边或外发光。
## 3. 留白
- 标题与上边距 ≥ 0.4 英寸
- bullet 之间行距 1.3-1.5 倍
- 一页内容占满 70% 即可,**不要塞到边缘**
- 边距统一 (左右 0.7 寸,上下 0.5 寸常用值)
## 4. 信息密度
| 页类型 | 字数上限 | 图表 |
|-------|---------|-----|
| 封面 | 30 字 | 可选装饰图 |
| 目录 | 每条 ≤ 15 字 | 不要图 |
| 分章页 | ≤ 20 字 | 大号数字 + 章节名 |
| 要点页 | bullet ≤ 5 条,每条 ≤ 25 字 | 可选小图标 |
| 数据页 | 标题 + 一句结论 | **必须有图表**;2-4 个数字优先 KPI 卡(L10)而非柱图 |
| 概念页 | 卡片标题 ≤6 字 + 说明 ≤2 行 | 图标底块 + 卡片网格(L11),别裸圆点 |
| 图片页 | ≤ 15 字标题 + 1-2 行说明 | 主体是图 |
## 4.1 字数预算 (避免溢出)
> 这是**布局超界的根因表**。bullet 写超了会顶到下一页元素;标题写超了会换行顶下来。开写前查这张表,而不是写完看 quality_check 报错。
公式: `每行字数 ≈ 框宽(in) × 72 / 字号(pt)`
| 字号 | 框宽 11.93 in (整宽) | 框宽 5.5 in (双栏单边) | 框宽 4.6 in (图片页文字区) |
|-----|--------------------|----------------------|--------------------------|
| 44 pt (主标题) | ≤ 19 字 | — | — |
| 36 pt (大标题) | ≤ 23 字 | — | — |
| 32 pt (内页标题) | ≤ 26 字 | — | — |
| 22 pt (要点) | ≤ 39 字 | ≤ 18 字 | ≤ 15 字 |
| 18 pt (正文) | ≤ 47 字 | ≤ 22 字 | ≤ 18 字 |
| 14 pt (注释) | ≤ 61 字 | ≤ 28 字 | ≤ 23 字 |
**英文字符按中文 0.5 个换算** (即英文每行约 2× 中文字数)。
### 行高估算
每行高度 ≈ `字号 × 1.4 / 72` (英寸)
| 字号 | 单行高 | 1 行框高 | 2 行框高 | 3 行框高 |
|-----|-------|---------|---------|---------|
| 32 pt | 0.62 in | 0.7 in | 1.3 in | 1.9 in |
| 22 pt | 0.43 in | 0.5 in | 0.9 in | 1.3 in |
| 18 pt | 0.35 in | 0.4 in | 0.8 in | 1.1 in |
| 14 pt | 0.27 in | 0.3 in | 0.6 in | 0.9 in |
**用法**: bullet 字数预计超表上限就拆条,不要试图靠 `auto_size` 收缩字号兜底 —— 会出现一页里字号大小不一,反而难看。
## 5. 文字层级
- 一页最多 3 级层级 (标题 / 正文 / 子项)
- 子项缩进 0.3-0.5 英寸
- 子项字号比父级小 2-4pt
- 不要四级以上嵌套
## 6. 图片规则
- **分辨率**: 投影建议 150 dpi 以上,印刷 300 dpi
- **占位**: 图片占满指定区域,不要拉伸变形 —— 用 `width=``height=` 单一参数让 python-pptx 等比缩放
- **背景**: 透明 PNG 优先;白底 JPG 在深色页上要做底色匹配
- **数量**: 一页最多 2 张图,3 张以上是网格图,按九宫格摆
## 7. 图表规则 (matplotlib)
> **先问要不要图表**:只有 2-4 个数字 → 用 KPI 卡(layouts L10),别画柱图;真有趋势/分布/多系列才上 matplotlib。图表 png 嵌进 `add_card` 白卡片里(L6)比裸图精致。
- 颜色用 spec 里定的主/辅/强调三色,**不要用 matplotlib 默认色板**
- 字号: 标题 16,坐标轴 12,刻度 10
- **去四边框**,只留极淡横向网格 (`ax.spines[*].set_visible(False)` + `ax.grid(axis='y', color='#EEEEEE', lw=0.8)`)—— 比全框 + 默认网格干净,跟卡片观感一致
- 数据标签直接标在柱子/点上,优先于看坐标
- 透明底:`fig.savefig(..., transparent=True)`,嵌白卡片上无白边
- 中文字体: `plt.rcParams['font.sans-serif'] = ['Microsoft YaHei', 'SimHei']`
- 负号: `plt.rcParams['axes.unicode_minus'] = False`
```python
# 示例:符合规则的柱状图 (默认红色主题)
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['Microsoft YaHei', 'SimHei']
plt.rcParams['axes.unicode_minus'] = False
fig, ax = plt.subplots(figsize=(10, 5), dpi=150)
bars = ax.bar(["Q1","Q2","Q3","Q4"], [12,18,25,31],
color=["#C00000","#C00000","#C00000","#FFC107"]) # 末尾突出
for bar, v in zip(bars, [12,18,25,31]):
ax.text(bar.get_x()+bar.get_width()/2, v+0.5, str(v),
ha='center', fontsize=11)
ax.set_title("季度营收 (亿元)", fontsize=16)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.savefig("chart.png", bbox_inches="tight", dpi=150)
```
## 8. 一致性 (跨页)
- 标题位置不要跳来跳去 —— 所有内页标题都在同一像素位置
- 页脚 (页码 / logo / 标题) 在所有内页位置一致
- 字体在同 deck 内不要换 —— 中文一种字体,英文一种,够了
- 配色不变,字号梯度不变
## 9. 反模式速查
| 症状 | 原因 | 修法 |
|-----|------|-----|
| 一页字密密麻麻 | 没拆页 | 拆 2-3 页或转图表 |
| 投影看不清 | 字号 < 18 | 加大字号或拆页 |
| 颜色花 | 用了超过 5 种色 | 退回三色制 |
| bullet 是完整段落 | 把演讲稿当 bullet 写 | 提炼关键词,完整句留给口述 |
| 图表默认配色 | 没改 matplotlib 色板 | 用 spec 主色 |
| 图标/图片随意找的 | 没统一风格 | 同一来源 / 同一风格 |
| 标题在每页位置都不一样 | 没用统一版式 | 见 layouts.md,固定模板 |

View File

@ -0,0 +1,469 @@
# Executor Common Guidelines
> Narrative skeleton and visual aesthetic come from this deck's locked files under [`modes/`](./modes/_index.md) and [`visual-styles/`](./visual-styles/_index.md). Technical constraints are in shared-standards.md.
---
## 1. Template Adherence Rules
### 1.0 Pre-generation Batch Read
**Hard rule**: Before the first SVG page, batch-read every template SVG this deck will reference. Read once up front, never re-read during generation.
| Source list | Read path |
|---|---|
| Chosen template's `design_spec.md` (read frontmatter to detect `replication_mode`) | `templates/design_spec.md` |
| Every distinct `<basename>` in `spec_lock.md page_layouts` | `templates/<basename>.svg` |
| Every distinct chart name in `spec_lock.md page_charts` | `templates/charts/<chart_name>.svg` |
| Chart types in `design_spec.md §VII` not covered above | `templates/charts/<chart_name>.svg` |
**Default — read each template once; re-read only on the mid-deck exception below**:
- Layout SVG already loaded in this batch
- Chart SVG already loaded in this batch
`spec_lock.md` is the only file re-read per page (§2.1).
**Exception**: user mid-deck adds pages or swaps templates introducing a basename/chart absent from the original batch → read the new file once, continue.
> Note: batched prefix reads stay in the cached prompt prefix; per-page `spec_lock.md` re-reads append below and benefit from that cache. Scattered on-demand reads of layout/chart SVGs would invalidate downstream cache and sit in the compression-vulnerable mid-context region.
Resolve the per-page template SVG via `spec_lock.md page_layouts` (authoritative). The legacy page-type table below is a **last-resort fallback** for legacy decks where `page_layouts` is missing.
**Resolution order (per page):**
1. **Mirror-mode template** (template's `design_spec.md` frontmatter has `replication_mode: mirror`) → see §1.1 below. The page is consumed as a **visual reference**, not as a placeholder shell.
2. `spec_lock.md page_layouts` has `P<NN>: <basename>` for this page → inherit the structure of `templates/<basename>.svg` (already in context from §1.0).
3. `page_layouts` exists but **no entry** for this page → **free design**, no template inheritance.
4. `page_layouts` section absent (legacy deck) **and** `templates/` directory exists → fall back to the page-type table below, matching by SVG filename keyword (cover/chapter/content/ending/toc). Read the matched file at first use if §1.0 batch did not cover it.
5. No template at all → free design.
> Note: `page_layouts` disambiguates the multiple content variants modern templates ship (e.g., `graduation_defense` has 8); the legacy table cannot.
**Templates supply structure, not skin (non-mirror)**: a chart or layout template's gradients, drop-shadows, palette, **and font sizes** are placeholder. Inherit its geometry, label / legend placement, and series-encoding logic; re-skin every fill / stroke to the deck's `visual_style` + `spec_lock.colors` — flat styles strip the gradients and shadows, gradient / glass styles repaint their own. Forbidden — shipping a template's default `<linearGradient>` / `cardShadow` / Tailwind fills unchanged. Mirror templates are the exception: §1.1 preserves their visuals verbatim.
**Font size is skin, not geometry (non-mirror).** A chart / layout template's hardcoded `font-size` values (often 1116px, sized for the template's own dense placeholder text) are NOT inherited — classify each text into its `spec_lock.md` role and use that role's locked size, exactly as you re-skin color. **Structural roles (page title / body / subtitle / annotation / footnote) hold their one deck-wide size on every page** — the template's placeholder px never overrides it; same-role text drifting page to page is what makes a deck look unprofessional.
**Typography execution order (mandatory):**
1. Build a per-page text inventory from `design_spec.md §IX` + the current `notes/<NN>_*.md`.
2. Classify each text item before drawing. **Structural roles** (`title`, `subtitle` / `lead`, `body`, `annotation`, `footnote` / `page_number`) must map to their declared `spec_lock.typography` slot. A **one-off feature element** (a single hero number, an isolated emphasis label) may take an in-ramp intermediate value — the ramp is anchored on `body`, not a closed menu — but a feature size that **recurs** must be promoted to a declared slot. The failure mode this guards against is structural text silently inheriting the template's compact px, not legitimate feature sizing.
3. Copy the role's locked px value into `font-size` verbatim. Do this before placing the text; never start from a template `font-size` and then "adjust".
4. Layout from those locked sizes: compute line-height, wrapped line count, child `y` / `dy`, card padding, card height, column gaps, and available image/chart area from the chosen px values.
5. Only after this reflow may you inspect fit. If fit fails, move / resize containers or simplify local geometry first; do not reduce the role size merely because the inherited template slot was smaller.
**Geometry adapts to the type, never the reverse**: when the locked size is larger than the template's placeholder text, widen / heighten the card, open spacing, and recompute child `y` / `dy` to make room — do not shrink the font to fit the inherited container. A `font-size` change is a layout change: revise line-height and every downstream vertical coordinate that depends on it. For wrapped text, allocate at least the wrapped line count × line-height plus top / bottom padding; fixed `y` stacks copied from a smaller template are invalid once the locked role size is applied. The Executor renders the page it was given; page count and per-page density are the Strategist's call, fixed at confirmation — do **not** re-paginate, split the page, or drop authored content to cope with size here. Only when a single block still cannot fit after the geometry is fully reflowed may you shrink **that block** as a bounded last resort — and **only body text** is ever shrunk this way. Title, subtitle, annotation / caption, footnote and page number are **locked once set and never adjusted to fit** — their values hold across the whole deck. Step the overflowing body block's `font-size` down by `2`px at a time, and only if it still overflows step it down again, up to a cumulative floor of **`4`px below the locked body size** (e.g. `24` → no smaller than `20`). This is a **local, single-block** reduction — the deck-wide locked body size is unchanged on every other block and page. (The Executor works in **unitless px** throughout — spec_lock and SVG carry no `pt`.) If the block still overflows at the floor, surface a `warning:` rather than silently restructure the page. (Mirror templates are the exception: §1.1 preserves their sizes verbatim — there the source deck's typography *is* the spec.)
### 1.1 Mirror-mode templates — reference-style consumption
When the project's chosen template is a `mirror` template (`design_spec.md` frontmatter declares `replication_mode: mirror`), Executor switches to a **reference-style** consumption path that bypasses placeholder substitution:
1. **Per-page reference selection** — Strategist selects one mirror page per project page via `spec_lock.md page_layouts` (e.g., `P04: 015_content`). The basename is the mirror filename without extension; Strategist made this choice by reading `design_spec.md §V Page Roster` descriptions, not by guessing.
2. **Copy, don't fill** — open the referenced mirror SVG (already in context from §1.0). **Copy it as the starting point for the project page**, then edit text elements in place to express the project's content for `P<NN>`. Preserve every non-text element verbatim: backgrounds, decorative shapes, sprite-cropped images, charts, icon usage, color values, font families, geometry, sprite `<svg viewBox>` wrappers, and **which image** each `<image>` points at.
3. **What you may edit** — the visible text content of `<text>` / `<tspan>` elements that express slide-specific content (title, body, captions, KPI labels, dates, page numbers). Replace the source deck's example text with the project's text for this page from `design_spec.md §IX` and `notes/<NN>_*.md`.
4. **What you must not touch** — element positions, sizes, fonts, colors, fills, strokes, gradients, **which image each `<image>` points at**, `<g>` grouping, sprite-sheet `<svg viewBox>` wrappers, decorative `<rect>` / `<path>` / `<circle>` / `<polygon>` shapes, `<use data-icon="...">` markers, embedded chart data structures. Mirror's value is preserving the source deck's visual identity — any geometric / decorative drift defeats the purpose. **The `href` path is not the image**: normalizing a bare `href="cover_bg.png"` to `href="../images/<name>"` (when Step 3 relocated the asset to `images/`) points at the *same* image and changes nothing visual — that is an allowed path fix, not a fidelity edit. Leaving the bare href as-is is also fine; the exporter and live preview resolve bare hrefs against `images/` either way.
5. **Content fit** — the mirror page was chosen by Strategist because its layout matches the content slot. If the project's content for `P<NN>` legitimately needs more / fewer items than the mirror page provides (e.g. mirror shows 3 KPI cards, project has 4 metrics), keep the mirror page's visual rhythm and either drop one metric to fit or split across two pages — do **not** restructure the mirror page's grid. If neither works, surface a `warning: P<NN> content does not fit mirror reference <basename>; suggest different reference page` and proceed with the closest-fit edit.
6. **No `{{}}` substitution** — mirror SVGs do not contain placeholder markers. Do not search for `{{TITLE}}` / `{{CONTENT_AREA}}` etc.; do not invent placeholders. The whole mirror contract is "verbatim source + in-place text edit".
7. **Output filename** — follow the standard project SVG naming convention (`<NN>_<page_name>.svg` where `<NN>` matches the project page index, not the mirror source index). The mirror filename is the *reference*, not the *output*.
**Detecting mirror mode**: read the chosen template's `design_spec.md` frontmatter once during §1.0 batch read. If `replication_mode: mirror`, every page that hits `page_layouts` follows §1.1 above; pages without a `page_layouts` entry still fall through to free design (resolution rule 3 above).
**Mirror + chart pages**: chart structures inside a mirror SVG are already drawn (axis, series, labels). Treat them as visual references — replace the data labels and series text content to match the project's chart spec, but do not redraw the chart from a `templates/charts/<name>.svg` baseline. A mirror template's `page_charts` entries are normally absent for this reason.
**Legacy fallback table** (used only when `page_layouts` is absent):
| Page Type | Corresponding Template | Adherence Rules |
|-----------|----------------------|-----------------|
| Cover | `01_cover.svg` | Inherit background, decorative elements, layout structure; replace placeholder content |
| Chapter | `02_chapter.svg` | Inherit numbering style, title position, decorative elements |
| Content | `03_content.svg` | Inherit header/footer styles; **content area may be freely laid out** |
| Ending | `04_ending.svg` | Inherit background, thank-you message position, contact info layout |
| TOC | `02_toc.svg` | **Optional**: Inherit TOC title, list styles |
### Page-Template Mapping Declaration (Required Output)
Before generating each page, output which template is used:
```
📝 **Template mapping**: `templates/03a_content_image_text.svg` (or "None (free design)")
🎯 **Adherence rules / layout strategy**: [specific description]
```
- **Content pages**: template defines only header/footer; content area is free
- **No template**: generate entirely per the Design Spec
---
## 2. Design Parameter Confirmation (Mandatory Step)
Before the first SVG page, output a confirmation listing: canvas dimensions, body font size, color scheme (primary/secondary/accent HEX), font plan. Prevents spec/execution drift.
### 2.1 Per-page spec_lock re-read (Mandatory)
> Long decks drift off the declared palette/icons mid-deck due to context compression. `spec_lock.md` is the canonical execution reference — re-read it per page to bypass model memory.
**Hard rule**: Before generating **each** SVG page, `read_file <project_path>/spec_lock.md`. Use only values from this file, not from memory. If context was auto-compacted, also `read_file <project_path>/design_spec.md` for the current page's §IX brief.
**Per-block expression**: render each `design_spec.md §IX Content` block in its written texture — a full-sentence block as wrapped prose, a fragment/label block as bullets/keywords. **Never split a full-sentence block into a bullet list** — splitting loses the information that the block was continuous reasoning, not a set of parallel points; not because a bullet lays out easier, and not because an inherited template slot is shaped as a list. If a block carries no clear texture, infer the mode from its wording and the page layout.
- **Prose render recipe**: one `<text>` per paragraph; wrap lines with sibling `<tspan>` that reset `x` to the block's left edge and advance `dy` by the font size × a line-height factor. **Default — line-height by density (may override per content fit)**: ~1.41.5× for dense / small-body blocks (CLReq comfortable minimum), 1.62.0× for large-type, sparse, or `breathing` blocks. Fit about width ÷ font-size CJK glyphs per line (Latin fits roughly twice that); the last line runs short. Use the body ramp size, not a new one.
- **Template precedence**: when an inherited template slot is a bullet list but the §IX block is prose, the prose wins — widen or reflow the container to hold the paragraph, or drop that card; do not pour the sentence back into the list slot.
- **Mode precedence**: the locked mode shapes voice / register, not §IX's authored titles or page order. When a `§IX` title is a user-authored topic label, keep it — do not upgrade it to an assertion just because the mode (e.g. `pyramid`) favors them; mode title-tendencies apply only to AI-drafted titles.
> Note: block-level phrasing, applied *within* the page's `page_rhythm` density (below), not against it.
**If `spec_lock.md` is missing**: emit `warning: spec_lock.md missing — generating without execution lock` once, then proceed using `design_spec.md` values. Expected only for legacy projects; new projects MUST have it (see [strategist.md](strategist.md) §6 step 4).
**Forbidden — values outside the lock**:
- Colors (fill / stroke / stop-color) MUST come from `colors`
- Icons MUST come from `icons.inventory`; library MUST equal `icons.library`
- Font family from `typography`: use role override (`title_family` / `body_family` / `emphasis_family` / `code_family`) if declared, else fall back to `font_family`
- Font sizes follow a **ramp anchored on `typography.body`**, not a closed menu. **Structural roles — page title, body, subtitle, annotation / caption, footnote / page number — render at one consistent size deck-wide, taken from their `spec_lock` slot; never re-pick a structural role's size page by page or carry a template's placeholder px.** This locks the **role**, not every glyph: a page may still carry deliberate typographic hierarchy — a lead-in sentence, an inline emphasis figure, a pull-quote, a kicker, a hero number — but each of those is its **own role / feature element** with its own size, **applied consistently deck-wide** (declare a recurring one as its own `spec_lock` slot). In-band intermediate sizes are for exactly these feature elements. What is banned is the *same* role drifting size to fit a container or by page whim — that scatter is what reads as unprofessional. Sizes outside every band require extending the lock first.
- **The page's core message is primary — render it ≥ `body`.** The one-idea / key-claim / key-takeaway line a page is built around is its most important text; map it to the locked `lead` or `subtitle` slot (≥ `body`), never to a sub-`body` size. Demoting it below body while data callouts or labels sit larger inverts the hierarchy — the failure this prevents. If no `lead` / `subtitle` slot is locked for a recurring core-message line, surface it (per below) instead of improvising a smaller one. A footnote / page number / source credit uses the locked `footnote` (or `annotation`) slot — never an invented sub-`annotation` size; and the body-shrink last resort (§1.0) bottoms out at `body 4`px, a hard floor never crossed.
- **Write the locked px verbatim; at most 2 decimals.** `font-size` MUST be the exact px from `spec_lock.typography` — if `body` is `24`, write `24`; never substitute a "rounder" or PowerPoint-familiar number (`20` / `18` / `36`). The system is px-only — there is no pt to convert, and a remembered pt-style value written as px renders the whole deck the wrong size. Prefer whole numbers (sizes are clean even px); keep a decimal only for a slot that genuinely carries one in `spec_lock`. Never emit long tails like `20.8026`: the exporter rounds the final size to 1 decimal pt, so extra px precision is wasted noise.
- Images MUST reference files listed under `images`; no invented filenames
- Formula PNGs are images with `Acquire Via: formula` / `Status: Rendered`; place them only from the listed file path and never recreate the formula as text.
If a page needs a value not in `spec_lock.md`, surface it — do not silently invent one.
**Per-page layout rhythm — `page_rhythm` section**:
Before drawing each page, look up its entry in `page_rhythm` (key format `P<NN>` matching the page index in §IX of `design_spec.md`) and apply the corresponding layout discipline:
| Tag | Layout discipline |
|-----|-------------------|
| `anchor` | Structural page (cover / chapter / TOC / ending). With a template, follow the matching template verbatim. In free design (no template), realize the page's §IX intent — for the cover deliver its `Cover impact` and for a closing page its `Closing impact` (the committed hook / takeaway + composition), never a default centered title + subtitle or a generic "Thank you" sign-off. |
| `dense` | Information-heavy. Card grids, multi-column layouts, KPI dashboards, tables, and charts are all permitted — but a card grid is **not** the automatic default (see the anti-monotony rule below); pick the structure that fits the content's relationship. |
| `breathing` | Low-density impact page. Avoid **multi-card grid layouts** — do not organize content as multiple parallel rounded containers (3-card row, 4-card KPI grid, 2×2 matrix rendered as cards). Use naked text blocks, dividers, whitespace, or full-bleed imagery as the content structure. Single rounded visual elements (hero image corners, callouts, tags, one emphasis block) are fine — the rule is about grid structure, not about the `rx` attribute. Proportions follow information weight (not a preset ratio). Typical forms: hero quote, single large number with one-line interpretation, full-bleed image with floating caption, section transition. |
> Without rhythm variation, every page defaults to card grids (the "AI-generated" look). `page_rhythm` is the only narrative lever that survives context compression.
> 🚧 **Anti-monotony — map the content's RELATIONSHIP before defaulting to a card grid.** "N parallel items → an N×M grid of text cards" is the single biggest source of the AI-deck look. Before drawing cards, ask what the items *are to each other* and pick the structure that shows it:
> - **A system of interconnected parts** (六大体系 / 平台架构 / 能力地图) → hub-and-spoke, layered architecture, or module composition (`charts/hub_spoke`, `layered_architecture`, `module_composition`) — the connections ARE the message; a flat grid throws them away.
> - **A process / sequence / 历程** → flow with connectors or numbered steps (`charts/process_flow`, `numbered_steps`, `snake_flow`) or a timeline — arrows carry the "then".
> - **A hierarchy / breakdown** → tree or pyramid (`charts/top_down_tree`, `pyramid_chart`).
> - **A cycle / 闭环** → concentric or segmented wheel (`charts/concentric_circles`, `segmented_wheel`).
> - **Interdependent themes** → mind-map / network (`charts/mind_map`, `hub_spoke`).
> - **Comparison** → columns / quadrant / matrix (`charts/comparison_columns`, `matrix_2x2`, `quadrant_text_bullets`), not two stacks of cards.
> - **≥3 data points** → an actual chart (bar / line / donut …), never text cards of numbers.
>
> A plain card grid is the right answer ONLY for genuinely independent, unordered items with no relationship to show — and even then, vary card size by weight, add a connecting spine, or give each one icon. **Cap: at most ~1/3 of a deck's content pages may be plain card grids.** If you've just drawn two card-grid pages, the next relational page MUST be a diagram, not a third grid. Pull a `charts/` template's **geometry** as the starting structure (re-skin per §1 — structure not skin) so you are adapting a real diagram, not inventing connectors from scratch. This is why `spec_lock.md page_charts` matters: when Strategist assigned a template for this page, build THAT, don't collapse it back into cards.
**Missing `page_rhythm` section** → emit `warning: spec_lock.md missing page_rhythm — defaulting all pages to dense` once, fall back to `dense` for all pages.
**Tag not found for current page** → emit `warning: spec_lock.md page_rhythm tag not found for P<NN> — falling back to dense` once per deck (aggregate; do not repeat per page), fall back to `dense`. Do not invent a tag.
**Per-page template lookup — `page_layouts` section**:
Before drawing each page, look up its entry in `page_layouts` to decide which basename to inherit (the SVG itself was loaded in §1.0):
- Entry present (e.g., `P04: 03a_content_image_text`) → inherit the corresponding SVG already in context. The basename **must match** an actual file in the chosen template directory; if it doesn't, emit `warning: page_layouts P<NN> references missing file <basename>.svg — falling back to free design` and proceed.
- No entry for this page → free design, no inheritance. **Not an error** — Strategist intentionally left this page free.
- Whole section absent → see §1 fallback (legacy page-type matching).
Do **not** invent a layout entry, and do **not** assume a template just because `templates/` exists — if `page_layouts` is present but silent for this page, that silence is the instruction.
**Per-page chart reference — `page_charts` section**:
Before drawing each page, look up its entry in `page_charts` to decide which chart structure applies (the SVG itself was loaded in §1.0):
- Entry present (e.g., `P09: timeline_horizontal`) → adapt the corresponding chart SVG already in context. Apply project colors/typography/density; do not copy verbatim. Cross-reference `templates/charts/charts_index.json` for the chart's purpose summary if needed.
- No entry for this page → either no chart on this page, or a chart that didn't match any catalog template (Strategist's `no-template-match` fallback). Design the visualization from scratch using `design_spec.md §VII` for guidance.
- Whole section absent → no chart pages in this deck.
---
## 3. Execution Guidelines
- **Proximity**: group related elements with tight spacing; separate unrelated groups
- **Spec adherence**: follow color, layout, canvas format, and typography in the spec
- **Grid & alignment discipline (HARD — checker-enforced)**: hand-written absolute coordinates drift; the fix is snapping, not eyeballing.
- **Snap to `layout_grid`**: header title, content blocks, and footer left edges sit exactly at `spec_lock.layout_grid.margin_x`; body content starts at `content_top`; footer baseline at `footer_y`. Never write a "close enough" coordinate (63 when the grid says 60) — the quality checker errors on 215px deviations. Breaking the grid on purpose (full-bleed, asymmetric hero) means clearing it by ≥16px, not by a few px.
- **Sibling cards align exactly**: cards in one row share the same `y` and `height`; cards in one column share the same `x`; gaps in a row all equal `layout_grid.gutter`. Compute one set of constants per grid (`x = margin_x + i * (card_w + gutter)`) instead of placing each card by feel. Deliberate stagger (masonry) offsets by ≥16px.
- **Two-column layouts resolve the bottom edge**: either both columns end at the same y, or the shorter column is deliberately closed (vertical centering, a filler visual, a closing rule) — never one column dangling far above the other with dead whitespace.
- **No glued glyphs on one line**: adjacent inline elements (arrow + number, numeral + unit, badge + label) keep ≥0.3em horizontal gap. An arrow touching a digit ("→02") reads as a typo.
- **Budget vertical space BEFORE writing the page**: items × row height must fit within `content_top`..`content_bottom`. If N items don't fit, cut items, tighten copy, or change layout — never compress row gaps until the last item's description lands in the footer zone (shipped failure: a 5-item list whose 5th description was clipped by the footer). The checker errors on any text baseline past `content_bottom`.
- **Never degrade an assigned chart to text**: when `spec_lock.page_charts` assigns this page a chart, the page MUST draw a real figure (adapt the named `templates/charts/` template). Rendering the numbers as a text list or big KPI type instead of the assigned chart is a checker-level error — quantitative evidence belongs in a figure, and the Strategist already decided this page gets one.
- **Template structure**: if templates exist, inherit the visual framework
- **Main-agent ownership**: SVG generation must run in the main agent (not sub-agents) — pages share upstream context for cross-page visual continuity
- **Generation rhythm**: lock global design context first, then generate pages sequentially in one continuous context. No batched groups (e.g., 5 at a time).
- **Phased batch generation** (recommended):
1. **Visual Construction Phase**: generate all SVG pages sequentially for visual consistency. Use layout judgment for chart marks during the draft. **MUST embed plot-area markers** per §3.1 below on every chart page — coordinate calibration is a post-generation step (see [`workflows/verify-charts.md`](../workflows/verify-charts.md)) that depends on these markers.
2. **Quality Check Gate**: run `python3 scripts/svg_quality_checker.py <project_path>` on `svg_output/`. Any `error` (banned features, viewBox mismatch, spec_lock drift, non-PPT-safe font, etc.) MUST be fixed on the offending page before proceeding — regenerate and re-check. Address `warning`s when straightforward. Do NOT defer to after `finalize_svg.py` — finalize rewrites SVG and masks some violations.
3. **Logic Construction Phase**: after SVGs pass the quality check, batch-generate speaker notes for narrative continuity.
### 3.1 Chart Plot-Area Marker (MANDATORY on every chart page)
> The [`verify-charts`](../workflows/verify-charts.md) workflow enumerates chart pages from `design_spec.md §VII`, then reads each page's plot-area marker to feed `svg_position_calculator.py`. Missing marker → verify-charts has to re-derive the plot area from axis lines, paying the cost on every run.
**Hard rule**: every SVG page that contains a data visualization chart includes a plot-area marker inside `<g id="chartArea">`, placed **after axis lines** and **before the first data element** (bar, line, area, point).
**Rectangular plot area** (bar / horizontal_bar / grouped_bar / stacked_bar / line / area / stacked_area / scatter / waterfall / pareto / butterfly):
```xml
<!-- chart-plot-area: x_min,y_min,x_max,y_max -->
```
**Radial charts** (pie / donut / radar):
```xml
<!-- chart-plot-area: pie | center: cx,cy | radius: r -->
<!-- chart-plot-area: donut | center: cx,cy | outer-radius: r1 | inner-radius: r2 -->
<!-- chart-plot-area: radar | center: cx,cy | radius: r -->
```
**How to determine coordinate values**:
| Value | Derivation |
|-------|------------|
| `x_min` | X coordinate of the Y-axis line (leftmost data boundary) |
| `y_min` | Y coordinate of the topmost grid line (highest data boundary) |
| `x_max` | X coordinate of the rightmost axis endpoint or grid line |
| `y_max` | Y coordinate of the X-axis baseline |
| `cx, cy` | Center point of pie/donut/radar (accounting for `transform="translate()"`) |
| `r` | Outer radius of the chart |
**Per-page verification** — after writing each chart SVG, confirm the marker exists:
```bash
grep "chart-plot-area" <project_path>/svg_output/<current_page>.svg
```
> All chart templates in `templates/charts/` include this marker as a reference. If you are drawing a chart and the marker is absent, you have a bug.
- **Technical specs**: see [shared-standards.md](shared-standards.md) for SVG/PPT constraints
- **Card containers — use the documented patterns**: when a content page needs section cards (4 quadrants, parallel aspects, capability blocks, info cards), use the patterns codified in [`templates/charts/CHART_STYLE_GUIDE.md`](../templates/charts/CHART_STYLE_GUIDE.md) §11 — half-rounded section tab (§11.1), nested card border without stroke (§11.2), card-grid skeletons (§11.3), diagonal dashed connector for cross-quadrant relationships (§11.5), ground-anchor ellipse as a non-filter depth marker (§11.6), bidirectional interaction arrows for paired protocols (§11.7). Do not reinvent the "tinted full-rounded rect + white cover-rect to hide the bottom corners" hack; it survives in older templates but breaks SVG→PPTX color editing. Reference templates: [`labeled_card.svg`](../templates/charts/labeled_card.svg), [`quadrant_text_bullets.svg`](../templates/charts/quadrant_text_bullets.svg), [`kpi_cards.svg`](../templates/charts/kpi_cards.svg), [`matrix_2x2.svg`](../templates/charts/matrix_2x2.svg), [`team_roster.svg`](../templates/charts/team_roster.svg), [`client_server_flow.svg`](../templates/charts/client_server_flow.svg).
- **Reference — prefer semantic shapes over preset stacks (not a constraint)**: when a slide needs to express "ascending / converging / breaking through / stacking" — i.e., a relationship that goes beyond a generic arrow — prefer a single custom `<polygon>` or `<path>` that encodes the semantics geometrically, rather than stacking multiple preset arrows. A converging-tip path or a podium polygon reads faster than three arrows pointing at a label. Examples of this technique appear in many imported corporate decks; see `projects/01_template_import/svg_output/slide_01.svg` shape-158 for a reference (gradient-filled inward-pointing arrow). Do not codify these as templates — they are page-specific; the rule is just "consider polygon before stacking presets."
- **Reference — visual depth through restraint (not a constraint)**: layered depth comes from rhythm (flat vs lifted, dense vs spacious), not from shadows everywhere. Shadow typically suits 2-3 genuinely floating elements per page (cards on photos, primary CTA, overlays); keep peer-grid cards, dividers, body containers flat. Reach for typography weight, spacing, accent bars, subtle tints **before** shadow. Full rules in shared-standards.md §6.
### SVG File Naming Convention
Format: `<NN>_<page_name>.svg` (two-digit number from 01; name matches the deck's language and the page title in the Design Spec).
Examples: `01_封面.svg` / `02_目录.svg` / `03_核心优势.svg`; `01_cover.svg` / `02_agenda.svg` / `03_key_benefits.svg`.
---
## 4. Icon Usage
Strategist chooses the library and inventory; Executor only implements. Library details and one-library rule: [`../templates/icons/README.md`](../templates/icons/README.md). This section defines placeholder syntax.
> 🚧 **MANDATE — icons are not optional in free-design mode.** When `spec_lock.md` declares an `icons.library` + non-empty `inventory`, **every content page MUST place icons from that inventory** — they are part of the design, not garnish. In free-design mode there is no mirror template to copy icons from, so the only way icons reach the deck is you authoring `<use data-icon>` on each page. Concretely:
> - **Content pages** (KPI cards, lists, process / flow steps, comparison columns, feature grids, section dividers with a concept) → place **13** inventory icons that label the content (one per card / step / list item is the common pattern). A dense content page with zero icons reads flat and is a quality regression.
> - **Legitimately icon-less** (do NOT force icons): the cover, a pure-typography section break, a single-number / single-quote `breathing` page, and the closing/thanks page.
> - The strategist already validated each inventory name exists (via `icon_sync.py`); use those names verbatim — do not invent new ones.
> - **Enforcement**: `svg_quality_checker.py` fails the deck (hard error, non-zero exit) when an inventory is locked but the deck authors **zero** `<use data-icon>` across all pages, and warns per page that references none. Don't ship past it by deleting the icon lock — place the icons.
> **Resolution is project-first.** Strategist copied the chosen icons into `<project_path>/icons/<lib>/` (via `icon_sync.py`); `finalize_svg.py embed-icons` embeds from there, falling back to the global library per-icon. **Custom icons**: drop an `.svg` into `<project_path>/icons/<lib>/` (any `<lib>`, e.g. `custom/`) and reference it as `data-icon="<lib>/<name>"` — it embeds like any other. Reference only icons in the `spec_lock.md` inventory.
**Built-in icons — Placeholder method (recommended)**:
```xml
<!-- chunk-filled (straight-line geometry, sharp corners, structured) -->
<use data-icon="chunk-filled/home" x="100" y="200" width="48" height="48" fill="#005587"/>
<!-- tabler-filled (bezier-curve forms, smooth & rounded contours) -->
<use data-icon="tabler-filled/home" x="100" y="200" width="48" height="48" fill="#005587"/>
<!-- tabler-outline (light, line-art style — screen-only decks) -->
<use data-icon="tabler-outline/home" x="100" y="200" width="48" height="48" fill="#005587"/>
<!-- phosphor-duotone (single color + 20% backplate — soft depth without solid weight) -->
<use data-icon="phosphor-duotone/house" x="100" y="200" width="48" height="48" fill="#005587"/>
<!-- simple-icons (brand logos — used alongside the deck's primary library, only for real company/product marks) -->
<use data-icon="simple-icons/github" x="100" y="200" width="48" height="48" fill="#181717"/>
<!-- tabler-outline with thin / bold stroke (stroke-style libraries only) -->
<use data-icon="tabler-outline/home" x="100" y="200" width="48" height="48" fill="#005587" stroke-width="1.5"/>
<use data-icon="tabler-outline/home" x="100" y="200" width="48" height="48" fill="#005587" stroke-width="3"/>
```
> ⚠️ **Color**: ALWAYS use `fill="#HEX"` on `<use data-icon="...">`. NEVER use `stroke` or `fill="none"`, even for stroke-style libraries.
>
> **stroke-width** (stroke-style libraries only, currently `tabler-outline`): allowed values `{1.5, 2, 3}`. If `spec_lock.md icons.stroke_width` is declared, all placeholders MUST use that value deck-wide. Default `2` if absent (legacy). Ignored on non-stroke libraries.
>
> Icons are auto-embedded by `finalize_svg.py` — no need to run `embed_icons.py` manually.
**Searching for icons** — use terminal, zero token cost:
```bash
ls skills/ppt/templates/icons/chunk-filled/ | grep home
ls skills/ppt/templates/icons/tabler-filled/ | grep home
ls skills/ppt/templates/icons/tabler-outline/ | grep chart
ls skills/ppt/templates/icons/phosphor-duotone/ | grep house
ls skills/ppt/templates/icons/simple-icons/ | grep github
```
**Abstract concept → icon name** (names for `chunk-filled`; tabler libraries use their own equivalents — verify with `ls | grep`):
| Concept | chunk-filled | tabler-filled / tabler-outline |
|---------|-------|-------------------------------|
| Growth / Increase | `arrow-trend-up` | same |
| Decline / Decrease | `arrow-trend-down` | same |
| Success / Complete | `circle-checkmark` | `circle-check` |
| Warning / Risk | `triangle-exclamation` | `alert-triangle` |
| Innovation / Idea | `lightbulb` | `bulb` |
| Strategy / Goal | `target` | same |
| Efficiency / Speed | `bolt` | same |
| Collaboration / Team | `users` | same |
| Settings / Config | `cog` | `settings` |
| Security / Trust | `shield` | same |
| Money / Finance | `dollar` | `currency-dollar` |
| Time / Deadline | `clock` | same |
| Location / Region | `map-pin` | same |
| Communication | `comment` | `message` |
| Analysis / Data | `chart-bar` | same |
| Process / Flow | `arrows-rotate-clockwise` | `refresh` |
| Global / World | `globe` | `world` |
| Excellence / Award | `star` | same |
| Expand / Scale | `maximize` | same |
| Problem / Issue | `bug` | same |
> For self-evident names (home, user, file, search, arrow, etc.) — just `grep chunk-filled/` directly without consulting the table.
> ⚠️ **Icon validation**: only use icons from the Design Spec's approved inventory. Verify each via `ls | grep` before use. Mixing libraries within one deck is FORBIDDEN.
---
## 5. Visualization Reference
Chart SVGs referenced in **VII. Visualization Reference List** are loaded once via the §1.0 batch read. This section governs adaptation only.
**Hard rule**: adapt the loaded chart SVG; do not improvise from memory and do not replicate verbatim. Apply project colors, typography, content; preserve visualization type.
**Adaptation rules**:
- **Preserve**: visualization type (bar/line/pie/timeline/process/framework…) as specified
- **Adapt**: data, labels, colors (project scheme), dimensions
- **Freely adjust**: composition, axis ranges, grid, legend, spacing, decoration — as long as the chart stays accurate and readable
- **Forbidden**: changing visualization type without spec justification; omitting data points or structural elements from the outline
> Templates: `templates/charts/` (70 types). Index: `templates/charts/charts_index.json`
### 5.1 Chart Coordinate Calibration
Coordinate calibration runs as a **standalone post-generation workflow**, not inside the executor pipeline. After SVG generation completes, if the deck contains data charts, run [`workflows/verify-charts.md`](../workflows/verify-charts.md) before post-processing.
The executor's only obligation here is upstream: embed the `<!-- chart-plot-area ... -->` marker on every chart page during initial draft (§3.1). Verify-charts enumerates chart pages from `design_spec.md §VII` (authoritative deck plan) and uses the marker to feed `svg_position_calculator.py`.
> Do NOT run `svg_position_calculator.py` during the initial draft. The calculator calibrates already-generated SVGs against their declared plot areas; running it before the SVG exists has nothing to compare against.
---
## 6. Image Handling
Handle images by their status in the Design Spec's Image Resource List. Status enum and lifecycle: [`svg-image-embedding.md`](svg-image-embedding.md).
| Status | Source | Handling |
|--------|--------|----------|
| **Existing** | User-provided | Reference images directly from `../images/` directory |
| **Generated** | Generated by Image_Generator | Reference images directly from `../images/` directory |
| **Sourced** | Web-acquired by Image_Searcher | Reference from `../images/`. **Read [`image_sources.json`](image-searcher.md) to decide attribution** — see §6.1 below. |
| **Rendered** | Deterministic formula PNG | Reference from `../images/`; use `preserveAspectRatio="xMidYMid meet"` |
| **Needs-Manual** | Acquisition failed and file is absent | Use dashed border placeholder unless the expected file exists |
| **Placeholder** | Not yet prepared | Use dashed border placeholder |
**Reference syntax**: see [`svg-image-embedding.md`](svg-image-embedding.md).
**Template-bundled images**: when a template (deck / layout / brand) is applied, its bitmaps are copied into the project's `images/` alongside every other runtime image (SKILL.md Step 3). Reference them the same way — `../images/<name>` — and do **not** reproduce a template SVG's bare sibling href (e.g. `href="cover_bg.png"`): the template SVG is reference material, the rendered page lives in `svg_output/` and must point at `../images/`. Mirror templates (§1.1) are the one exception — they copy hrefs verbatim, and the exporter resolves those bare hrefs against `images/`.
**Placeholder**: Dashed border `<rect stroke-dasharray="8,4" .../>` + description text
**`no-crop` images**: when a `spec_lock.md images` entry ends with ` | no-crop`, size the container to the image's native ratio (from `analyze_images.py` or file dims) and use `preserveAspectRatio="xMidYMid meet"`. Untagged entries are croppable — default to `slice`.
**Formula images**: rows with `Acquire Via: formula` or `Type: Latex Formula` MUST be treated as no-crop even if a legacy `spec_lock.md` forgot the flag. Use the dimensions from `design_spec.md §VIII`, `analysis/image_analysis.csv`, or `images/formula_manifest.json`; do not normalize all formulas to one height unless the spec explicitly states that layout choice.
### 6.1 Inline Attribution for Sourced Images (web path)
Whenever the slide uses an image with `Status: Sourced`, look up the corresponding entry in `project/images/image_sources.json` and act on `license_tier`:
| `license_tier` | Action on this slide |
|---|---|
| `no-attribution` | Embed the `<image>` element only. **No credit element needed.** |
| `attribution-required` | Embed the `<image>` element **plus** a small inline `<text>` credit element per the visual spec in [image-searcher.md §7](./image-searcher.md). |
The credit text is **not** rendered by post-processing or export — it must be present in the SVG you produce. The shape of the credit element (size, position, color, multi-image source line, hero gradient overlay) is specified in [image-searcher.md §7](./image-searcher.md). Do not invent a different style.
Use `attribution_text` from the manifest entry as the **starting point**, then compress for the small-text constraint (drop URL, drop filename, keep "via Provider / License"). For CC0/PD images that landed in the `attribution-required` tier only because of upstream metadata quirks (rare), credits are still safe to render.
`svg_quality_checker.py` treats missing CC BY / CC BY-SA inline attribution as an **error**. Fix the offending SVG before post-processing.
**The manifest is the single source of truth for credits.** Do not duplicate license info into speaker notes or any other artifact.
---
## 7. Font Usage
Source of truth: `spec_lock.md typography`. Use `font_family` as default; override per role with `title_family` / `body_family` / `emphasis_family` / `code_family` if declared. LaTeX formulas that Strategist rendered are PNG images, not a `code_family` text role.
If `spec_lock.md` is absent, consult [`strategist.md`](strategist.md) §g — do not invent a stack.
**Hard rule**: every SVG `font-family` stack MUST end with a pre-installed family (Microsoft YaHei / SimHei / SimSun / Arial / Calibri / Segoe UI / Times New Roman / Georgia / Consolas / Courier New / Impact / Arial Black). PPTX has no runtime fallback — missing fonts degrade to Calibri.
---
## 8. Speaker Notes Generation Framework
### Task 1. Generate Complete Speaker Notes Document
After all SVG pages are finalized, enter Logic Construction Phase and write the full notes to `notes/total.md`. Batch-writing (not per-page) lets transitions plan coherently.
**Pure spoken narration**: notes are read aloud verbatim by `notes_to_audio.py` (TTS). Write only what should be spoken. No visible markers, no labeled meta-lines, no enumerated key-point lists, no duration annotations — anything you write outside the heading will be vocalized.
**Per-page structure**: `# <number>_<page_title>` heading (the `#` heading line is the only thing stripped before TTS), pages separated by `---`. Body is 25 natural sentences carrying the page's core message. Page-to-page transitions live inside the opening sentence as natural prose ("接下来……" / "Having framed X, let's turn to Y") — no bracketed `[过渡]` / `[Transition]` tags.
**Concrete examples** — same shape applies to any language; just write naturally in that language.
中文 deck
```
# 02_市场格局
在明确了行业背景之后,我们来看具体的市场格局。当前线上零售集中度持续上升,前三大平台合计份额已经达到百分之六十八,腰部玩家正在被快速挤压,留给新进入者的窗口期不超过十八个月。这意味着我们的策略必须聚焦,而不是铺开。
```
英文 deck
```
# 02_market_landscape
Having framed the industry backdrop, let's look at the actual market landscape. Online retail concentration keeps rising — the top three platforms now hold sixty-eight percent of combined share, mid-tier players are being squeezed fast, and the window for new entrants is under eighteen months. This means our strategy has to focus, not spread.
```
> 日本語 / 한국어 / 其他语言:照搬同样的结构,用对应语言自然书写即可。
**Number readability**: TTS reads digits and symbols literally. Prefer fully-spelled forms in the language being spoken when literal pronunciation would be awkward (e.g. Chinese "百分之六十八" reads better than "68%"; "1-2分钟" reads as "一减二分钟"). Plain integers and percentages in English are fine as-is.
**Common mistakes to avoid**:
- Leaving any bracketed stage marker (`[过渡]` / `[Transition]` / `[Pause]` / `[Data]` / `[Scan Room]` / `[Interactive]` / `[Benchmark]` etc.) in the text — they will be read aloud literally.
- Adding `要点:① …` / `Key points: (1) …` / `时长2分钟` / `Duration: 2 minutes` / `Flex: …` lines — TTS will speak "要点 一 …".
- Mixing languages within one deck's notes.
### Task 2. Split Into Per-Page Note Files
Auto-split `notes/total.md` into per-page files in `notes/`.
**Naming**: match SVG names (`01_cover.svg` → `notes/01_cover.md`); `slide01.md` also supported (legacy).
---
## 9. Next Steps After Completion
> **Auto-continuation**: After Visual Construction Phase (all SVG pages) and Logic Construction Phase (all notes) are complete, the Executor proceeds directly to the post-processing pipeline.
**Post-processing & Export** (same canonical pipeline as [shared-standards.md §5](shared-standards.md)):
```bash
# 1. Split speaker notes
python3 scripts/total_md_split.py <project_path>
# 2. SVG post-processing (auto-embed icons, images, etc.)
python3 scripts/finalize_svg.py <project_path>
# 3. Export PPTX
python3 scripts/svg_to_pptx.py <project_path>
# Output (default-flow mode):
# exports/<project_name>_<timestamp>.pptx ← native pptx (canonical output)
# backup/<timestamp>/svg_output/ ← Executor SVG source backup (always written)
#
# Add --svg-snapshot to additionally emit:
# exports/<project_name>_<timestamp>_svg.pptx ← SVG snapshot pptx (sibling of native pptx)
```

View File

@ -1,102 +0,0 @@
# 图标系统 (两层)
> 几何装饰 (圆点、徽章、品牌条、装饰线) 已在 `layouts.md` 起手块以 helper 封装 (`add_dot` / `add_badge` / `add_accent_line` / `add_rect`),直接调用,**不要重写**,**也不要把它们当"图标"用**。本文档处理的是真正的**业务概念图标** (火箭 / 目标 / 雷达 / 齿轮 / 盾牌 ...)
## 选图标两层降级
```
1) Iconify 个性化图标 ── 业务概念 (火箭、目标、雷达、齿轮) → 见 §A
2) Unicode 字形兜底 ── Iconify 没有合适的 (✓ ✗ ★ → ↑) → 见 §B
```
整 deck 选**一个图标集**用到底,不要 tabler 跟 lucide 混用。
## §A. Iconify 个性化图标 (本地缓存 + 网络拉取)
### A1. 本地库 (两处:只读种子库 + 本 task 已拉)
- **种子库(只读)**: `<skill_dir>/assets/icons/` —— skill 自带的商务红 tabler 种子集,详见 [INDEX.md](../assets/icons/INDEX.md)。docker 沙盒里 `skills/` 是只读挂载,**只能读、不能往这儿写**。
- **本 task 已拉**: `<task_dir>/assets/icons/` —— A2 fetch 新图标的落点(可写)。
命名规约: `<set>_<name>_<colorhex>_<sizepx>.png`(如 `tabler_rocket_C00000_128.png`)
**用之前先 `glob` 两处都查一遍**(种子库 `<skill_dir>/assets/icons/` + 本 task `<task_dir>/assets/icons/`),有就直接 `add_picture`,免去网络往返。
### A2. fetch_icon.py 拉新图标
脚本在 `<skill_dir>/scripts/`(只读可执行);拉下来的图标 `-o` **必须落 `<task_dir>/assets/icons/`**(种子库只读,新图标进 task 目录):
```bash
# 主红色 128px PNG (推荐)
python <skill_dir>/scripts/fetch_icon.py rocket --set tabler --color C00000 \
--size 128 -o <task_dir>/assets/icons/tabler_rocket_C00000_128.png
# 强调色金黄
python <skill_dir>/scripts/fetch_icon.py target --set tabler --color FFC107 \
--size 128 -o <task_dir>/assets/icons/tabler_target_FFC107_128.png
```
`--set` 默认 `tabler`(4500+ 商务图标,MIT)。其它选 `lucide / heroicons / material-symbols / carbon / fluent / mdi`。**整 deck 只用一个 set**。
PNG 转换需 `pip install cairosvg`(推荐)或 `pip install svglib`。没装也能拿 SVG。
### A3. 嵌入幻灯片
```python
slide.shapes.add_picture(
"<task_dir>/assets/icons/tabler_rocket_C00000_128.png", # 路径 = glob 命中的那处(种子库或 task)
Inches(1.0), Inches(2.5),
width=Inches(0.8), # 装饰图标 0.5-1.5 in;别超 2 in
)
```
### A4. 浏览找名字
打开 https://icon-sets.iconify.design/ 搜关键词,如 "rocket" / "数据" / "shield",拿到名字 (如 `tabler:rocket`) 直接给 fetch_icon.py。
### A5. 流程节点 (替代 PENTAGON)
需要"调研→设计→开发→测试→上线"这种横向流程时,**不要用 PowerPoint 内置 PENTAGON**(视觉陈旧),改用 Iconify 的 `chevron-right` + 文本组合:
```python
from pptx.util import Inches
from pptx.enum.text import PP_ALIGN
# 假设页面顶部已 import pptx_helpers as P,且 slide 已建(见 layouts.md §通用起手)
stages = ["调研","设计","开发","测试","上线"]
icon_path = "<task_dir>/assets/icons/tabler_chevron-right_C00000_64.png" # 先 fetch_icon.py 拉到 task,种子库没有 chevron-right_64
for i, label in enumerate(stages):
x = 0.7 + i * 2.4
P.add_textbox(slide, x, 3.7, 1.8, 0.5, label, 16, bold=True,
color=P.PRIMARY, align=PP_ALIGN.CENTER, name=f"stage_{i}")
if i < len(stages) - 1: # 节点间放 chevron
slide.shapes.add_picture(icon_path, Inches(x + 1.85), Inches(3.7),
width=Inches(0.4))
```
## §B. Unicode 字形 (兜底)
Iconify 都没合适的时候用。避 emoji,用单色符号:
```
✓ ✔ ✗ ✘ 对号 / 错号
★ ✦ ✧ ✪ 星
→ ← ↑ ↓ ↔ 箭头
↗ ↘ ↙ ↖ 斜箭头
● ○ ◉ ◎ 圆
⬛ ⬜ ◆ ◇ 方块菱形
∴ ∵ ⇒ ⇔ 数学
№ ¶ § † 文档
```
```bash
# 强调色对号 96px → PNG
python <skill_dir>/scripts/render_icon.py "✓" --color "#C00000" --size 96 -o <task_dir>/slides/check.png
```
## §C. 硬规则
1. **风格统一** —— 整 deck 只用一个 Iconify set;不要 tabler 跟 lucide 混
2. **颜色限定** —— 只用 PRIMARY / SECONDARY / ACCENT / GREY,不要每图标独立配色
3. **大小克制** —— 装饰图标 0.5-1.5 in;不超过 2 in
4. **不替表意** —— 一个 ★ 不能代替"重点"两字
5. **避免 emoji** —— 跨系统渲染差异大,且自带颜色冲突主题
6. **不要每页都堆** —— 装饰是配角,文字是主角
7. **缓存复用** —— Iconify 拉的图标进 `<task_dir>/assets/icons/`,本 task 内再用直接读,不要重复请求(种子库 `<skill_dir>/assets/icons/` 只读,新图标不往那写)
## §D. 不要把 layouts.md helper 当"图标"
`add_dot` / `add_badge` / `add_accent_line` / `add_rect` 是几何**装饰**(品牌条、圆点 bullet、编号徽章、装饰短线),不是业务图标。它们底层是 MSO_SHAPE.OVAL/RECTANGLE,但模型不要直接调 MSO_SHAPE —— 全部走 layouts.md 的 helper 接口。

View File

@ -0,0 +1,222 @@
# Image-Text Layout Patterns
A vocabulary registry of ways images can be placed on a slide. The point of this file is to **expand the mental list of options** so that when you reach for an image layout, you do not default to the same three patterns (left/right, top/bottom, full-bleed cover).
Every entry has a name plus a short technical hint. Common techniques get a single line. Less obvious or easily forgotten techniques get a short paragraph — not a full tutorial, but enough that a model unfamiliar with the project can implement it without guessing. This is a registry, not a teaching document; no use-case prescriptions, no decision tables.
> **Numbers are stable identifiers, not sequence.** The file is split into **Part 1 — Primary Structures** (#1#19, #38#56) and **Part 2 — Modifier Layers** (#20#37, #57#72). Numbers jump within each Part because Primary structures were grouped first; existing references to `#38`, `#48`, etc. anywhere in the project still resolve correctly.
---
## Core Principle — Two Layers
Almost every pattern below is an instance of one underlying split:
> **The image carries atmosphere, world-building, emotional weight. Native SVG shapes carry information, data, editable text.**
This is the single most underused move in image-heavy decks. The default reflex is to place image and text in adjacent rectangles. The far more powerful move — especially for content-rich pages — is to let the image **be the canvas** (often full-bleed) and draw native vector elements (annotation cards, flow nodes, KPI tiles, leader lines, network diagrams, dashboards) directly on top.
Anything that must be editable, numerically accurate, contain Chinese, or be styled to the deck's exact palette belongs in the SVG layer regardless of what the image looks like underneath.
---
# Part 1 — Primary Structures
Pick one or more of these as the page's bones. Cross-primary combinations are encouraged (see Composition Guidance).
## Container Layouts (where the image sits)
1. **Full-bleed background with floating title**`<image x=0 y=0 width=1280 height=720 preserveAspectRatio="xMidYMid slice"/>` + scrim `<rect>` for legibility + overlay `<text>`.
2. **Left-third image + right text body**`<image x=0 y=0 width=~427 height=720>` on the left; text area in the remaining width; optional right-edge gradient fade for smooth transition.
3. **Right-third image + left text body** — mirror of #2.
4. **Right image bleeding off the canvas edge**`<image>` width extended past viewBox; text on left with a rightward gradient fade so the image emerges from the text area without a visible boundary.
5. **Top-band image + bottom multi-column text**`<image x=0 y=0 width=1280 height=~340>` at the top + bottom-fade gradient + 23 evenly spaced text columns below.
6. **Bottom-band image + top title + middle text** — mirror of #5 with the image at the bottom and a top-fade gradient.
7. **Top-and-bottom symmetric split** — image occupies 50% (top or bottom) with a divider line or thin gradient band separating the halves.
8. **Z-pattern serpentine** — three rows, image on the left in rows 1 and 3, on the right in row 2 (or alternating). Each row roughly 1/3 canvas height; visual flow zigzags down the page.
9. **3×3 grid with central image** — nine cells; center cell holds the image, the other 8 hold text blocks, color swatches, or small data widgets.
10. **Centered image with radial callouts pointing outward** — image (often circular via `clipPath`) at canvas center; multiple `<line>` leader lines + small `<circle>` endpoints + offset text labels in surrounding space.
11. **Diagonal split with directional gradient (not hard polygon cut)** — full-bleed `<image>` (do NOT hard-clip) + overlay `<rect fill="url(#grad)">` whose `<linearGradient>` axis runs along the desired diagonal + a `<line>` on the diagonal to make the divider visible. The gradient does the "splitting" softly; hard polygon clipping produces ugly stair-step edges on text panels.
12. **Faded image as backdrop with oversized overlay text**`<image>` + heavy semi-transparent `<rect fill="bg-color" fill-opacity="0.50.7">` over it + huge `<text>` (80120px) on top. Image becomes texture; text is the subject.
13. **Narrow vertical image strip + giant horizontal title**`<image x=0 y=0 width=200280 height=720>` + thick divider `<rect>` + large `<text>` (6090px) in the remaining width.
14. **Horizontal banner strip cutting through mid-section**`<image y=middle width=1280 height=200280>` with edge fades; text blocks above and below the band.
15. **Multi-image montage with bold text spanning across** — multiple `<image>` tiled with 24px gaps + large `<text>` (60100px) in a darkened band spanning the full montage. The band uses `<rect fill-opacity="0.50.7">` to keep text legible across all underlying images.
16. **Negative-space dominant — small image, mostly whitespace** — image and text together occupy less than 40% of the canvas; rest is empty.
17. **Picture-in-picture inset** — large `<image>` background + small `<image>` overlaid inside it with a `<rect>` frame.
18. **Image as full-height sidebar column** — narrow `<image x=0 y=0 width=~200280 height=720>`; rest of canvas is content area.
19. **Image floating in whitespace with thin frame and caption**`<image>` + thin `<rect fill="none" stroke="…">` frame around it + `<text>` caption below.
## Image-as-Canvas + Native Overlay (the most underused family)
This is the family that opens up the largest design space and the one AI is most likely to skip. The shared pattern: image fills the slide (or a large region), native SVG elements are layered on top to carry the actual information. None of the overlay elements need to be generated by the image model — they are vector primitives you draw yourself.
38. **Background image + annotation cards with bezier leader lines** — full-bleed `<image>` + 24 small info cards (`<rect rx>` + icon + title + one-line text) placed in the image's calm regions. From each card, draw a bezier `<path>` ending in a `marker-end` arrow that points to the specific object in the image being annotated. Card text and leader lines are editable; image is the scene.
39. **Background image + flow nodes drawn over the scene** — the image is a real or rendered scene (workshop, control room, landscape). On top, draw a dashed `<path>` route that traces a workflow through the scene, with numbered `<circle>` nodes at each stop. Each node = number + icon + label. The flow is fully editable; the image is atmosphere.
40. **Background image + floating KPI metric cards** — full-bleed image (often an operations photo) + dark scrim + multiple `<rect>` cards in negative-space regions. Each card = icon + small label + large metric number. Image gives context; cards give the data.
41. **Background image + measurement lines and module tags (engineering overlay)** — used on technical / blueprint / cross-section images. Draw measurement lines with end-caps (`<line>` + perpendicular ticks) spanning a feature, with a centered label box reading dimensions or part names. Add tagged callouts with `<rect>` + monospace text. Reads as engineering drawing markup.
42. **Background image + glassmorphism UI panels** — image is the visual world; on top, draw UI elements (semi-transparent panels, progress arcs, status badges, indicators). Panels use `fill-opacity="0.60.8"` + thin light-color strokes; arcs via `<path d="…A…">`. Looks like a live dashboard floating above the scene.
43. **Background image + native data chart on top** — AI image generation cannot produce accurate data charts. Solution: use an AI-generated dashboard image as **visual reference only** (clearly labeled as such in a caption), and draw the actual chart with native SVG primitives (`<line>` axes, `<path>` series, `<circle>` data points) directly on or next to it. Required marker if exporting: `<!-- chart-plot-area: x_min,y_min,x_max,y_max -->` inside the chart group.
44. **Background image + native network/architecture diagram** — same logic as #43 but for structural diagrams. Image provides atmosphere or visual anchor; the actual nodes, connections, and labels are SVG circles, lines, icons, and text — all editable.
45. **Background image + numbered hotspots with sidebar legend** — small numbered `<circle>` markers placed on the image at points of interest. A sidebar (left or right) lists "1. … 2. … 3. …" with corresponding descriptions.
46. **Background image + bordered "lens" rectangle highlighting a sub-region** — full-bleed image + a bordered `<rect fill="none" stroke="accent" stroke-width="3"/>` framing a sub-region + caption nearby. Frame draws the eye to one detail without occluding the surrounding context.
## Multi-Image Compositions
47. **Small multiples — 36 same-kind images in an evenly spaced row** — each in identical container, each with identical caption block underneath (title + one-line description). This is **not** a generic grid: the identical framing is itself the message — readers compare across panels because the structure is the same. Useful for style comparisons, time-series snapshots, product variations.
48. **Side-by-side comparison (before/after, A/B, then/now)** — two `<image>` of equal size in 50/50 split with thin divider `<line>` and "before" / "after" labels.
49. **Asymmetric collage** — one large `<image>` + 23 smaller `<image>` arranged around it; sizes vary, gaps consistent.
50. **Tiled grid (2×2, 2×3, 3×3) with equal cells**`cell_size = (canvas - total_gap) / cols`; consistent `gap=220px`.
51. **Mosaic** — irregular tile sizes packed together with or without thin gaps; each image clipped to its tile's rect.
52. **Image strip / filmstrip** — horizontal sequence of `<image>` elements with thin gaps; same height, varying widths allowed.
53. **Vertical image stack** — column of `<image>` aligned by width, shared annotations on one side.
54. **Overlapping image stack**`<image>` elements with overlapping `x/y` positions; each subsequent one in front (z-order by document order); often combined with slight rotation for layered photo-print look.
55. **Diptych split — two images abutting at 50/50** — vertical or horizontal split with optional thin divider `<line>`.
56. **Image triptych** — three independent `<image>` side-by-side, equal widths or 2:1:2 etc. (distinct from #26 baked-in triptych, where the three scenes are inside one image file).
---
# Part 2 — Modifier Layers
Stack any of these freely on top of a Primary structure. Multiple Modifiers per page is the expected case, not the exception.
## Non-rectangular Image Shapes
20. **Circular crop**`<clipPath><circle cx cy r/></clipPath>` referenced by `<image clip-path="url(#id)"/>`.
21. **Rounded rectangle crop**`<clipPath><rect rx ry/></clipPath>`; the `rx` value controls roundness.
22. **Ellipse / oval crop**`<clipPath><ellipse cx cy rx ry/></clipPath>`.
23. **Hexagonal / polygonal crop**`<clipPath><polygon points="x1,y1 x2,y2 …"/></clipPath>`; remember to keep all vertices inside the image's display rectangle.
24. **Custom path crop (blob, arrow, leaf, silhouette)**`<clipPath><path d="…"/></clipPath>`; allows any curved or organic shape. PowerPoint export translates this to `custGeom` and survives roundtrip.
25. **Layered paper-cut stack** — multiple image or shape layers each with `clipPath` + a small `<feDropShadow>` offset to fake physical layering depth. Each layer casts a shadow onto the next, producing real-looking craft depth.
26. **Triptych baked into a single wide image** — one wide `<image width=1160 height=334>` whose internal composition already contains 23 scenes. Generate the triptych as one image (not three separate calls) when scene-to-scene consistency matters — the model preserves character identity, lighting continuity, and color grading far more reliably when panels are produced together.
## Overlay & Masking Treatments
27. **Linear gradient mask for text legibility**`<linearGradient>` in `<defs>` (set `x1/y1/x2/y2` for direction) + overlay `<rect fill="url(#grad)">`. Most common is top-to-bottom darkening on full-bleed cover images.
28. **Radial gradient vignette**`<radialGradient cx cy r>` with dark outer stops; overlay `<rect>`. Focuses attention by darkening the periphery.
29. **Two-stop scrim — opaque on text side, transparent on focal side**`<linearGradient>` with one stop at `stop-opacity="0.9"` and another at `stop-opacity="0"`. Use when text sits on one side and the image's subject on the other.
30. **Flat semi-transparent rectangle overlay**`<rect fill="#000" fill-opacity="0.4"/>` over the image. Uniform darkening/lightening; simplest scrim.
31. **Color-tinted overlay**`<rect fill="#brandColor" fill-opacity="0.150.25"/>`. Pushes a foreign-looking image toward the deck's palette without regenerating it.
32. **Multi-stop scrim with hue shift** — three-or-more-stop `<linearGradient>` where stops are different colors (e.g. dark navy → transparent → warm orange). This re-grades the image's color world without regenerating — particularly useful when an AI image came back with the right composition but wrong color temperature.
33. **Spotlight mask — clear region surrounded by darkness** — cover the canvas with `<rect>` filled by a `<radialGradient>` whose inner stop is fully transparent and outer stop is opaque dark. Reads as a flashlight beam on the focal area. Use sparingly — it kills everything outside the spotlight.
34. **Gaussian-blur backdrop**`<filter><feGaussianBlur stdDeviation="815"/></filter>` applied to the background image, with sharp content layered on top unblurred. Reads as depth-of-field. Be aware that filters have inconsistent PPT export support — if fidelity matters, bake the blur into the source image instead.
35. **Duotone treatment** — two-color mapping of a photograph (e.g. deep navy shadows + warm cream highlights). Most reliable when baked into the source image at generation time. Runtime SVG duotone via `<feColorMatrix>` + `<feComponentTransfer>` is possible but the filter chain is fragile through PPT export — only attempt if you control the renderer.
36. **Drop shadow under image panel**`<filter><feDropShadow dx dy stdDeviation flood-color flood-opacity/></filter>` applied to the image's container `<rect>` (or to the `<image>` itself). Standard depth lift.
37. **Inner / outer glow on overlay shape**`<filter><feGaussianBlur/><feMerge/></filter>` on a shape, or simply a slightly larger blurred `<rect>` underneath the target.
## Image as Texture / Atmosphere
57. **Full-bleed image with extreme low opacity as texture wash** — full-bleed `<image>` + overlay `<rect fill="bg-color" fill-opacity="0.70.85"/>` so the image only barely shows through.
58. **Image fragment as decorative corner element** — small `<image>` (often with `clipPath`) placed in one corner; not the focus, just visual seasoning.
59. **Image as horizontal divider band** — narrow `<image height=80150>` placed between two text sections instead of a `<line>` divider.
60. **Image as ambient noise** — visible but low contrast; mood-setting only, not informational.
61. **Image as watermark behind body content** — large `<image>` at very low opacity behind body text. Use either a pre-baked low-alpha image or a high-opacity overlay `<rect>` to suppress visibility.
## Special Techniques
62. **Same image, two references — full view + zoom-callout** — reference the same image file twice in two `<image>` elements: one shows the full scene at normal size; the second uses `clipPath` (circle or rectangle) plus a larger display size to "zoom into" a sub-region. Connect them with a bezier `<path>` ending in `marker-end`; ring the zoom with a `<circle stroke>` so it reads as a magnifying lens. No special asset needed — the zoom effect comes from same-source-different-display.
63. **Transparent PNG sticker / cutout** — an RGBA PNG (with alpha channel) placed via standard `<image>` — no `clipPath` required, the transparency lives in the file itself. Useful for subjects that should not appear inside a rectangular frame (people cutouts, product shots, decorative motifs floating over backgrounds). Producing transparent PNGs is **not** a standard ppt-master pipeline step — three paths: (a) AI backend that supports transparent output natively, (b) generate a chroma-key (solid green background) image then strip the green with a separate tool, (c) user-supplied transparent asset. SVG-side usage is trivial; asset preparation is the work.
64. **Image with embedded text rendered by the AI** — text becomes part of the artwork: decorative lettering, designed title, hand-lettered keyword. Prompt with explicit text content — name the exact characters literally. Use for text that is part of the artwork and will not change. Anything that must be correct or editable goes in the SVG `<text>` layer (#65).
65. **Image with NO text — labels added as native SVG** — generate the image with explicit "no text, no letters, no numbers, no signs" instruction (`text_policy: none`), then place all labels as `<text>` overlays. The right call when labels will be reworded, must stay exact, or carry data that must stay editable — pair with `#64` when stable visual identifiers (axis labels, subplot letters, unit symbols) belong inside the image instead.
66. **Image fading into the solid background** — soften the image's edge into the deck's background color via a `<linearGradient>` overlay whose end-stop matches the background hex exactly. The image's rectangular boundary disappears, producing seamless integration.
67. **Image with knock-out / cut-out shape** — overlay a shape filled with the background color or another image, creating the impression of a hole punched through the underlying image.
68. **Text-as-mask over image** — letterforms revealing image through them. SVG-level `<mask>` is forbidden in this project (PPT export breaks). The only reliable way: bake this effect into the image at generation time by prompting for "large lettering revealing the underlying scene through letterforms." Treat as a pre-rendered artistic choice, not a runtime effect.
69. **Image rotated at a slight angle for editorial feel**`transform="rotate(angle cx cy)"` on the `<image>` or its container `<g>`; 26 degrees typical. Adds dynamism without breaking layout.
70. **Image with thin colored matte frame**`<rect fill="none" stroke="#color" stroke-width="26"/>` over or around the image edge. Single rule, single color.
71. **Image with multiple stacked frames for "photo print" aesthetic** — nested `<rect>` outlines or `<rect>` containers of slightly different sizes giving a "framed photograph" look.
72. **Image-to-image transition / merge** — two `<image>` elements with overlapping regions, one or both with gradient masks (from group C) creating a soft blend between them.
---
## Composition Guidance
A page is built by layering. Pick one or more **Primary Structures** (Part 1) as the page's bones, then add any number of **Modifier Layers** (Part 2) for finish. Both stack — the question on each page is "is the next layer still earning its place", not "have I exceeded a quota".
**Cross-primary combinations are encouraged.** A side-by-side comparison (#48) where each side is annotated with bezier-leader cards (#38) is one page, not a violation. A 3×3 grid (#9) whose center cell is upgraded to an image-as-canvas with KPI overlay (#40) reads as one composition. The old reflex "one primary per page" tends to under-use the catalog — combine when the page asks for it.
**Modifier stacking pattern that works in practice** — observed on real content pages combining one Primary with four Modifiers:
- one Primary from Part 1 (e.g. #48 side-by-side comparison)
- `#21` rounded-rectangle clipPath on the image (rx=6 or circle)
- `#27` top-edge linearGradient in the deck's accent color, opacity 0.55 → 0
- `#66` bottom-edge linearGradient fading to background color, opacity 0 → 0.95
- small color-block badge + reversed-out label replacing any opaque color bar that would otherwise sit over the image
Combine freely. The "AI-default" failure mode is the opposite: defaulting to bare #2 / #3 (left/right split) with no Modifier at all.
**Skip-detection signal** — if every page's `Layout pattern` column resolves to bare #2 / #3 / #5 / #6 with no Modifier ids, the catalog was not consulted. Re-read and reconsider.
## Hard Constraints
- Long body copy, data points, numeric labels, and Chinese text always go in the SVG layer — never baked into the image.
- `<clipPath>` on `<image>` and transparency encoding (`fill-opacity` / `stop-opacity`, never `rgba()`) — authoritative form in [`shared-standards.md`](shared-standards.md) §1.2 and §2; do not restate or relax here.
- No `<mask>`, no `<feComposite>` for alpha compositing. Alpha-effect routing (gradient overlays, clipPath crops, filter shadows, baked-in source image) is the table in [`shared-standards.md`](shared-standards.md) §1.0.
- `<feDropShadow>` / `<feGaussianBlur>` are accepted but PPT export is inconsistent — bake into the source image when fidelity is critical.
---
For sizing math (calculating container dimensions from image aspect ratio when using side-by-side intent), see [`image-layout-spec.md`](image-layout-spec.md). This file is the design vocabulary; that file is the dimension calculator.

View File

@ -0,0 +1,235 @@
> See shared-standards.md for common technical constraints.
# Image Layout Specification
Layout rules for pages where the image is placed **side-by-side with body text** as a container block. Strategist and Executor both follow these rules when the image's narrative intent is *side-by-side*.
**Core principle (side-by-side)**: compute container layout from the image's original aspect ratio so the image displays completely — no excess whitespace, no cropping.
> **Scope**: this spec applies to *side-by-side* intent only. Other intents (hero / full-bleed, atmosphere / background, accent / inline) use full-bleed placement where ratio alignment is not a constraint and cropping is expected — the ratio→split table below does NOT apply. See `references/strategist.md` §h for intent selection.
---
## Layout Decision Flow
```
1. Decide narrative intent (hero / atmosphere / side-by-side / accent) — see strategist.md §h
2. If intent = side-by-side: continue below. Otherwise: compose per narrative; this spec does not apply.
3. Get image original dimensions → Calculate ratio (width/height)
4. Select layout type based on ratio
5. Calculate maximum display size for the image
6. Allocate remaining space for text area
7. Fill results into the Design Specification's image resource list
```
**When to run**: if image approach includes "B) User-provided", run the scan and populate the image resource list after the Strategist's Eight Confirmations and before content analysis / outlining.
---
## Layout Type Selection (side-by-side intent)
| Image Ratio | Layout Type | Image Position | Description |
|-------------|-------------|----------------|-------------|
| > 2.0 (ultra-wide) | Top-bottom split | Top full-width | Image spans canvas width, height proportional |
| 1.5-2.0 (wide) | Top-bottom split | Top | Image width = content area width, height proportional |
| 1.2-1.5 (standard) | Left-right split | Left | Image height-first fit, width proportional |
| 0.8-1.2 (square) | Left-right split | Left | Image takes content area height, width proportional |
| < 0.8 (portrait) | Left-right split | Left | Image height = content area height, width proportional |
> Boundary ratio (e.g., 1.5): decide by text volume — more text → left-right; less text → top-bottom.
---
## Dimension Calculation Formulas
### Canvas Parameters (All Formats)
| Format | Canvas | Margins (L/R, T/B) | Content Area (W x H) | Title Height | Content Start Y |
|--------|--------|--------------------|-----------------------|-------------|----------------|
| PPT 16:9 | 1280x720 | 60, 60 | 1160 x 600 | 60px | 80px |
| PPT 4:3 | 1024x768 | 50, 50 | 924 x 608 | 60px | 70px |
| Xiaohongshu | 1242x1660 | 60, 80 | 1122 x 1500 | 80px | 100px |
| WeChat Moments | 1080x1080 | 60, 60 | 960 x 960 | 60px | 80px |
| Story | 1080x1920 | 60, 120/180 | 960 x 1620 | 80px | 140px |
| WeChat Article | 900x383 | 40, 40 | 820 x 303 | 40px | 50px |
> Below, **W** = content area width, **H** = content area height (excludes title). PPT 16:9 example: W=1160, H=600.
### Top-Bottom Layout Calculation
```
Image width = W = 1160 px
Image height = W / R = 1160 / R px
Text area height = H - image height - gap(20px)
Validation: Text area height >= 150px (at least 3-4 lines of text)
If not satisfied → Switch to left-right layout
```
### Left-Right Layout Calculation
**Method 1 (height-first, suitable for portrait images)**:
```
Image height = H = 600 px
Image width = H x R = 600 x R px
Text area width = W - image width - gap(20px)
```
**Method 2 (width-constrained, for wide images converted to left-right)**:
```
Image width = W x 0.7 = 812 px
Image height = image width / R
Text area width = W - image width - gap(20px)
```
**Validation**: Text area width >= 280px; otherwise reduce image area width.
---
## Layout Examples
### Ultra-wide Image (ratio 2.45)
```
Original: 1960x800, R=2.45 → Top-bottom split
Image: 1160x473, Text area: 1160x147 → 7:3 top-bottom
```
### Standard Landscape (ratio 1.38)
```
Original: 1614x1171, R=1.38 → Left-right split
Image: 773x560 (left), Text area: 367x560 (right) → 7:3 left-right
```
### Wide Image Edge Case (ratio 1.75)
```
Original: 1820x1040, R=1.75
Try top-bottom: image height=663, text area=-43 ❌
Switch to left-right: image 780x446 (left), text area 360x600 (right) → 7:3 left-right
```
---
## Portrait Canvas Override
Default selection table assumes **landscape or square canvas**. For portrait canvases (height > width), left-right splits leave both columns too narrow — use the override below.
| Canvas Orientation | Image Ratio | Recommended Layout | Reason |
|-------------------|-------------|-------------------|--------|
| Portrait (Xiaohongshu, Story) | > 1.5 (wide) | Top-bottom | Same as landscape canvas |
| Portrait (Xiaohongshu, Story) | 1.2-1.5 (standard) | Top-bottom | Left-right too narrow on tall canvas |
| Portrait (Xiaohongshu, Story) | 0.8-1.2 (square) | Top-bottom | Image fits well in top half |
| Portrait (Xiaohongshu, Story) | 0.5-0.8 (portrait) | Left-right | Portrait image on tall canvas works |
| Portrait (Xiaohongshu, Story) | < 0.5 (extreme portrait) | Left-right | Image takes one side, text the other |
> Square canvases (WeChat Moments 1:1): use the standard landscape rules.
---
## Multi-Image Layout
For slides with multiple images, divide the content area evenly using the formulas below.
### Grid Formulas
```
columns = number of columns
rows = number of rows
gap = 20px (PPT formats) or 30px (social formats)
cell_width = (W - (columns - 1) * gap) / columns
cell_height = (H - (rows - 1) * gap) / rows
```
### Common Patterns
| Image Count | Layout | Grid | Description |
|-------------|--------|------|-------------|
| 2 (both landscape) | Side-by-side | 2x1 | Two equal columns |
| 2 (both portrait) | Stacked | 1x2 | Two equal rows |
| 2 (mixed) | 1 large + 1 small | Custom | Landscape top (full-width), portrait right-bottom |
| 3 | 1 large + 2 small | 1+2 | Left large (50% width), right column with 2 stacked |
| 4 | Grid | 2x2 | Equal-sized cells |
### Example: 2x2 Grid on PPT 16:9
```
W=1160, H=600, gap=20
cell_width = (1160 - 20) / 2 = 570
cell_height = (600 - 20) / 2 = 290
Image positions:
(60, 80) 570x290 (650, 80) 570x290
(60, 390) 570x290 (650, 390) 570x290
```
> Multi-image slides: use `preserveAspectRatio="xMidYMid meet"` on all images for consistent in-cell display.
---
## Prohibited Practices
| Prohibited | Correct Approach |
|-----------|-----------------|
| Fixed 50:50 or arbitrary ratios | Dynamic calculation based on image ratio |
| Forcing wide image into square container | Use top-bottom layout or increase image area width |
| Placing portrait image in narrow horizontal strip | Use left-right layout, image on left |
| Image whitespace exceeding 10% | Recalculate layout or choose alternative approach |
| Cropping key image content | Use `preserveAspectRatio="xMidYMid meet"` |
| Text area too small to read | Ensure text area >= 150px (top-bottom) or >= 280px (left-right) |
---
## Handoff Fields
This spec only defines layout calculation. Write computed fields into the Image Resource List defined in [`svg-image-embedding.md`](svg-image-embedding.md):
| Field | Meaning |
|-------|---------|
| `Ratio` | Original image width / height |
| `Layout plan` | Top-bottom / left-right / grid, including split ratio when relevant |
| `Image area` | Computed display rectangle size |
| `Text area` | Computed remaining text area size |
For SVG `<image>` syntax, path rules, `preserveAspectRatio`, external refs, and Base64 embedding: see [`svg-image-embedding.md`](svg-image-embedding.md).
### SVG Image Embedding Examples
Complete display (data charts, side-by-side — must not crop):
```xml
<image href="../images/xxx.png"
x="60" y="80" width="780" height="446"
preserveAspectRatio="xMidYMid meet"/>
```
Crop-to-fill (backgrounds and hero images only):
```xml
<image href="../images/bg.png"
x="0" y="0" width="1280" height="720"
preserveAspectRatio="xMidYMid slice"/>
```
---
## Automation Tool
```bash
python3 scripts/analyze_images.py <project_path>/images # Default: PPT 16:9
python3 scripts/analyze_images.py <project_path>/images --canvas ppt43 # PPT 4:3
python3 scripts/analyze_images.py <project_path>/images --canvas xiaohongshu # Xiaohongshu
```
`--canvas` selects target format (default `ppt169`). The tool computes layout type (top-bottom / left-right), image display area, and text area per the formulas above. Output is a Markdown table — paste directly into the image resource list.
---
## Role Responsibilities
| Role | Responsibility |
|------|---------------|
| **Strategist** | Run analyze_images.py, calculate layout per this spec, populate image resource list |
| **Executor** | Strictly follow the layout plan and dimensions in the image resource list when generating SVGs |

View File

@ -1,532 +0,0 @@
# 版式库 (16:9, 13.33×7.5 in) — 卡片式视觉系统
> **要点**:版式 helper 全在 `scripts/pptx_helpers.py`,**不要把 helper 源码默写进 build_deck.py** —— 只 `import pptx_helpers as P` 然后调用。配色用 current spec(命名见 SKILL.md §阶段一)里的实际 hex,通过 `P.set_palette(spec_path=...)` 注入,默认商务红 + 自动派生明暗色阶。
>
> **观感升级要点(相对老版"左色条 + 圆点 bullet")**:内容尽量装进**圆角卡片**(`add_card`,自带柔和投影),业务概念配**图标底块**(`add_icon_tile`),数据页优先**KPI 数字卡**(`add_kpi`)而非小柱图,封面/章节用**渐变大色块**(`apply_brand` 已内置)。白底之上靠卡片浮起 + 浅色阶分层,才不是"扁平办公模板"。
## 通用起手(整 deck 单脚本 — 默认路径)
阶段二写一个 `build_deck.py`,一个进程内建完整份 deck、末尾 `save` 一次(**不逐页 run_python**)。每页一个小函数,主流程按逐页大纲依次调用:
```python
import sys
sys.path.insert(0, "<skill_dir>/scripts") # <skill_dir> 用 system prompt 注入的绝对路径替换
import pptx_helpers as P
from pptx.enum.text import MSO_ANCHOR, PP_ALIGN
from pptx.enum.shapes import MSO_SHAPE
SPEC = "<task_dir>/<today>-<task_short_id>-<task_name>.spec.md"
OUT = "<task_dir>/<topic>.pptx"
ICONS = "<task_dir>/assets/icons" # fetch_icon.py 拉到这;种子库在 <skill_dir>/assets/icons
def page_1_cover(prs):
s = P.add_slide(prs)
P.apply_brand(s, "cover")
# ... 见 L1 封面 ...
def page_2(prs):
s = P.add_slide(prs)
# ... 见对应 Lx 版式 ...
def main():
prs = P.new_presentation("16:9") # 默认 16:9;可传 "4:3" / "9:16" / "3:4"
P.set_palette(spec_path=SPEC) # 整 deck 设一次配色 + 派生色阶(同进程常驻)
for build in (page_1_cover, page_2, ...): # 按逐页大纲顺序
build(prs)
prs.save(OUT)
main()
```
跑法:先 `write` 脚本到 `<task_dir>/build_deck.py`,再 `run_python(script_path=...)`。要改(quality_check 报错 / 用户要调)→ 改对应 `page_x` 函数重跑整脚本(可复现,不 edit 成品 .pptx)。
> **风格探针 / 增量补页**:要先看封面 + 1 页观感,把 `main()` 循环临时缩到前 2 个函数跑一遍;或对已存在 deck 追加单页时 `prs = P.load(OUT)``add_slide`。**常规整建不用 `load`**。
⚠️ 一律用 `P.xxx`(不要 `from pptx_helpers import *`)—— `set_palette` 靠改模块属性覆盖配色,`import *` 会把旧绑定拷进命名空间导致覆盖不生效。
---
## Helper API 速查 (都在 `P.` 命名空间下)
**画布 / 配色入口**
- `P.new_presentation(canvas="16:9")` → 建空 deck,设画布,回填 `P.SLIDE_W/H` 与安全区
- `P.load(path)` → 载入已有 deck,按文件实际尺寸回填画布常量
- `P.add_slide(prs)` → 追加空白 slide
- `P.set_palette(primary=, secondary=, accent=, cn_font=, en_font=, spec_path=)` → 覆盖主题色/字体并**重算派生色阶**;传 `spec_path` 自动取 spec 前 3 个 #hex;默认商务红
**颜色常量**:`P.PRIMARY` `P.SECONDARY` `P.ACCENT` `P.INK` `P.GREY` `P.GREY_LIGHT` `P.HAIRLINE` `P.BG` `P.WHITE`
**派生色阶**(从主/辅/强调自动算):`P.PRIMARY_WASH`(整页/大区域浅底) `P.PRIMARY_SOFT`(卡片/标签浅底) `P.PRIMARY_DARK`(渐变深端) `P.ACCENT_SOFT`(高亮浅底) `P.SURFACE`(卡片白面)
**字体常量**:`P.CN_FONT`(微软雅黑) `P.EN_FONT`(Arial)
**画布常量**:`P.SLIDE_W` `P.SLIDE_H` `P.SAFE_LEFT/TOP/RIGHT/BOTTOM` `P.SAFE_W` `P.SAFE_H`
**色阶工具**:`P.tint(color, pct)` 提亮 / `P.shade(color, pct)` 压暗(自定义中间色用)
**🔥 组合版式件**(一个函数摆一整块 —— 优先用这些,别手摆参差网格/拿卡片硬凑时间线)
- `P.add_card_grid(slide, items, top, height, cols=None, icon_dir=None, accent=None)`**均衡概念网格**;items=每项 `{icon,title,body}`;自动均衡行列(2×2/2×3,不参差),单行图标顶置、多行图标左置;`icon_dir` 给图标目录(图标名去 `tabler_` 前缀)
- `P.add_timeline(slide, nodes, y=3.2)`**横向时间轴**;nodes=`{year,title,body}`;发展历程/路线图用,别塞卡片网格
- `P.add_cycle(slide, steps, cy=4.5, radius=1.55, center_label=)`**流程闭环**(节点沿环+中心词);循环类用。⚠️文字多时改用横向流程(L12)更稳
- `P.add_toc(slide, items, top=2.2)`**目录**(序号+标题+右副标+发丝线,贯通整宽);items=`(title, caption)`
- `P.add_kpi(slide, l, t, w, h, value, label, baseline=, delta=, delta_dir=)`**KPI 数字卡**;`baseline`=对比基准、`delta`=趋势(升绿降红);**数字别孤立**
- `P.add_takeaway(slide, "<一句话结论>", top=None)`**结论框**(浅主色底+左条);内容页论断标题下标配
- `P.add_source(slide, "<来源>")` → 数据来源(右下角弱化);含数据的页必标
- `P.add_picture_bg(slide, png)` → 整页铺渲染好的高清背景图(混合方案:背景图+原生可编辑文字)
**容器 / 质感**(卡片式核心)
- `P.add_card(slide, l, t, w, h, fill=SURFACE, radius=0.12, shadow=False, border=None, accent=None)` → 圆角卡片。**默认平卡**(白底描发丝边);**投影是克制**:平铺对等卡一律平,`shadow=True` 只给真悬浮/被挑出的卡,每页 ≤2-3 个;**一容器一手段**(投影/描边/底色/accent 四选一不叠)。见 design_principles §视觉深度
- `P.add_round_rect(slide, l, t, w, h, fill, radius=0.10)` → 无投影圆角矩形
- `P.add_gradient_rect(slide, l, t, w, h, c1, c2, angle=90, rounded=False)` → 渐变块(封面/章节大色块;原生可编辑非图片)
- `P.set_shadow(shape, ...)` / `P.set_line(shape, color, weight)` → 手动投影 / 描边
- `P.add_bg(slide, color=BG)` → 整页背景(`apply_brand` 已内置)
- 语义色:`P.GOOD`(增长绿)/ `P.BAD`(下降红)—— KPI 趋势用,不计三色制
**组件**
- `P.add_icon_tile(slide, x, y, size=0.9, png_path=None, fill=PRIMARY_SOFT)` → 图标圆角底块 + 居中图标
- `P.add_icon(slide, png_path, x, y, size=0.6)` → 裸图标 PNG(方形源等比)
- `P.add_pill(slide, x, y, w, h, text, fill=PRIMARY, fg=WHITE, size=12)` → 胶囊标签 / chip
- `P.add_eyebrow(slide, x, y, text, color=PRIMARY, size=13)` → 标题上方小标签 / kicker
- `P.add_badge(slide, x, y, num, diameter=0.7)` → 编号徽章(圆+数字)
- `P.add_chevron(slide, x, y, w=0.55, h=0.5, color=GREY_LIGHT)` → 流程箭头
- `P.add_dot(slide, x, y, size=0.18, color=ACCENT)` → 圆点(bullet 前缀)
- `P.add_accent_line(slide, x, y, length=1.0, thickness=0.05)` → 强调短线
- `P.add_divider(slide, x, y, length, vertical=False)` → 细分隔线
**文本 / 标题 / 品牌 / 备注**
- `P.add_textbox(slide, l, t, w, h, text, size, bold=False, color=INK, align=, anchor=, font=None, shrink=True, name=)` → 文本框;`font=None` 自动 latin=Arial + 东亚=微软雅黑(**中文真落雅黑靠这个**),传 `font` 则两槽都用它(纯英文大字/数字)
- `P.page_title(slide, text, page_num=None, total=None, footer=, eyebrow=None)` → 内页标题+强调线(+可选 eyebrow / 页脚页码)
- `P.apply_brand(slide, kind)` → 品牌锚点,`kind` ∈ `"cover"/"inner"/"section"/"end"`;**每页第一行必调**(已含整页背景)
- `P.add_notes(slide, text)` → 演讲者备注(正式产物每页给 2-4 句口述要点)
- `P.assert_inside(l, t, w, h, name="")` → 手动越界校验(放置 helper 已内置)
---
## 🔥 组合件示例 (优先用 —— 一个函数一整块)
### 内容页范式:论断标题 + Takeaway + 均衡网格
> 内容页的"黄金结构"(咨询级):**论断式标题**(写结论)→ **Takeaway 一句话**(浅底框)→ 内容。把它做成本地小函数 `content_header`
```python
from pptx.enum.text import MSO_ANCHOR
def content_header(s, title, takeaway, eyebrow=None):
ty = P.SAFE_TOP
if eyebrow:
P.add_eyebrow(s, P.SAFE_LEFT, ty, eyebrow); ty += 0.4
P.add_textbox(s, P.SAFE_LEFT, ty, P.SAFE_W, 0.7, title, 28, bold=True,
color=P.PRIMARY, name="title") # 论断标题
if takeaway:
P.add_takeaway(s, takeaway, top=ty + 0.82) # 结论框
s = P.add_slide(prs); P.apply_brand(s, "inner")
content_header(s, "大模型靠规模涌现出通用智能",
"参数突破千亿临界点后,模型从'专用工具'跃升为'通用大脑'",
eyebrow="DEFINITION")
items = [ # 每项 icon 名 + 标题 + 精炼正文(≤18 字)
{"icon": "brain", "title": "超大参数", "body": "千亿参数突破临界点,涌现推理力"},
{"icon": "cpu", "title": "对话生成", "body": "多轮对话、写代码、摘要改写"},
{"icon": "cloud-network", "title": "多模态", "body": "文本+图像+音频+视频统一理解"},
{"icon": "target", "title": "任务规划", "body": "高级推理与链式拆解"},
{"icon": "bolt", "title": "持续成长", "body": "RLHF、RAG、微调持续打磨"},
]
P.add_card_grid(s, items, top=2.35, height=4.5, icon_dir=ICONS) # 平卡,自动均衡
```
### 时间轴(发展历程 / 路线图)
```python
content_header(s, "六年从 GPT-1 到推理模型,能力指数跃迁",
"每一代都在重定义能力边界", eyebrow="TIMELINE")
P.add_timeline(s, [
{"year": "2018", "title": "GPT-1", "body": "预训练范式确立"},
{"year": "2020", "title": "GPT-3", "body": "1750 亿参数,few-shot 涌现"},
{"year": "2022", "title": "ChatGPT", "body": "对话式 AI 引爆全民应用"},
{"year": "2023", "title": "GPT-4", "body": "多模态 + 强推理"},
], y=3.9)
P.add_source(s, "OpenAI / 各厂商公开发布")
```
### KPI 数字卡(数据语境化:对比基准 + 升降)
```python
data = [("158%", "实验吞吐同比", "行业均值 90%", "+68pt", "up"),
("27天", "配方迭代周期", "去年 45 天", "-40%", "up"),
("92.3%", "中试一次通过率", "行业 81%", "+11pt", "up")]
n, gap = len(data), 0.3; cw = (P.SAFE_W - gap*(n-1))/n
for i,(v,lab,base,delta,d) in enumerate(data):
P.add_kpi(s, P.SAFE_LEFT+i*(cw+gap), 2.6, cw, 2.7, v, lab,
baseline=base, delta=delta, delta_dir=d)
```
### breathing 大字页(打破卡片单调 —— 每隔 2-3 页插一个)
```python
s = P.add_slide(prs); P.apply_brand(s, "inner")
P.add_eyebrow(s, P.SAFE_LEFT, 1.5, "THE INFLECTION POINT")
P.add_textbox(s, P.SAFE_LEFT, 2.15, 9.0, 2.5, "2 个月", 150, bold=True,
color=P.PRIMARY, font=P.EN_FONT, shrink=False, name="big_stat")
P.add_textbox(s, P.SAFE_LEFT, 4.7, 11, 0.7, "ChatGPT 月活突破 1 亿", 30,
bold=True, color=P.INK, name="big_label")
P.add_textbox(s, P.SAFE_LEFT, 5.6, 11, 0.6,
"史上最快 —— 此前纪录是 TikTok 的 9 个月", 18, color=P.GREY,
name="big_ctx") # 数据语境化:大数字必带对比
```
### 目录(贯通整宽)
```python
P.page_title(s, "目录", eyebrow="AGENDA")
P.add_toc(s, [("什么是大模型", "规模、能力与边界"),
("发展历程", "六年能力跃迁"),
("AI 智能体", "从对话到自主行动")], top=2.25)
```
### 混合背景封面(杂志级,opt-in)
```python
# 先 run_python: python render_bg.py --out <task_dir>/figures/cover_bg.png --kind cover --primary C00000
s = P.add_slide(prs)
P.add_picture_bg(s, "<task_dir>/figures/cover_bg.png") # 背景图(不可编辑)
P.add_eyebrow(s, 0.95, 1.95, "TECHNOLOGY INSIGHT · 2026", color=P.ACCENT)
P.add_textbox(s, 0.95, 2.45, 8.0, 1.7, "主标题\n副标题行", 44, bold=True,
color=P.WHITE, name="cover_title") # 白字叠背景(可编辑)
```
> 下面 L1-L13 是更细的手摆版式参考;**业务概念/数据/历程/循环优先用上面的组合件**,手摆只在组合件不覆盖时用。
> ⚠️ **给每个元素起语义 `name`**(`"bullet_1"`/`"kpi_val"`/`"eyebrow"`/`"pill"` 等)。quality_check 靠 name 判定"哪些是标签(小字号豁免)、哪些是真 bullet(计 ≤5)、谁压了谁",名字乱起会误报。helper 默认名已合理,自己加文本时照着命名。
> `MSO_SHAPE` / `PP_ALIGN` / `MSO_ANCHOR` 页面里要直接用就自行 import(`pptx_helpers` 内部已 import 但不重导出)。
---
## L1 · 封面 (Cover) —— 渐变大色块 + 左侧标题区
```python
s = P.add_slide(prs)
P.apply_brand(s, "cover") # 右侧 40% 主色→深主色渐变块 + 左上强调短线 + 底细线
# 左侧标题区(避开右侧渐变块,文字区约 7.4 寸宽)
P.add_eyebrow(s, 0.9, 2.0, "2026 年度技术汇报") # kicker 小标签
P.add_textbox(s, 0.9, 2.5, 7.2, 1.6, "项目名称 / 演示主题",
42, bold=True, color=P.INK, name="cover_title")
P.add_textbox(s, 0.9, 4.4, 7.0, 0.6, "一句话副标题或定位",
20, color=P.GREY, name="cover_sub")
P.add_textbox(s, 0.9, 6.4, 7.0, 0.4, "汇报人 · 部门 · 2026-06-08",
14, color=P.GREY_LIGHT, name="cover_meta")
P.add_notes(s, "开场白:点出主题与本次汇报要解决的核心问题。")
```
> 有合适主图时(见 SKILL.md §配图),可把右侧渐变块换成**真实图片**:`s.shapes.add_picture(hero, Inches(P.SLIDE_W*0.6), Inches(0), height=Inches(7.5))`,再在图上叠半透明主色块保证文字区干净。
---
## L2 · 目录 (Agenda) —— 编号徽章 + 文字
```python
from pptx.enum.text import MSO_ANCHOR
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "目录")
items = ["背景与现状", "核心问题", "解决方案", "实施计划", "预期成果"]
for i, item in enumerate(items):
y = 1.9 + i * 0.95
P.add_badge(s, P.SAFE_LEFT, y, i + 1, diameter=0.65)
P.add_textbox(s, P.SAFE_LEFT + 1.0, y, P.SAFE_W - 1.0, 0.65, item, 22,
color=P.INK, anchor=MSO_ANCHOR.MIDDLE, name=f"agenda_{i}")
```
---
## L3 · 章节分隔 (Section Divider) —— 渐变整页 + 大字编号(白字)
```python
from pptx.enum.text import MSO_ANCHOR
s = P.add_slide(prs)
P.apply_brand(s, "section") # 主色→深主色整页渐变 + 强调装饰条
# 大编号(白色;font=EN_FONT 让数字走 Arial)
P.add_textbox(s, 1.1, 2.0, 4, 2.5, "01", 150, bold=True, color=P.WHITE,
font=P.EN_FONT, name="sec_num")
# 章节名(白色)
P.add_textbox(s, 5.3, 2.8, 7, 1.0, "背景与现状", 44, bold=True,
color=P.WHITE, anchor=MSO_ANCHOR.MIDDLE, name="sec_title")
# 引言(强调浅色,渐变深底上可读)
P.add_textbox(s, 5.3, 4.0, 7, 0.6, "本章讨论行业现状与机会窗口", 18,
color=P.ACCENT_SOFT, name="sec_lead")
```
> 渐变深底上文字一律用 **白 / `ACCENT_SOFT`** 等浅色,不要用 `INK` 深灰(看不清)。
---
## L4 · 要点 (Bullets) —— 圆点 + 文字;≥3 条建议升级成卡片(见 L11)
```python
from pptx.enum.text import MSO_ANCHOR
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "核心结论")
bullets = [
"结论一:用一句话讲清楚",
"结论二:具体数据支撑,如增长 27%",
"结论三:对未来的判断,简洁有力",
"结论四:可选第四条,不要超过 5 条",
]
for i, b in enumerate(bullets):
y = 2.0 + i * 0.95
P.add_dot(s, P.SAFE_LEFT + 0.05, y + 0.22, size=0.18)
P.add_textbox(s, P.SAFE_LEFT + 0.45, y, P.SAFE_W - 0.45, 0.6, b, 22,
color=P.INK, anchor=MSO_ANCHOR.MIDDLE, name=f"bullet_{i}")
```
> 纯圆点 bullet 偏单薄。**业务概念类要点(能力/模块/策略)优先用 L11 卡片网格 + 图标底块**,视觉重量足。
---
## L5 · 双栏对比 (Two-Column) —— 两张卡片,左中右灰
```python
from pptx.enum.text import PP_ALIGN, MSO_ANCHOR
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "现状 vs 改进后")
cw = (P.SAFE_W - 0.5) / 2 # 两卡 + 中间 0.5 间隙
ly, lh = 2.0, 4.5
# 左卡:现状(中性灰底,弱化)
P.add_card(s, P.SAFE_LEFT, ly, cw, lh, fill=P.BG, border=True, shadow=False)
P.add_pill(s, P.SAFE_LEFT + 0.35, ly + 0.35, 1.1, 0.36, "现状", fill=P.GREY)
left_pts = ["问题 A:描述", "问题 B:描述", "问题 C:描述"]
for i, p in enumerate(left_pts):
yy = ly + 1.1 + i * 0.7
P.add_dot(s, P.SAFE_LEFT + 0.4, yy + 0.16, color=P.GREY)
P.add_textbox(s, P.SAFE_LEFT + 0.8, yy, cw - 1.1, 0.55, p, 17,
color=P.INK, anchor=MSO_ANCHOR.MIDDLE, name=f"l_pt_{i}")
# 右卡:改进后(主色强调条 + 浅底,突出)
rx = P.SAFE_LEFT + cw + 0.5
P.add_card(s, rx, ly, cw, lh, fill=P.SURFACE, accent=P.PRIMARY)
P.add_pill(s, rx + 0.5, ly + 0.35, 1.3, 0.36, "改进后", fill=P.PRIMARY)
right_pts = ["改善 A:描述", "改善 B:描述", "改善 C:描述"]
for i, p in enumerate(right_pts):
yy = ly + 1.1 + i * 0.7
P.add_dot(s, rx + 0.55, yy + 0.16, color=P.ACCENT)
P.add_textbox(s, rx + 0.95, yy, cw - 1.3, 0.55, p, 17, color=P.INK,
anchor=MSO_ANCHOR.MIDDLE, name=f"r_pt_{i}")
```
---
## L6 · 图表为主 (Chart-focus) —— 标题 + 一句结论 + 大图嵌卡片
```python
from pptx.util import Inches
from pptx.enum.text import PP_ALIGN
# chart.png 已用 matplotlib 生成(见 design_principles.md §7)
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "季度营收持续增长")
P.add_textbox(s, P.SAFE_LEFT, P.SAFE_TOP + 1.1, P.SAFE_W, 0.5,
"Q4 同比增长 158%,创历史新高", 18, color=P.GREY, name="lead")
# 图表衬一张白卡片(浮起,比裸图精致)
P.add_card(s, 2.0, 2.4, 9.3, 4.3, fill=P.SURFACE)
s.shapes.add_picture("<task_dir>/slides/chart.png", Inches(2.4),
Inches(2.7), width=Inches(8.5))
P.add_textbox(s, P.SAFE_LEFT, 6.95, P.SAFE_W, 0.4, "数据来源:公司年报 2025",
11, color=P.GREY_LIGHT, align=PP_ALIGN.RIGHT, shrink=False,
name="source")
```
---
## L7 · 图片为主 (Image-focus) —— 图占 58%,文字独立区
```python
from pptx.util import Inches
from pptx.enum.shapes import MSO_SHAPE
s = P.add_slide(prs)
P.add_bg(s, P.WHITE)
# 左侧图(只给 height 等比铺满,避免变形)
s.shapes.add_picture("<task_dir>/slides/hero.jpg", Inches(0), Inches(0),
height=Inches(7.5))
# 右侧浅底文字区
P.add_rect(s, 7.7, 0, 5.63, 7.5, P.PRIMARY_WASH, "text_panel")
P.add_eyebrow(s, 8.1, 1.4, "PRODUCT")
P.add_textbox(s, 8.1, 1.9, 4.9, 1.0, "走进未来", 36, bold=True, color=P.INK,
name="img_title")
P.add_accent_line(s, 8.1, 3.0, length=0.6)
P.add_textbox(s, 8.1, 3.4, 4.9, 1.6, "用一两句话点出主旨,不要把演讲稿搬上来。",
18, color=P.GREY, name="img_caption")
P.add_shape(s, MSO_SHAPE.RIGHT_ARROW, 8.1, 6.4, 0.7, 0.35, P.ACCENT, "img_cta")
```
---
## L8 · 金句 / 大字 (Quote) —— 留白主导
```python
from pptx.enum.text import MSO_ANCHOR
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.add_textbox(s, 0.8, 0.6, 1.5, 1.5, '"', 200, bold=True, color=P.ACCENT,
font=P.EN_FONT, shrink=False, name="quote_mark")
P.add_textbox(s, 1.5, 2.7, 10.5, 2.0, "把复杂留给我们,把简单留给用户。", 36,
bold=True, color=P.INK, anchor=MSO_ANCHOR.MIDDLE, name="quote_text")
P.add_accent_line(s, 1.5, 5.0, length=0.5)
P.add_textbox(s, 1.5, 5.2, 10.5, 0.5, "—— 公司价值观 2025", 16, color=P.GREY,
name="quote_attr")
```
---
## L9 · 结尾 / Q&A —— 浅底 + 大字,**强制必有**
> **不是可选** —— 任何 deck 都必须以这页收尾。
```python
from pptx.enum.text import PP_ALIGN
s = P.add_slide(prs)
P.apply_brand(s, "end") # PRIMARY_WASH 浅底 + 顶/底强调短线
P.add_textbox(s, 0, 2.5, P.SLIDE_W, 1.6, "Thank You", 80, bold=True,
color=P.PRIMARY, align=PP_ALIGN.CENTER, font=P.EN_FONT,
name="thanks")
P.add_textbox(s, 0, 4.3, P.SLIDE_W, 0.6, "欢迎提问与讨论", 22, color=P.INK,
align=PP_ALIGN.CENTER, name="qa")
P.add_textbox(s, 0, 6.2, P.SLIDE_W, 0.5, "联系方式 / 邮箱 / 公众号", 14,
color=P.GREY_LIGHT, align=PP_ALIGN.CENTER, name="contact")
```
---
## L10 · KPI 数字卡 (Metrics) —— 2-4 张并排,数据页主力
> 数据页**优先用这个**,不要为 2-4 个数字硬画柱状图。大数字 + 标签 + 同比小注,信息密度与质感俱佳。
```python
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "平台运行关键指标", eyebrow="运行数据 / 2025")
data = [("158%", "实验吞吐同比", "↑ 较去年"),
("27天", "配方迭代周期", "↓ 缩短 40%"),
("92.3%", "中试一次通过率", "↑ +11pt"),
("4.2万", "累计实验记录", "条")]
n = len(data)
gap = 0.3
cw = (P.SAFE_W - gap * (n - 1)) / n
for i, (v, lab, sub) in enumerate(data):
P.add_kpi(s, P.SAFE_LEFT + i * (cw + gap), 2.6, cw, 2.7, v, lab, sub=sub)
```
> 想突出某张卡:传 `value_color=P.ACCENT` 或给那张卡 `add_card(..., accent=P.ACCENT)``add_kpi(..., card=False)` 叠上。
---
## L11 · 卡片网格 (Card Grid) —— 图标底块 + 标题 + 说明,业务概念主力
> 能力 / 模块 / 策略 / 价值点这类**业务概念**用它,替代单薄的圆点 bullet。2-4 列均可;图标走 `add_icon_tile`(图标先按 SKILL.md §阶段二第 2 步批量 `fetch_icon.py` 拉到 `<task_dir>/assets/icons`)。
```python
import os
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "三大核心能力")
items = [("target", "数据底座", "统一实验/表征/工艺数据湖,一处录入处处可用"),
("cpu", "智能配方", "贝叶斯优化叠加机理约束,迭代更快更稳"),
("chart-bar", "中试放大", "小试到中试参数迁移模型,放大不失真")]
n = len(items)
gap = 0.35
cw = (P.SAFE_W - gap * (n - 1)) / n
for i, (icon, h, body) in enumerate(items):
x = P.SAFE_LEFT + i * (cw + gap)
P.add_card(s, x, 2.3, cw, 3.6, accent=P.PRIMARY)
png = os.path.join(ICONS, f"tabler_{icon}_C00000_128.png") # 主色染色后的图标
P.add_icon_tile(s, x + 0.4, 2.7, 0.95, png_path=png)
P.add_textbox(s, x + 0.4, 3.85, cw - 0.8, 0.5, h, 20, bold=True,
color=P.INK, name=f"card_h_{i}")
P.add_textbox(s, x + 0.4, 4.45, cw - 0.8, 1.1, body, 15, color=P.GREY,
name=f"card_b_{i}")
```
---
## L12 · 流程 / 步骤 (Process) —— 卡片 + chevron 箭头串联
```python
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "实施四步走", eyebrow="路线图")
steps = [("01", "调研", "梳理现状与痛点"),
("02", "建模", "搭数据底座与模型"),
("03", "试点", "单产线小批验证"),
("04", "推广", "全厂复制与运维")]
n = len(steps)
arrow_w = 0.5
cw = (P.SAFE_W - arrow_w * (n - 1) - 0.2 * (n - 1)) / n
y, h = 2.8, 2.6
for i, (num, title, body) in enumerate(steps):
x = P.SAFE_LEFT + i * (cw + arrow_w + 0.2)
P.add_card(s, x, y, cw, h, fill=P.SURFACE)
P.add_textbox(s, x + 0.3, y + 0.3, cw - 0.6, 0.7, num, 34, bold=True,
color=P.PRIMARY, font=P.EN_FONT, name=f"step_num_{i}")
P.add_textbox(s, x + 0.3, y + 1.1, cw - 0.6, 0.5, title, 19, bold=True,
color=P.INK, name=f"step_t_{i}")
P.add_textbox(s, x + 0.3, y + 1.65, cw - 0.6, 0.8, body, 14, color=P.GREY,
name=f"step_b_{i}")
if i < n - 1:
P.add_chevron(s, x + cw + 0.1, y + h / 2 - 0.25, arrow_w, 0.5)
```
---
## L13 · 大数字 + 论据 (Stat Highlight) —— 单个震撼数字撑半屏
> 一个核心数字要砸出冲击力时用。左侧超大数字,右侧三两条支撑论据卡。
```python
from pptx.enum.text import MSO_ANCHOR
s = P.add_slide(prs)
P.apply_brand(s, "inner")
P.page_title(s, "一年走完三年的路", eyebrow="成效")
# 左:超大数字(主色)
P.add_textbox(s, P.SAFE_LEFT, 2.4, 5.2, 2.4, "3.6×", 140, bold=True,
color=P.PRIMARY, font=P.EN_FONT, anchor=MSO_ANCHOR.MIDDLE,
name="big_stat")
P.add_textbox(s, P.SAFE_LEFT, 4.9, 5.2, 0.5, "研发效率提升", 20, color=P.INK,
name="big_stat_label")
# 右:支撑论据(浅底小卡堆叠)
facts = ["实验自动排程,人力释放 60%", "失败配方提前预警,返工 ↓45%", "知识沉淀复用,新人上手周期减半"]
for i, f in enumerate(facts):
yy = 2.5 + i * 1.25
P.add_card(s, 6.6, yy, 6.0, 1.05, fill=P.PRIMARY_WASH, shadow=False)
P.add_dot(s, 6.95, yy + 0.45, color=P.PRIMARY)
P.add_textbox(s, 7.35, yy, 5.0, 1.05, f, 16, color=P.INK,
anchor=MSO_ANCHOR.MIDDLE, name=f"fact_{i}")
```
---
## 选版式速查
```
封面 → L1 (Cover)
目录 → L2 (Agenda)
转场 / 换章 → L3 (Section Divider)
要点 ≤ 5 条(纯文字) → L4 (Bullets)
对比类 (前/后, A/B) → L5 (Two-Column)
有数据图表 → L6 (Chart-focus)
有大图 / 视觉优先 → L7 (Image-focus)
观点强调 / 名言 → L8 (Quote)
末页 → L9 (Q&A) [强制]
2-4 个关键数字 → L10 (KPI 数字卡) ← 优先于硬画柱图
业务概念(能力/模块) → L11 (卡片网格 + 图标) ← 优先于圆点 bullet
流程 / 步骤 → L12 (Process)
单个震撼数字 → L13 (Stat Highlight)
```
## 三个常犯的越界场景
1. **bullet 字数超额** —— 22pt 在 11.5 寸宽下每行约 50 个中文字,超 1 行就溢出 0.6 in 框。根本解法是**字数压缩**(见 design_principles.md §字数预算),不要靠 `auto_size` 收字号兜底。
2. **卡片内容超出卡片** —— 卡片内文字按 `卡宽 - 2×0.4` 内边距算框宽;标题/正文字数超了会顶出卡片下边缘。卡片高度留够(KPI 卡 ≥2.5,概念卡 ≥3.4)。
3. **图片不等比拉伸** —— `add_picture(width=, height=)` 同时给会变形;**只给 width 或 height 一项**。
4. **渐变深底上用深色字** —— L3 章节页 / cover 渐变块上的文字必须 `WHITE` / `ACCENT_SOFT`,用 `INK` 看不清。
```

View File

@ -0,0 +1,73 @@
# Modes — Index
A **mode** is the deck's **narrative + persuasion skeleton** — how the argument is organized and advanced across pages. Lock **one mode per deck**; it shapes page sequencing, title voice, page-structure tendencies, and speaker-notes register.
> A mode is *not* a visual style. **Mode = how you argue; visual style = how it looks** (see [`visual-styles/_index.md`](../visual-styles/_index.md)). The two are locked independently — any mode pairs with any visual style (a `pyramid` deck can look `swiss-minimal` or `dark-tech`).
---
## 1. Catalog (5 modes)
Each mode has its own file with: narrative skeleton, page-structure tendencies, speaker-notes register, and a page skeleton example. **Read only the file for the mode you lock** — never glob the directory.
| Mode | Narrative skeleton | Best for |
|---|---|---|
| [`pyramid`](./pyramid.md) | Conclusion first; MECE arguments; every datum carries a comparison | Decision support, analysis, strategy, board / exec reports |
| [`narrative`](./narrative.md) | Story arc — situation → tension → resolution; suspense and turns | Pitches, case studies, brand journeys, fundraising |
| [`instructional`](./instructional.md) | Concept decomposition; step-by-step; parallel exposition | Training, tutorials, explainers, knowledge sharing |
| [`showcase`](./showcase.md) | Visual-led impact; big imagery / numbers; emotional rhythm | Launches, brand reveals, event / promo decks |
| [`briefing`](./briefing.md) | Neutral, complete, scannable; topic titles, even weight, no thesis | Status updates, reference decks, catalogs, meeting packs, FAQs |
> The five partition presentation *intent*, not aesthetics: persuade (`pyramid`) · tell a story (`narrative`) · teach (`instructional`) · impress (`showcase`) · simply inform (`briefing`).
>
> **A mode is a lens, not a mandate over the user's own structure.** When the user brings their own outline, it is authoritative: transcribe it into `design_spec.md §IX` as given — page order and titles preserved — and let the mode govern only voice / register and page-internal treatment. A mode never reorders a user's pages or rewrites their given titles (mode is Reference-strength; a user-authored outline is exactly the override). When the user gives no structure, the mode does the structural lifting. To lay an outline out with the least reshaping, `briefing` imposes the lightest skeleton.
---
## 2. Auto-selection — content / audience signal → mode
| Signal | Recommended mode | Alternates |
|---|---|---|
| Strategic decision / analysis / board / investor | `pyramid` | `narrative` |
| Pitch / case study / origin story / campaign arc | `narrative` | `showcase` |
| Course / onboarding / how-to / science explainer | `instructional` | `pyramid` |
| Product launch / brand reveal / event opener / keynote / 发布会 / TED | `showcase` | `narrative` |
| Status update / reference / catalog / FAQ / meeting pack / 周报 / 参考 | `briefing` | `pyramid` |
> No single signal dominates — read the deck's actual purpose from `c. Key Information`. When two modes fit, follow the **primary** intent of the body pages, not the cover. A data review legitimately runs almost entirely `pyramid`; do not force variety.
**Close calls** — the genuinely adjacent pairs; every other pair is far enough apart that the auto-selection signal decides.
| Torn between | …the first when | …the second when |
|---|---|---|
| `pyramid` / `briefing` | it must land a recommendation — conclusion-first, every number compared | it must inform completely without arguing — topic titles, even weight |
| `narrative` / `pyramid` | the point lands through a story arc, tension → resolution | the point lands as a conclusion stated up front, then supported |
| `narrative` / `showcase` | an argument travels through the story | presence leads — minimal copy, one big visual per page |
| `instructional` / `briefing` | the goal is to build understanding step by step | the goal is to lay out a complete reference to scan |
> "Keynote-style" is a *mode* request, not a visual style — it means showcase pacing (one big idea per page, full-bleed hero, reveal rhythm), skinned by whatever visual style fits the brand (`swiss-minimal` clean, `dark-tech` dramatic, `glassmorphism` premium). Don't reach for a "keynote" visual style — there isn't one, by design.
---
## 3. How to use
1. Strategist reads this index at confirmation `d. Layer 1`.
2. Pick one mode from the auto-selection table + the deck's stated purpose.
3. Lock it: write `- mode: <name>` into `spec_lock.md`, record the rationale in `design_spec.md`.
4. Executor reads **only** `modes/<locked-mode>.md` at generation entry — never globs this directory.
**Lock scope**: deck-wide (one mode per deck). The five are the catalog you select from; if the structure is genuinely mixed, pick the mode of the body pages and let pages vary within it, or recommend a `custom` blend (§4). Recommend the best fit; the user confirms.
---
## 4. Escape hatch — `custom`
`custom` holds **any bespoke narrative direction the five don't give as-is** — and what *kind* of thing it is doesn't matter. It might be a nameable cadence (dialectic 正反合, myth-vs-reality, countdown / Top-N, Socratic), a deliberate multi-act fusion of several modes, or the user's own feel for how the deck should carry (confrontational here, detached there). Don't try to taxonomize it.
**Either side may originate it.** The user can ask for it directly; or the Strategist — as the deck's strategist — may **recommend** `custom` when a bespoke direction (often a fusion of two modes) genuinely serves the deck better than any single preset. Like every confirmation, it's a recommendation the user confirms or overrides — and the recommendation must **spell the custom out in plain language** (what the cadence / fusion / posture actually is), never present the bare token `custom`, so the user confirms something legible. Either way, the Strategist **crystallizes the intent into a `- mode_behavior:` paragraph** — concrete enough that the Executor can follow it per page (the act sequence or posture shifts, the title voice, the page rhythm, the notes register). Set `- mode: custom` in `spec_lock.md` with that sibling line; the Executor follows the prose in place of a preset file. (This records the intent so it survives 20 pages of generation — the Executor only ever reads `spec_lock.md`, never the chat.)
> **One value per deck — fusion is *one* `custom`, not several modes.** A deck always locks a single `mode`. A multi-mode blend is expressed as **one** `mode: custom` whose `mode_behavior` paragraph describes the acts — never by locking several modes.
>
> **First ask whether it's really fusion.** A locked mode is a *tendency*, not a cage: a `narrative` deck can still carry one analytical (pyramid-style) page, an `instructional` deck one showcase reveal — that is leaning within a dominant mode, and needs **no** `custom`. Reach for `custom` only when there is genuinely no single dominant spine.
**The one thing to avoid**: reaching for `custom` as a *dodge* — defaulting to it because picking among the five takes judgment. When a preset genuinely fits, lock the preset; propose `custom` when a bespoke direction earns its place, not to avoid choosing. (And a user-stated direction is authoritative the same way a user-supplied outline is — see the lens-not-mandate note in §1.)

View File

@ -0,0 +1,41 @@
# Mode: briefing
Neutral information delivery. Lay the facts out plainly and completely, organized for scanning and lookup — no thesis to argue, no story to tell, no lesson to build, no spectacle. For status updates, reference decks, catalogs, meeting packs, FAQs, data references.
---
## 1. Narrative skeleton
**No thesis, by design**: the deck informs rather than argues. Don't manufacture a conclusion-first claim (that's `pyramid`) or a turn (that's `narrative`) where the material is simply "here is what's true".
**Topic titles, not assertions**: the page title names its subject plainly ("Q3 headcount by team", "Supported file formats") — clarity for lookup beats a persuasive finding. This is the deliberate inverse of `pyramid`'s assertion titles.
**`core_message` states coverage, not a claim**: when filling `design_spec.md §IX`, write each page's `core_message` as what the page lays out ("Q3 headcount across teams"), not what it proves ("headcount is concentrating in engineering"). The §IX field reads as an assertion under the other modes; under `briefing` it names scope.
**Complete over selective**: include the full reference set the audience needs to scan, not only the points that support a case. Coverage is the value here.
**Parallel, even treatment**: sibling items get the same shape and weight so they can be compared and located quickly; nothing is dramatized over its peers unless it genuinely differs.
**Sectioned for navigation**: group related facts, label the groups, keep order predictable (chronological / categorical / alphabetical) so the reader can jump to what they need.
---
## 2. Page-structure tendencies
- Tables, definition lists, status cards, reference grids, dashboards — scannable structures over hero compositions.
- Even hierarchy within a section; consistent layout across sibling pages so the eye always knows where to look.
- Where one figure genuinely matters (a total, a status flag, an exception), surface it — but don't invent a punchline the content doesn't have.
> Table / list / dashboard / status-card geometry lives in [`templates/charts/`](../../templates/charts/); this mode decides *that the page informs completely and neutrally*, not pixel positions.
## 3. Speaker-notes register
Even, factual, plain. State what the page shows without building tension or pressing a "so what". No rhetorical questions, no suspense — a clear read-out the listener can follow or skim. Numbers stated plainly. (Common framework: [`executor-base.md §8`](../executor-base.md).)
## 4. Page skeleton example
```
Title: "Q3 deliverables by workstream" ← a topic label, not a claim
Body: status table — workstream | owner | status | due — rows at equal weight
Notes: "Three workstreams are on track; payments is at risk on the integration." (plain read-out)
```

View File

@ -0,0 +1,45 @@
# Mode: instructional
Teaching-led exposition. Decompose a concept into ordered, digestible parts and build understanding step by step. For training, tutorials, explainers, onboarding, science / knowledge sharing.
---
## 1. Narrative skeleton
**Decompose, then sequence**: break the subject into parts and present them in a deliberate order (simple → complex, prerequisite → dependent, overview → detail).
**One concept per page**: each page teaches a single idea well; do not stack unrelated concepts.
**Parallel exposition**: sibling concepts get parallel structure — same shape, same depth — so the audience can compare and map them.
**Show, then tell**: lead with a concrete example or analogy, then state the principle. A worked example beats an abstract definition.
**Signpost**: orient the learner — what we covered, what comes next.
Titles state what the page teaches ("How attention weights are computed") — clear over clever.
---
## 2. Page-structure tendencies
- Numbered steps / ordered flows for processes; parallel cards for sibling concepts.
- Diagrams that build incrementally; annotate the part currently being explained.
- A concrete example anchors each abstract point.
> Step / flow / diagram geometry lives in [`templates/charts/`](../../templates/charts/); this mode decides *the learning order and granularity*.
---
## 3. Speaker-notes register
Patient, explanatory. Define before using; analogy then principle. Anticipate the learner's question and answer it. Steady pace; signpost transitions ("now that we have X, we can ask Y"). Conversational data. (Common framework: [`executor-base.md §8`](../executor-base.md).)
---
## 4. Page skeleton example
```
Title: "Step 2 — Scoring each token against the query"
Body: concrete example (3 tokens) → the rule it illustrates → one diagram
Notes: "Remember the query from the last page? Here's what it does next…"
```

View File

@ -0,0 +1,43 @@
# Mode: narrative
Story-arc persuasion. Carry the audience through situation → tension → resolution, using suspense, turns, and human framing so the point lands emotionally before it lands logically. For pitches, case studies, brand journeys, fundraising.
---
## 1. Narrative skeleton
**Arc, per deck and per page**: scenario → conflict → resolution. Set a stake, raise a tension, resolve it — then bridge to the next beat.
**Suspense and payoff**: pose a question at the right moment, answer it on the next page. Let curiosity pull the audience forward.
**Human framing**: anchor abstract points in a protagonist, a moment, a concrete stake ("a team that shipped in two weeks instead of three months").
**At least one turn**: a reframe, a reveal, a "but here's what changed". Flat exposition is not narrative.
Titles read as beats that advance the arc ("Then the numbers stopped adding up"), not as labels.
---
## 2. Page-structure tendencies
- Pages alternate rhythm: a dense beat followed by a breathing page (single image / quote / turn) to prevent fatigue.
- Visual weight guides the eye through each beat (hero image, one focal number, a pull quote).
- Continuity within a chapter, variation between chapters.
> Structure serves the arc, not a grid. Layout / chart geometry lives in [`templates/charts/`](../../templates/charts/) and [`executor-base.md`](../executor-base.md); this mode decides *the emotional beat of each page*.
---
## 3. Speaker-notes register
Conversational narration — like talking with the audience, not reading a report. Scenario-conflict-resolution per page. Metaphors make the abstract tangible ("like adding a turbocharger"). Plain rhetorical questions create suspense; bridge each page from the prior one. Conversational data ("nearly a third", "more than doubled"). (Common framework: [`executor-base.md §8`](../executor-base.md).)
---
## 4. Page skeleton example
```
Page 3 (turn): full-bleed image + one line — "Then deployment broke."
Page 4 (payoff): the reframe — what changed, one focal number
Notes: "You might be wondering where the opportunity is…" (bridges, builds)
```

View File

@ -0,0 +1,58 @@
# Mode: pyramid
Conclusion-first argumentation. State the answer, then support it with mutually-exclusive, collectively-exhaustive evidence — every claim earns its place, every number carries a comparison. For audiences who want the result before the process: executives, boards, investors, decision-makers.
---
## 1. Narrative skeleton
**Conclusion first**: the page title *is* the conclusion, not a label. The body develops the supporting arguments beneath it.
SCQA opening, pyramid body:
| Stage | Role | Where |
|---|---|---|
| Situation | establish shared context | cover / first 1-2 pages |
| Complication | the tension / problem | early pages |
| Question | the implicit question to resolve | transition |
| Answer | the recommendation, developed MECE | all body pages |
**Assertion titles** — write the finding, not the topic:
| Weak (topic) | Strong (assertion) |
|---|---|
| "Market Overview" | "Domestic market grows 23% YoY, outpacing the global average" |
| "Challenges" | "Three structural contradictions block scaled deployment" |
| "Our Solution" | "Three-phase path: Focus, Expand, Scale" |
**Data never stands alone** — every figure pairs with a comparison (prior period / benchmark / competitor / target / rank) and a "so what". A bare number is an incomplete thought in this mode.
**MECE** — when decomposing (drivers, segments, options), branches are mutually exclusive and collectively exhaustive; parts sum to the whole (or label "Other").
---
## 2. Page-structure tendencies
- Title (the conclusion) → one-line takeaway → supporting evidence beneath.
- Each body page answers one question and states its own one-sentence conclusion.
- Decomposition pages (driver tree / MECE breakdown / 2×2 matrix) carry the analytical load.
- Source attribution on every data page.
> Page structure is a tendency, not a coordinate template. Card / tree / chart / KPI geometry lives in [`templates/charts/`](../../templates/charts/) — adapt those skeletons, do not reinvent. This mode decides *what argument each page makes*, not pixel positions.
---
## 3. Speaker-notes register
Conclusion-driven: the first sentence of each page's notes is the takeaway, then 2-3 supporting facts in flowing prose. Composed, authoritative. Every number paired with its comparison in the same sentence ("23% — nearly double the industry's 12%"). Spell percentages as words where the spoken form reads more naturally. (Common framework: [`executor-base.md §8`](../executor-base.md).)
---
## 4. Page skeleton example
```
Title: "Retention, not acquisition, now drives growth" ← the conclusion
Takeaway: one line — "CAC up 40% YoY, yet repurchase lifted 60% of revenue growth"
Body: 3 MECE arguments, each with one contextualized datum
Footer: Source: … | page #
```

View File

@ -0,0 +1,43 @@
# Mode: showcase
Visual-led impact. Let imagery, scale, and rhythm carry the message; minimize copy, maximize presence. For product launches, brand reveals, event openers, promotional decks.
---
## 1. Narrative skeleton
**Image / number leads, words support**: each page has one dominant visual element — a hero image, a single huge number, a short phrase — not a paragraph.
**Emotional rhythm**: build and release — a run of bold pages punctuated by a quiet one. Pace for feeling, not density.
**One idea per page, stated big**: reduce each page to a single takeaway expressed at scale.
**Reveal structure**: hold back, then reveal (the product, the result, the tagline) for maximum effect.
Titles are short and evocative — a phrase, not a sentence.
---
## 2. Page-structure tendencies
- Full-bleed imagery with overlay text; a single focal hero number / phrase.
- Generous negative space; the page breathes around one element.
- Bold use of the deck's theme color for atmosphere (cover / chapter pages).
> Hero / full-bleed / breathing-page geometry lives in [`executor-base.md`](../executor-base.md) and [`image-layout-patterns.md`](../image-layout-patterns.md); this mode decides *what single thing each page presents*.
---
## 3. Speaker-notes register
Energetic, evocative — sets mood and builds anticipation. Short, punchy sentences. Lets the visual do the work and narrates the feeling around it. (Common framework: [`executor-base.md §8`](../executor-base.md).)
---
## 4. Page skeleton example
```
Page (reveal): full-bleed product image + one line — "Meet the new standard."
Page (proof): single huge number "10×" + one phrase, vast whitespace
Notes: "Imagine cutting that to seconds. That's what this does."
```

View File

@ -0,0 +1,777 @@
# Shared Technical Standards
Common technical constraints for PPT Master, eliminating cross-role file duplication.
---
## 1. SVG Banned Features Blacklist
The following are **forbidden** in generated SVGs — PPT export breaks otherwise:
### 1.0 Text characters: must be well-formed XML
SVG is strict XML. Two rules for all text and attribute values:
| Character category | Required form | Forbidden form |
|---|---|---|
| Typography & symbols (em dash, en dash, ©, ®, →, ·, NBSP, full-width punctuation, emoji…) | **Raw Unicode characters** — write `—` `` `©` `®` `→` directly | HTML named entities — `&mdash;` `&ndash;` `&copy;` `&reg;` `&rarr;` `&middot;` `&nbsp;` `&hellip;` `&bull;` etc. |
| XML reserved characters (`&`, `<`, `>`, `"`, `'`) | **XML entities only**`&amp;` `&lt;` `&gt;` `&quot;` `&apos;` (e.g. `R&amp;D`, `error &lt; 5%`) | Bare `&` `<` `>` (e.g. `R&D`, `error < 5%`) |
One offending character invalidates the file and aborts export. Numeric refs (`&#160;` / `&#xa0;`) are XML-legal but discouraged.
**Structural blacklist** (in addition to the character rules above):
| Banned Feature | Description |
|----------------|-------------|
| `mask` | Masks |
| `<style>` | Embedded stylesheets |
| `class` | CSS selector attributes (`id` inside `<defs>` is a legitimate reference and is NOT banned) |
| External CSS | External stylesheet links |
| `<foreignObject>` | Embedded external content |
| `<symbol>` + `<use>` | Symbol reference reuse |
| `textPath` | Text along a path |
| `@font-face` | Custom font declarations |
| `<animate*>` / `<set>` | SVG animations |
| `<script>` / event attributes | Scripts and interactivity |
| `<iframe>` | Embedded frames |
> **`marker-start` / `marker-end` is conditionally allowed** — see §1.1 for constraints. The converter maps qualifying markers to native DrawingML `<a:headEnd>` / `<a:tailEnd>`.
>
> **`clipPath` on `<image>` is conditionally allowed** — see §1.2 for constraints. The converter maps qualifying clip shapes to native DrawingML picture geometry (`<a:prstGeom>` or `<a:custGeom>`).
>
> **`<pattern>` fills are conditionally allowed** — see §7 *Pattern Fill* for the required `data-pptx-pattern` annotation and the closed OOXML preset enum. Hand-drawn pattern geometry is NOT honored; the converter emits the named PPTX preset only. Missing or invalid preset values produce diagonal stripes (warning) or schema-failed PPTX (error).
>
> **Replacing `<mask>` effects** — DrawingML has no per-pixel alpha. Route by effect:
> - Image gradient overlay (vignette/fade/tint) → stacked `<rect>` with `<linearGradient>`/`<radialGradient>` (§6 Image Overlay)
> - Non-rectangular image crop (circle/rounded/hexagon) → `clipPath` on `<image>` (§1.2)
> - Inner glow / soft-edge → `<filter>` with `<feGaussianBlur>` (§6 Glow)
> - Drop shadow → filter shadow or layered rect (§6 Shadow)
>
> Pixel-level alpha effects (text-knockout image fills, arbitrary alpha composites) have no PPT path — bake into the source image at Image_Generator stage.
---
### 1.1 Line-end Markers (Conditionally Allowed)
`marker-start` and `marker-end` on `<line>` and `<path>` elements are allowed **only** when the referenced `<marker>` satisfies all of the following:
| Requirement | Reason |
|-------------|--------|
| Marker `<marker>` element defined inside `<defs>` | Converter looks up marker defs via id index |
| `orient="auto"` | DrawingML arrow auto-rotates along the line tangent; other orient values will not round-trip |
| Marker shape is **one of**: closed 3-vertex path/polygon (triangle), closed 4-vertex path/polygon (diamond), `<circle>` / `<ellipse>` (oval) | These three map cleanly to DrawingML `type="triangle" / "diamond" / "oval"`. Any other shape is silently dropped with a warning. |
| Marker child's `fill` **matches** the parent line's `stroke` color | In DrawingML the arrow head inherits the line color — a mismatched marker fill will look wrong on export. |
| `markerWidth` / `markerHeight` roughly in `315` range | Mapped to `sm` (<6) / `med` (612) / `lg` (>12) size buckets. |
**Use boundary**:
- `marker-start` / `marker-end`: only for connector arrows where the line is primary
- For block / chunky / solid arrows (arrow body is the visual object), use standalone closed `<path>` / `<polygon>`; see `templates/charts/chevron_process.svg` or `templates/charts/process_flow.svg`
**Supported DrawingML mapping**:
| SVG Marker Shape | DrawingML Output |
|------------------|------------------|
| `<path d="M0,0 L10,5 L0,10 Z"/>` (triangle) | `<a:tailEnd type="triangle" w="med" len="med"/>` |
| `<polygon points="0,0 10,5 0,10"/>` | `<a:tailEnd type="triangle" w="med" len="med"/>` |
| 4-vertex closed path/polygon | `<a:tailEnd type="diamond" .../>` |
| `<circle cx="5" cy="5" r="4"/>` | `<a:tailEnd type="oval" .../>` |
**Recommended template** — a standard arrow-head definition ready to reuse:
```xml
<defs>
<marker id="arrowHead" markerWidth="10" markerHeight="10" refX="9" refY="5"
orient="auto" markerUnits="strokeWidth">
<path d="M0,0 L10,5 L0,10 Z" fill="#1976D2"/>
</marker>
</defs>
<line x1="100" y1="200" x2="400" y2="200" stroke="#1976D2" stroke-width="3"
marker-end="url(#arrowHead)"/>
```
> ⚠️ Unclassifiable marker shapes (curved paths, multi-segment, >4 vertices) are silently dropped — line renders without arrow. Use a manual `<polygon>` for exotic shapes.
---
### 1.2 Image Clipping (Conditionally Allowed)
`clip-path` on `<image>` elements is allowed when the referenced `<clipPath>` satisfies the following:
| Requirement | Reason |
|-------------|--------|
| `<clipPath>` element defined inside `<defs>` | Converter looks up clip defs via id index |
| Contains a **single** shape child | First child is used; multiple children are not composited |
| Shape is one of: `<circle>`, `<ellipse>`, `<rect>` (with rx/ry), `<path>`, `<polygon>` | These map to DrawingML geometry (preset or custom) |
| Used **only on `<image>` elements** | Non-image elements with clip-path are **forbidden** |
**Use boundary**:
- Only on `<image>` for non-rectangular crops (circular avatars, rounded frames, hexagons)
- NOT on shapes (`<rect>`/`<circle>`/`<path>`/`<g>`/`<text>`) — draw the target shape directly. A rect clipped to a circle is just a circle.
- PowerPoint's SVG renderer doesn't handle `clipPath`; only the Native PPTX converter does.
**Supported DrawingML mapping**:
| SVG Clip Shape | DrawingML Output | Use Case |
|----------------|------------------|----------|
| `<circle>` / `<ellipse>` | `<a:prstGeom prst="ellipse"/>` | Circular avatar, oval frame |
| `<rect rx="..."/>` | `<a:prstGeom prst="roundRect"/>` with adj value | Rounded rectangle photo frame |
| `<path>` / `<polygon>` | `<a:custGeom>` with path commands | Hexagon, diamond, custom shape |
**Recommended template** — circular image clip:
```xml
<defs>
<clipPath id="avatarClip">
<circle cx="200" cy="200" r="100"/>
</clipPath>
</defs>
<image href="../images/photo.jpg" x="100" y="100" width="200" height="200"
clip-path="url(#avatarClip)" preserveAspectRatio="xMidYMid slice"/>
```
**Rounded rectangle clip** — for card-style image frames:
```xml
<defs>
<clipPath id="cardClip">
<rect x="60" y="120" width="400" height="250" rx="16"/>
</clipPath>
</defs>
<image href="../images/banner.jpg" x="60" y="120" width="400" height="250"
clip-path="url(#cardClip)" preserveAspectRatio="xMidYMid slice"/>
```
> ⚠️ `clip-path` on non-image elements is FORBIDDEN — quality checker errors out. Draw target geometry directly.
---
## 2. PPT Compatibility Alternatives
| Banned Syntax | Correct Alternative |
|---------------|---------------------|
| `fill="rgba(255,255,255,0.1)"` | `fill="#FFFFFF" fill-opacity="0.1"` |
| `<g opacity="0.2">...</g>` | Set `fill-opacity` / `stroke-opacity` on each child element individually |
| `<image opacity="0.3"/>` | Overlay a `<rect fill="background-color" opacity="0.7"/>` mask layer after the image |
**Mnemonic**: PPT does not recognize rgba, group opacity, or image opacity.
> Arrows: prefer `marker-end` for connector lines (§1.1) — converter produces native auto-rotating arrow heads. For block/chunky arrows, use standalone closed shapes; see `templates/charts/chevron_process.svg` and `templates/charts/process_flow.svg`.
---
## 3. Canvas Format Quick Reference
> See [`canvas-formats.md`](canvas-formats.md) for the full format table (presentations / social / marketing) and the format-selection decision tree.
---
## 4. Basic SVG Rules
- **viewBox** must match the canvas dimensions (`width`/`height` must match `viewBox`)
- **Background**: Use `<rect>` to define the page background color
- **`<tspan>`** has two purposes: (1) manual line breaks (use `dy` or explicit `y`); (2) inline run formatting on the same line (color/weight/size). `<foreignObject>` is FORBIDDEN. See "Single logical line" rule below.
- **Fonts**: every `font-family` stack MUST end with a pre-installed family (Microsoft YaHei / SimSun / Arial / Times New Roman / Consolas …); `@font-face` is FORBIDDEN. Full rule: [`strategist.md §g`](strategist.md).
- **Styles**: inline only (`fill=""`, `font-size=""`); `<style>`/`class` FORBIDDEN (`id` inside `<defs>` is fine)
- **Colors**: HEX only; transparency via `fill-opacity`/`stroke-opacity`
- **Images**: `<image href="../images/xxx.png" preserveAspectRatio="xMidYMid slice"/>`
- **Icons**: `<use data-icon="<library>/<name>" x="" y="" width="48" height="48" fill="#HEX"/>` (auto-embedded post-processing). Always include library prefix. One stylistic library per deck (`chunk-filled`/`tabler-filled`/`tabler-outline`/`phosphor-duotone`); `simple-icons` only for real brand marks. See [`../templates/icons/README.md`](../templates/icons/README.md).
### Inline Text Runs (Single Logical Line = Single `<text>`)
One logical line — even with mixed colors/weights/sizes — MUST be one `<text>` with inline `<tspan>` children. Never use multiple adjacent `<text>` elements. The converter maps each `<tspan>` to a `<a:r>` run within the same PPT text frame, keeping the line as one editable shape.
**DO** — one `<text>` → one text frame with three runs:
```xml
<text x="100" y="200" font-size="24" fill="#333333">
实现<tspan fill="#1A73E8" font-weight="bold">10倍</tspan>效率提升
</text>
```
**DON'T** — three side-by-side `<text>` elements become three separate text frames in PPT (breaks edit-as-one-line, risks alignment drift, makes spacing fragile):
```xml
<text x="100" y="200" font-size="24" fill="#333333">实现</text>
<text x="160" y="200" font-size="24" fill="#1A73E8" font-weight="bold">10倍</text>
<text x="240" y="200" font-size="24" fill="#333333">效率提升</text>
```
**⚠️ Inline tspans must NOT carry `x`/`y`/`dy`** — those mark a new line, and `flatten_tspan` will split into a separate text frame. `dx` is safe (kerning, stays inline). Only set `x`/`y`/`dy` on tspans that genuinely start a new line.
**Multi-line `<text>` with per-line emphasis works**: an outer line-break tspan (with `x` + `dy` or `y`) MAY contain nested inline tspans for color/weight/size — converter walks nested tspans and emits one run per styled segment:
```xml
<text x="80" y="190" font-size="18" fill="#333333">
<tspan x="80" dy="0">完成率<tspan fill="#4CAF50" font-weight="bold">98%</tspan>超预期</tspan>
<tspan x="80" dy="35">成本降低<tspan fill="#F44336" font-weight="bold">¥120万</tspan></tspan>
</text>
```
**DON'T** — same-line column jump via `<tspan x="...">`:
```xml
<text x="100" y="200" font-size="18" fill="#333333">
<tspan x="100">左列</tspan><tspan x="600" font-weight="bold">右列</tspan>
</text>
```
`x` on a tspan starts a new line, splitting into two independent text frames. For two-column layouts, write two `<text>` elements.
**Default — lift key information.** Uniform-styled paragraphs read as walls of text. Wrap these in `<tspan fill="..." font-weight="bold">`:
- **Numerical results** — percentages, multipliers (`10x`), absolute amounts (`¥120万`)
- **Contrasts** — gain/loss, before/after, target/actual
- **One or two load-bearing nouns per sentence** — the term that carries the insight
Do NOT highlight: connectives, common verbs, every noun, decorative adjectives, structural text (footer/axis/legend/page number/labels).
Color: use the deck's primary brand color for emphasis. Reserve green/red for actual positive/negative semantics.
**DON'T** — uniform-styled paragraph buries the insight:
```xml
<text x="80" y="200" font-size="20" fill="#333333">
2024年公司营收同比增长35%达到12亿元创历史新高
</text>
```
**DO** — same line, key data lifted:
```xml
<text x="80" y="200" font-size="20" fill="#333333">
2024年公司营收同比<tspan fill="#1A73E8" font-weight="bold">增长35%</tspan>达到<tspan fill="#1A73E8" font-weight="bold">12亿元</tspan>创历史新高
</text>
```
### Element Grouping (Mandatory)
Wrap logically related elements in top-level `<g id="...">` groups. Produces PowerPoint groups in PPTX, making slides easier to select/move/edit and providing stable anchors for optional per-element entrance animation.
> ⚠️ Only `<g opacity="...">` is banned (§2). Plain `<g>` for grouping is required.
**Animation-ready rule**: direct children of `<svg>` should be semantic groups, not raw drawing atoms. Aim for **38 top-level content `<g id>` groups per slide** (the 38 budget excludes page chrome — see below); each content group becomes one entrance step under the chosen `--animation-trigger` mode (one click in `on-click`, one cascade slot in `after-previous`, parallel in `with-previous`).
**Chrome groups are excluded automatically.** The exporter treats top-level groups whose id contains chrome tokens as page chrome and skips them in the animation sequence — they appear together with the slide. Tokens (matched against id after splitting on `-` / `_`): `background`, `bg`, `decoration` / `decorations` / `decor`, `header`, `footer`, `chrome`, `watermark`, `pagenumber` / `pagenum` / `page-number`, `nav`, `logo`, `rule`. So `<g id="bg-texture">`, `<g id="cover-footer">`, `<g id="p03-header">`, `<g id="bottom-decor">`, `<g id="nav">`, `<g id="logo-area">`, `<g id="column-rule">` all skip animation while keeping their `<g>` wrapper for editing/grouping. Use these naming conventions for chrome — do **not** strip the `<g>` wrapper.
**What to group**:
| Grouping Unit | Contains |
|---------------|----------|
| Card / panel | Background rect + (optional shadow only if the card floats over a photo/colored panel — see §6) + icon + title + body text |
| Process step | Number circle + icon + label + description |
| List item | Bullet / number + icon + title + description |
| Icon-text combo | Icon element + adjacent label |
| Page header | Title + subtitle + accent decoration |
| Page footer | Page number + branding |
| Decorative cluster | Related decorative shapes (rings, orbs, dots) |
**Do not**:
- Put the whole slide into one giant `<g>`; that leaves only one animation step.
- Leave many top-level `<rect>` / `<text>` / `<path>` elements ungrouped; fallback animation is capped at 8 primitives and dense flat pages may skip animation.
- Split every icon, text line, or decorative mark into separate top-level groups; that creates too many click steps.
- Use anonymous top-level groups. Every top-level semantic group needs a descriptive `id`.
**Example**:
```xml
<g id="card-benefits-1">
<!-- This card floats over a colored panel — shadow is appropriate. On a flat white canvas, omit the filter. -->
<rect x="60" y="115" width="565" height="260" rx="20" fill="#FFFFFF" filter="url(#shadow)"/>
<use data-icon="chunk-filled/bolt" x="108" y="163" width="44" height="44" fill="#0071E3"/>
<text x="105" y="270" font-size="56" font-weight="bold" fill="#0071E3">10×</text>
<text x="250" y="270" font-size="30" font-weight="bold" fill="#1D1D1F">Faster</text>
<text x="105" y="310" font-size="18" fill="#6E6E73">Reduce production time from days to hours.</text>
</g>
```
**Naming**: descriptive `id` on top-level `<g>` is **required** (e.g., `card-1`, `step-discover`, `header`, `footer`). Each top-level `<g id>` becomes one anchor for per-element entrance animation in PPTX export; without it, the exporter falls back to at most 8 top-level primitives or skips animation on dense pages.
---
## 5. Post-processing Pipeline (3 Steps)
Must be executed in order — skipping or adding extra flags is FORBIDDEN:
```bash
# 1. Split speaker notes into per-page note files
python3 scripts/total_md_split.py <project_path>
# 2. SVG post-processing (icon embedding, image crop/embed, text flattening, rounded rect to path)
python3 scripts/finalize_svg.py <project_path>
# 3. Export PPTX (embeds speaker notes by default)
python3 scripts/svg_to_pptx.py <project_path>
# Output (default-flow mode):
# exports/<project_name>_<timestamp>.pptx ← native pptx (canonical output)
# backup/<timestamp>/svg_output/ ← Executor SVG source backup (always written)
#
# Add --svg-snapshot to additionally emit:
# exports/<project_name>_<timestamp>_svg.pptx ← SVG snapshot pptx (sibling of native pptx)
```
**Optional animation flags** (only when the user asks):
- `-t <effect>` — page transition (`fade` / `push` / `wipe` / `split` / `strips` / `cover` / `random` / `none`; default `fade`)
- `-a <effect>` — per-element entrance animation (`fade` / `auto` / `mixed` / `random` / one of 22 named effects / `none`; **default `none`** — pages appear as a whole, no auto element builds; opt in with `auto`, which maps effect from group id — image-like ids cycle zoom/dissolve/circle/box/diamond/wheel, other matches map to a single effect, unmatched ids cycle fade/wipe/fly/zoom). Anchors on top-level `<g id="...">` groups.
- `--animation-trigger {on-click,with-previous,after-previous}` — Start mode matching PowerPoint's animation-pane Start dropdown. Default `after-previous` (cascade on slide entry; pace via `--animation-stagger <seconds>`); `on-click` advances per click; `with-previous` plays all groups together.
- `--animation-config <path>` — optional object-level animation sidecar. Default: `<project>/animations.json` when present.
- `--auto-advance <seconds>` — kiosk-style auto-play
**Optional recorded narration** (only when the user asks for narrated/video export):
```bash
python3 scripts/notes_to_audio.py <project_path> --voice zh-CN-XiaoxiaoNeural
python3 scripts/svg_to_pptx.py <project_path> --recorded-narration audio
```
- `notes_to_audio.py` reads split `notes/*.md` files and writes one audio file per slide to `audio/`. Default `edge` output is MP3; configured cloud providers may output MP3 or WAV depending on provider settings.
- `--recorded-narration audio` prepares PowerPoint's recorded timings and narrations: every slide needs matching `m4a` / `mp3` / `wav` audio, every duration must be readable by `ffprobe`, and `on-click` object animation is rejected.
- `--recorded-narration audio` embeds matching audio, keeps speaker notes, and sets slide timings from audio duration.
- `--narration-audio-dir audio` is the lower-level embedding path for partial audio coverage; it does not prepare a complete recorded-timings export.
- Long-audio import and automatic long-audio splitting are not supported.
Full reference: [`animations.md`](animations.md).
**Prohibited**:
- NEVER use `cp` as a substitute for `finalize_svg.py`
- NEVER force `-s output` for the legacy/preview pptx (PowerPoint's internal SVG parser drops icons and rounded corners). Default auto-split already gives native the high-fidelity source it needs without affecting legacy.
- NEVER use `--only` (it suppresses one of the two output files)
> Source-directory split: by default `svg_to_pptx.py` reads `svg_output/` for the native pptx (preserves icon `<use>`, image `preserveAspectRatio``srcRect`, rounded rect `rx/ry``prstGeom roundRect`) and `svg_final/` for the legacy/preview pptx (PowerPoint's internal SVG parser needs the flattened form). Pass `-s output` or `-s final` only when you specifically want both products to read from a single source.
**Re-run rule**: Any change to `svg_output/` after post-processing requires re-running Steps 2-3. Step 1 only re-runs if `notes/total.md` changed.
---
## 6. Shadow & Overlay Techniques
> `<mask>` elements and `<image opacity="...">` are banned. Always use stacked `<rect>` or gradient overlays instead (see §2).
### Shadow
> **Shadow is restraint, not default.** The "designed" feel comes from absence, not abundance.
#### When to use
Only when the element genuinely floats above another layer:
- Card / quote bubble / annotation on a photo or colored panel
- Single primary CTA or "recommended" item picked out from peers
- Overlay layer (callout, tooltip, modal emphasis)
- Floating image card on a textured background
#### When NOT to use
- Background panels / dividers / decorative bars — they are the floor
- Equal peer cards in a 2/3/4-up grid — keep all flat
- Containers with visible border, gradient fill, or strong tint — redundant
- Body-text paragraph containers — disrupts scan rhythm
- Decorative lines / dividers / icons — they are symbols, not objects
- Pages with only one content container — no second layer to lift above
- Dark backgrounds — black shadows vanish; use 1px low-opacity white stroke or outer glow
**Reference — not a constraint**: 2-3 shadowed elements per page usually reads cleanest; before adding a 4th, check the extra layering earns its weight — a genuinely complex dashboard may justify more.
#### Single light source per page
All `feOffset` on a page must share the same `dx`/`dy` direction. Default: `dx="0"`, `dy="4"`-`dy="8"` (light from upper front).
#### Restraint over visibility
Standard: "the shadow is felt, not seen." If noticed, it's too strong.
- Resting cards: `flood-opacity` 0.06-0.10
- Raised elements (CTA, overlay): max `flood-opacity` 0.20
- Above 0.20 = Office 2007 hard-shadow look
- Color: near-black at low opacity, or a darker tint of background. Brand-color shadow only on accent elements sharing that hue.
#### Two-tier elevation maximum
A page may have at most two non-floor tiers.
| Tier | When | dy | stdDeviation | flood-opacity |
|------|------|----|--------------|---------------|
| Floor (no shadow) | Backgrounds, peer-grid cards, dividers, body-text containers | — | — | — |
| Resting | Cards on photos/panels, secondary callouts | 2-4 | 4-8 | 0.06-0.10 |
| Raised | Primary CTA, focused/recommended card, overlay | 6-10 | 10-16 | 0.12-0.20 |
#### Don't stack visual-weight tools
Pick **one** per container: shadow, border, gradient fill, or strong tint. Stacking = instant template look.
---
#### Filter Soft Shadow — Recommended
Best for: cards, floating panels, elevated elements. The `svg_to_pptx` converter automatically converts `feGaussianBlur` + `feOffset` into native PPTX `<a:outerShdw>`.
```xml
<defs>
<filter id="softShadow" x="-15%" y="-15%" width="140%" height="140%">
<feGaussianBlur in="SourceAlpha" stdDeviation="12"/>
<feOffset dx="0" dy="6" result="offsetBlur"/>
<feFlood flood-color="#000000" flood-opacity="0.10" result="shadowColor"/>
<feComposite in="shadowColor" in2="offsetBlur" operator="in" result="shadow"/>
<feMerge>
<feMergeNode in="shadow"/>
<feMergeNode in="SourceGraphic"/>
</feMerge>
</filter>
</defs>
<rect x="60" y="60" width="400" height="240" rx="12" fill="#FFFFFF" filter="url(#softShadow)"/>
```
Recommended parameters (see "Two-tier elevation maximum" above for tier guidance):
```
stdDeviation: 416 (resting cards: 48; raised elements: 1016)
flood-opacity: 0.060.10 (resting cards — default)
0.120.20 (raised elements only — primary CTA, overlay)
NEVER > 0.20 (Office 2007 hard-shadow look)
dy: 210 (resting: 24; raised: 610)
dx: 02 (must match every other shadow on the page — single light source)
```
#### Colored Shadow
Best for: accent buttons, brand-colored cards. Use the element's own color family instead of black.
```xml
<filter id="colorShadow" x="-15%" y="-15%" width="140%" height="140%">
<feGaussianBlur in="SourceAlpha" stdDeviation="10"/>
<feOffset dx="0" dy="6" result="offsetBlur"/>
<feFlood flood-color="#1A73E8" flood-opacity="0.20" result="shadowColor"/>
<feComposite in="shadowColor" in2="offsetBlur" operator="in" result="shadow"/>
<feMerge>
<feMergeNode in="shadow"/>
<feMergeNode in="SourceGraphic"/>
</feMerge>
</filter>
```
Replace `flood-color` with the element's brand color. Keep `flood-opacity` 0.12-0.20. Reserve for the single primary CTA per page — using on every button defeats the cue.
#### Glow Effect
Best for: title highlights, key metrics, hero text. The converter automatically converts `feGaussianBlur` without `feOffset` into native PPTX `<a:glow>`.
```xml
<defs>
<filter id="titleGlow" x="-30%" y="-30%" width="160%" height="160%">
<feGaussianBlur in="SourceAlpha" stdDeviation="6" result="blur"/>
<feFlood flood-color="#1A73E8" flood-opacity="0.45" result="glowColor"/>
<feComposite in="glowColor" in2="blur" operator="in" result="glow"/>
<feMerge>
<feMergeNode in="glow"/>
<feMergeNode in="SourceGraphic"/>
</feMerge>
</filter>
</defs>
<text x="640" y="360" text-anchor="middle" font-size="48" fill="#1A73E8" filter="url(#titleGlow)">Key Insight</text>
```
Recommended parameters:
```
stdDeviation: 48 (smaller = subtle, larger = prominent)
flood-color: brand color or accent color (NOT black)
flood-opacity: 0.350.55 (stronger than shadow for visibility)
```
**vs shadow**: no `<feOffset>` (or dx=0/dy=0). The converter uses this to distinguish glow from shadow.
#### Layered Rect Shadow — High-Compatibility Fallback
Best for: maximum compatibility with older PowerPoint versions. Stack 23 semi-transparent rectangles behind the main card:
```xml
<!-- Shadow layers (back to front, largest offset first) -->
<rect x="68" y="72" width="400" height="240" rx="16" fill="#000000" fill-opacity="0.03"/>
<rect x="65" y="69" width="400" height="240" rx="14" fill="#000000" fill-opacity="0.05"/>
<rect x="62" y="66" width="400" height="240" rx="12" fill="#1A73E8" fill-opacity="0.04"/>
<!-- Main card -->
<rect x="60" y="60" width="400" height="240" rx="12" fill="#FFFFFF"/>
```
### Image Overlay
#### Linear Gradient Overlay — Most Common
Best for: image+text pages. Gradient direction should match text position (text on left → gradient darkens toward left).
```xml
<image href="..." x="0" y="0" width="1280" height="720" preserveAspectRatio="xMidYMid slice"/>
<defs>
<linearGradient id="imgOverlay" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1A1A2E" stop-opacity="0.85"/>
<stop offset="55%" stop-color="#1A1A2E" stop-opacity="0.30"/>
<stop offset="100%" stop-color="#1A1A2E" stop-opacity="0"/>
</linearGradient>
</defs>
<rect x="0" y="0" width="1280" height="720" fill="url(#imgOverlay)"/>
```
#### Bottom Gradient Bar
Best for: cover slides and full-image pages with bottom title.
```xml
<defs>
<linearGradient id="bottomBar" x1="0" y1="0" x2="0" y2="1">
<stop offset="0%" stop-color="#000000" stop-opacity="0"/>
<stop offset="100%" stop-color="#000000" stop-opacity="0.72"/>
</linearGradient>
</defs>
<rect x="0" y="380" width="1280" height="340" fill="url(#bottomBar)"/>
```
#### Radial Gradient Overlay — Vignette Effect
Best for: full-screen atmosphere slides; draws attention to the center.
```xml
<defs>
<radialGradient id="vignette" cx="50%" cy="50%" r="70%">
<stop offset="0%" stop-color="#000000" stop-opacity="0"/>
<stop offset="100%" stop-color="#000000" stop-opacity="0.58"/>
</radialGradient>
</defs>
<rect x="0" y="0" width="1280" height="720" fill="url(#vignette)"/>
```
#### Brand Color Overlay
Best for: slides needing strong visual brand identity.
```xml
<defs>
<linearGradient id="brandOverlay" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#005587" stop-opacity="0.80"/>
<stop offset="100%" stop-color="#005587" stop-opacity="0.10"/>
</linearGradient>
</defs>
<rect x="0" y="0" width="1280" height="720" fill="url(#brandOverlay)"/>
```
### Quick-Reference Table
| Scenario | Recommended Technique | Avoid |
|----------|-----------------------|-------|
| Card / panel shadow (only when floating over photo/colored panel) | Filter soft shadow (`flood-opacity` 0.060.10, single light source) | Hard black shadow, full-page abundance |
| Equal peer cards in a grid | All flat (no shadow) | Lifting every card uniformly |
| Page-section background panel | Flat fill, no shadow | Treating panels as floating cards |
| Accent / CTA button (one per page) | Colored shadow (same hue family, `flood-opacity` 0.120.20) | Generic gray shadow, applying to every button |
| Title / metric highlight | Glow filter (brand color, no offset) | Overuse on body text |
| Text over image | Linear gradient overlay (direction matches text side) | Uniform flat opacity over whole image |
| Cover / full-image slide | Bottom gradient bar + brand color | Solid black overlay |
| Atmosphere / hero slide | Radial vignette | Unprocessed raw image |
| Max PPT compatibility needed | Layered rect shadow | Filter-based shadow |
---
## 7. Stroke, Text & Shape Effects
### stroke-dasharray — Dashed / Dotted Lines
Converts to native PPTX `<a:prstDash>`. Use preset patterns for best results:
| SVG Value | PPTX Preset | Best For |
|-----------|-------------|----------|
| `4,4` | Dash | General dashed lines, separators |
| `2,2` | Dot (sysDot) | Subtle dotted borders, placeholder outlines |
| `8,4` | Long dash | Timeline connectors, flow arrows |
| `8,4,2,4` | Long dash-dot | Technical drawings, dimension lines |
```xml
<rect x="60" y="60" width="400" height="240" rx="12"
fill="none" stroke="#999999" stroke-width="2" stroke-dasharray="4,4"/>
<line x1="100" y1="360" x2="1180" y2="360"
stroke="#CCCCCC" stroke-width="1" stroke-dasharray="2,2"/>
```
### stroke-linejoin
Controls how line segments join at corners. Supported values convert to native PPTX line join types:
| SVG Value | PPTX Equivalent | Best For |
|-----------|-----------------|----------|
| `round` | Round join | Smooth polyline charts, organic shapes |
| `bevel` | Bevel join | Technical diagrams |
| `miter` | Miter join (default) | Sharp-cornered rectangles, arrows |
```xml
<polyline points="100,200 200,100 300,200" fill="none"
stroke="#1A73E8" stroke-width="3" stroke-linejoin="round"/>
```
### text-decoration
Supported text decorations convert to native PPTX text formatting:
| SVG Value | PPTX Equivalent | Best For |
|-----------|-----------------|----------|
| `underline` | Single underline | Emphasis, links, key terms |
| `line-through` | Strikethrough | Removed items, before/after comparisons |
```xml
<text x="100" y="200" font-size="20" fill="#333333" text-decoration="underline">Important Term</text>
<!-- Per-tspan decoration -->
<text x="100" y="240" font-size="18" fill="#333333">
Regular text <tspan text-decoration="line-through" fill="#999999">old value</tspan> new value
</text>
```
### Gradient Fill — linearGradient & radialGradient
Gradients defined in `<defs>` and referenced via `fill="url(#id)"` convert to native PPTX `<a:gradFill>`. Use them as shape fills (not just overlays) for polished surfaces.
**Linear gradient** — best for buttons, header bars, background panels:
```xml
<defs>
<linearGradient id="btnGrad" x1="0" y1="0" x2="1" y2="0">
<stop offset="0%" stop-color="#1A73E8"/>
<stop offset="100%" stop-color="#0D47A1"/>
</linearGradient>
</defs>
<rect x="540" y="600" width="200" height="48" rx="24" fill="url(#btnGrad)"/>
```
**Radial gradient** — best for spotlight backgrounds, circular accents:
```xml
<defs>
<radialGradient id="spotBg" cx="50%" cy="50%" r="70%">
<stop offset="0%" stop-color="#1A73E8" stop-opacity="0.15"/>
<stop offset="100%" stop-color="#1A73E8" stop-opacity="0"/>
</radialGradient>
</defs>
<circle cx="640" cy="360" r="300" fill="url(#spotBg)"/>
```
### Pattern Fill — `<pattern>` with PPTX preset annotation
`<pattern>` fills convert to native PPTX `<a:pattFill prst="...">` — but only PPTX's built-in preset patterns are reachable. The converter does **not** render hand-drawn `<path>` geometry inside the pattern; instead it reads two annotations off the `<pattern>` element and emits the matching DrawingML preset.
**Prefer explicit geometry when spacing matters.** A `<pattern>` renders at PowerPoint's **fixed preset density** — you cannot reproduce a specific tile size (e.g. a 40px grid). For grids / textures whose spacing or line weight is part of the design, draw the lines as **one `<path>` with all lines as subpaths** (`M40 0V720 M80 0V720 … M0 40H1280 …`, `fill="none" stroke=…`) — the converter supports `M/L/H/V` and multi-subpath, so it becomes **one editable vector shape that reproduces the exact spacing** across all four renderers. Reserve `<pattern>` + `data-pptx-pattern` for **round-tripping an existing PPTX** (decks imported via `pptx_to_svg`), where the source genuinely used a native preset fill. For pure display where no PPT-side editing is needed, `--svg-snapshot` is the other faithful option.
**Required annotations** (only when you intentionally use a `<pattern>` preset):
| Attribute | Purpose | Without it |
|---|---|---|
| `data-pptx-pattern="<preset>"` | Names the PPTX preset (one of the enum below) | Falls back to `ltUpDiag` — diagonal stripes, not your geometry |
| Child `<rect fill="<bg-hex>"/>` | Background color of the pattern tile | `bg` falls back to `#FFFFFF`, painting over the page background |
The child `<path>`'s `stroke` becomes the foreground color (the pattern's line color).
```xml
<defs>
<pattern id="bpGrid" x="0" y="0" width="40" height="40"
patternUnits="userSpaceOnUse" data-pptx-pattern="lgGrid">
<rect width="40" height="40" fill="#0E2A47"/>
<path d="M 40 0 L 0 0 0 40" fill="none" stroke="#2D4A6B" stroke-width="0.6"/>
</pattern>
</defs>
<rect width="1280" height="720" fill="url(#bpGrid)"/>
```
**Valid `data-pptx-pattern` values** (OOXML `ST_PresetPatternVal` — closed enum, anything outside makes PowerPoint open with "needs to be repaired"):
| Category | Values |
|---|---|
| Grids | `smGrid` · `lgGrid` · `dotGrid` *(no `ltGrid` — common typo)* |
| Diagonal lines | `ltUpDiag` · `ltDnDiag` · `dkUpDiag` · `dkDnDiag` · `wdUpDiag` · `wdDnDiag` · `dashUpDiag` · `dashDnDiag` · `diagCross` |
| Horizontal / vertical lines | `horz` · `vert` · `ltHorz` · `ltVert` · `dkHorz` · `dkVert` · `narHorz` · `narVert` · `dashHorz` · `dashVert` · `cross` |
| Percent fills | `pct5` · `pct10` · `pct20` · `pct25` · `pct30` · `pct40` · `pct50` · `pct60` · `pct70` · `pct75` · `pct80` · `pct90` |
| Checks & confetti | `smCheck` · `lgCheck` · `smConfetti` · `lgConfetti` |
| Decorative | `horzBrick` · `diagBrick` · `weave` · `plaid` · `trellis` · `zigZag` · `wave` · `sphere` · `divot` · `shingle` · `solidDmnd` · `openDmnd` · `dotDmnd` |
> `svg_quality_checker.py` warns on missing `data-pptx-pattern` and errors on values outside the enum. Catch these pre-export — PowerPoint's repair dialog hides which pattern broke.
### transform: rotate — Element Rotation
Rotation converts to native PPTX `<a:xfrm rot="...">`. Supported on all element types: `rect`, `circle`, `ellipse`, `line`, `path`, `polygon`, `polyline`, `image`, and `text`.
```xml
<!-- Rotated decorative element -->
<rect x="100" y="100" width="60" height="60" fill="#1A73E8" fill-opacity="0.1"
transform="rotate(45, 130, 130)"/>
<!-- Rotated text label -->
<text x="50" y="400" font-size="14" fill="#999999"
transform="rotate(-90, 50, 400)">Y-Axis Label</text>
```
**Syntax**: `rotate(angle)` or `rotate(angle, cx, cy)` where `cx,cy` is the rotation center. Positive angles rotate clockwise.
### Arc Paths — Donut / Pie Charts
Calculate arc endpoint coordinates precisely with trigonometry. Never estimate — small errors produce wildly wrong shapes.
**Calculation formula** (center `cx,cy`, radius `r`, angle `θ` in degrees):
```
x = cx + r × cos(θ × π / 180)
y = cy + r × sin(θ × π / 180)
```
**Key rules**:
1. Start at **-90°** (12 o'clock position) and go clockwise
2. Each sector spans `percentage × 360°`
3. Use **large-arc flag = 1** when the sector is > 180°, **0** otherwise
4. sweep-direction = 1 (clockwise) for outer arc, 0 (counter-clockwise) for inner arc returning
5. **Always verify** that the sum of all sector angles equals 360° and that the last sector's end point matches the first sector's start point
**Example — 75% donut sector** (center 400,400, outer r=180, inner r=100):
```
Start angle: -90° → outer(400, 220), inner(400, 300)
End angle: -90+270=180° → outer(220, 400), inner(300, 400)
Large-arc flag: 1 (270° > 180°)
<path d="M 400,220 A 180,180 0 1,1 220,400 L 300,400 A 100,100 0 1,0 400,300 Z"/>
```
### Polygon Arrows on Diagonal Lines
> For connector lines prefer `marker-end`/`marker-start` (§1.1). For chunky/wide solid/non-connector arrows, use standalone polygon or path.
Horizontal/vertical lines can use simple point offsets for `<polygon>` arrowheads. Diagonal lines need triangle vertices rotated to match line direction.
**Method** — calculate triangle points using the line's direction vector:
```
Given line from (x1,y1) to (x2,y2):
1. Direction vector: dx = x2-x1, dy = y2-y1
2. Normalize: len = √(dx²+dy²), ux = dx/len, uy = dy/len
3. Perpendicular: px = -uy, py = ux
4. Arrow tip = (x2, y2)
5. Back point 1 = (x2 - ux×12 + px×5, y2 - uy×12 + py×5)
6. Back point 2 = (x2 - ux×12 - px×5, y2 - uy×12 - py×5)
```
**Example — diagonal line** from (260,310) to (370,430):
```
dx=110, dy=120, len≈162.8, ux=0.676, uy=0.737
px=-0.737, py=0.676
Tip: (370, 430)
Back1: (370-8.1-3.7, 430-8.8+3.4) = (358.2, 424.6)
Back2: (370-8.1+3.7, 430-8.8-3.4) = (365.6, 417.8)
<polygon points="370,430 365.6,417.8 358.2,424.6" fill="#C8A96E"/>
```
⚠️ Never use a fixed downward/rightward triangle on a diagonal line — arrow will point wrong.
---
## 8. Project Directory Structure
```
project/
├── svg_output/ # Raw SVGs (Executor output, contains placeholders)
├── svg_final/ # Post-processed final SVGs (finalize_svg.py output)
├── images/ # Image assets (user-provided + AI-generated)
├── notes/ # Speaker notes (.md files matching SVG names)
│ └── total.md # Complete speaker notes document (before splitting)
├── templates/ # Project templates (if any)
└── *.pptx # Exported PPT file
```

Some files were not shown because too many files have changed in this diff Show More