feat(run_python): 过程脚本约定落 task_dir/scripts/

模型显式写文件再 script_path 跑的过程脚本统一进 <task_dir>/scripts/ (可见/持久/可重跑),交付产物仍落 task_dir 根。inline code 匿名片段维持临时用后即焚(host 系统 temp、docker .zcbot_tmp dotfile,均不动)。改 agent_builder 系统提示工作目录段 + run_python tool description/参数说明。 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 15:52:05 +08:00 · 2026-06-05 15:52:05 +08:00 · 69fc2599e3
parent 02cd67f63a
commit 69fc2599e3
3 changed files with 13 additions and 5 deletions
--- a/PROGRESS.md
+++ b/PROGRESS.md
@ -23,6 +23,7 @@
 ### 2026-06-05
 - **run_python 过程脚本约定 `<task_dir>/scripts/`**:确认现状是模型生成的 `.py` 直接落 task_dir 根(系统提示只说"只写到 task_dir",无 scripts/ 分层),过程脚本和交付产物(.docx/.pptx/spec)混在一起。定调:**模型显式写文件再 `script_path` 跑的过程脚本** → `<task_dir>/scripts/`(可见/持久/可重跑,`WriteTool` 自动建父目录);**inline `code` 匿名片段** → 维持临时用后即焚(host 走系统 temp、docker 走 `.zcbot_tmp/<task_id>/` dotfile 隐藏+删,均不动)——不持久化到 scripts/,免把目录污染成匿名垃圾堆。改 `core/agent_builder.py` 系统提示工作目录段加一条 scripts/ 引导(>~15 行/要迭代/出产物用文件,短抛弃代码才内联)+ `tools/run_python.py` 的 tool description / `script_path` 参数说明同步。inline 执行逻辑两后端均未改。tests `test_run_python_script_path` / `test_executor_docker` 全过(2 skip 为 Linux-only)。
 - **新增 `standard` skill(国标/行标/团标起草)**:联网核实市面无可直接复用的"写标准文件本身"的 skill(搜到的 technical-proposal GB/T 8567、official-document GB/T 9704 都是相邻品类——投标书/公文,非标准),据 GB/T 1.1—2020 自建。覆盖三层级(国标 GB·T / 行标 JC·T / 团标 T/,重点对接 **CSTM → T/CSTM**,材料试验团标对口建材院检测方向)× 两体裁骨架(试验方法 GB/T 20001.4 + 产品标准)。文件:`SKILL.md`(阶段化:定层级体裁→八条 spec→逐章段段卡→自检渲染)+ 3 references(`gbt_1_1_structure` 要素骨架/必备可选/规范性资料性/封面前言套话、`standard_levels` 选型+CSTM 体系立项、`drafting_rules` 能愿动词应宜可能/不可考核词过滤/指标量化闭环/术语规则/引用真实性+§8 自检清单)+ 4 templates(spec/test_method/product_standard/编制说明)。**渲染复用 proposal `render_docx.py`+`render_diagrams.py`**(兄弟 skill `../proposal/scripts/`,同 patent 范式);冒烟测过表格/中文渲染正常。**坑**:proposal `quality_check.py` 按申报书固定章节名查"缺章节",对标准全是误报且无跳过开关→阶段三不用机检,改 drafting_rules §8 人工 12 条清单(与 patent self_check 同思路)。产出是结构合规草稿 docx,正式报批再灌官方 TCS/CSTM 模板做版式精修。
 - **dev 页加"改密码"功能 + 文件面板"选入"按钮文字改图标(防换行)**:① 自助改密码——`web/auth.py::change_password(user_id, old, new)`(验旧密码 → 新密码 ≥6 → bcrypt 重哈希写回,错误归一到现成 `UserCreateError` code 体系 `wrong_password/no_password/weak_password/user_not_found`,不为此新开异常类),`POST /v1/auth/change_password` 挂 `Depends(require_user)`(user_id 取自 JWT 不信前端,旧密码错/无密码→403、弱→400、行没了→401)。前端顶栏「退出登录」左侧加「改密码」按钮(`#hd-chpw`,并入 embed 隐藏规则——embed 模式不显示)+ 一个复用 `.modal` 骨架的弹框(旧/新/确认三项,前端先验长度+两次一致再提交,成功 `alert` 提示不登出,401 走 `logout()`)。否决"点用户名展开菜单"(多写菜单逻辑不划算)。② `#btn-src-pick` 的文字 `选入…` 改单字符图标 `⊕`(和旁边 `⬆ ↻ ›` 同款单色字形,`title` 保留"选入"语义)——原中文文字在窄面板偶发换行。
 - **记账给 DeepSeek 前缀缓存命中折价(修虚高 ~2-3x)+ 前端体现缓存命中/真实成本**:排查"rust 优势→PPT"那 task(flash,34 轮)发现 `tokens_in` 累计 69.9 万里 **88.6% 是缓存命中**,但 `usage.py::_fallback_chat_cost_cny` 把命中段也按 `input` 全价(1.0)算 → 记 ¥0.84,真实(命中按 0.1x)只 ~¥0.28,**越大的 task 虚高越多**(文献采集 53% 命中:¥33→~¥16)。修:① `ModelCapabilities` 加 `cache_hit_cny_per_mtoken`(deepseek flash 0.1 / pro 0.2;0=不区分按全价兜底,绝不少记);② 成本公式拆三段「命中×缓存价 + (input−命中)×input价 + output×output价」,`loop.py` 把 `cache_hit_tokens` + 缓存单价透传进 `record_chat_usage`;③ 前端不加 DB 列——`web/app.py` 加 `_usage_aggregates`(单查询 GROUP BY `usage_events`,复用列表 `msg_counts` 同款批量范式,无 N+1)on-the-fly 算每 task 真实成本 + chat token + 缓存命中,`_task_dict` 带出;列表行**不内联花费**、只显 tok 数,花费/缓存命中率藏 hover tooltip(`taskUsageTooltip`,多行:输入/输出拆分 · 命中 + 命中率 · ¥真实花费),顶栏额外内联简版。**折价只对新 chat 事件生效**,历史走 backfill 脚本(`scripts/backfill_chat_cost_cache_discount.py`,默认 dry-run,`--apply` 落库;`--assume-cache-hit-rate RATE` 给无 `cache_hit_tokens` 字段的老事件按估算命中率折价——DeepSeek 当时缓存了只是没记,全价偏高;实测过的事件用真实值不受影响)。**坑修**:命中率分母原误用 `tasks.tokens_prompt`,但该列会被「清空对话」重置而 `usage_events` 不重置 → 跨源相除算出 822% 怪值;改为 `_task_dict` 的 token 总量也优先取 usage_events 聚合(与 cache_hit 同源,命中率恒 ≤100%)。**注**:真正压低 token 体量的杠杆是减少轮数(高成本 task 全是 100+ 轮的逐步 write/run_python 循环),非本次范围。
--- a/core/agent_builder.py
+++ b/core/agent_builder.py
@ -259,6 +259,11 @@ def _build_system_prompt(
        f"普通产物(sections / slides / 终稿 .docx/.pptx)按 SKILL 文档落路径;"
        f"「宪法」性文件(spec 等)按下面《task 级「宪法」文件命名约定》拼路径。\n"
        f"⛔ 不要把产物写到 cwd / `skills/` / repo 根 —— 只写到 task_dir。\n"
        f"\n**run_python 过程脚本**:非平凡的 Python(>~15 行 / 要迭代调试 / 生成产物)"
        f"先用写文件工具落到 `<task_dir>/scripts/`(如 `scripts/analyze.py`,父目录自动建),"
        f"再用 `run_python(script_path=\"scripts/analyze.py\")` 执行 —— 源码留在文件里、可重读可改可重跑,"
        f"不挤占对话上下文。`scripts/` 只放过程脚本,**交付产物(.docx/.pptx/spec/figures 等)仍落 task_dir 根或 SKILL 指定路径**。"
        f"真·一次性短代码(算个数 / 探查一行)才用 `run_python(code=...)` 内联,临时执行不留痕。\n"
        f"\n## task 级「宪法」文件命名约定(跨 skill 通用)\n"
        f"任何 skill 产物中,跟 task 1:1 强绑定、阶段二/后续步骤会**反复 read**"
        f"的「宪法」性文件(如 proposal/ppt 的 spec、outline 等),**统一按下面格式命名**,"
--- a/tools/run_python.py
+++ b/tools/run_python.py
@ -23,11 +23,13 @@ class RunPythonTool(Tool):
    name = "run_python"
    description = (
        "Execute Python code in a subprocess. Returns stdout/stderr/exit_code.\n"
-        "Use script_path for non-trivial code: write a .py file first, then execute it "
+        "Use script_path for non-trivial code: write the .py under scripts/ (e.g. scripts/analyze.py) first, "
-        "so the full source stays in files instead of conversation history.\n"
+        "then execute it so the full source stays in files instead of conversation history.\n"
-        "Use inline code only for short snippets. Good for: data analysis, batch file ops, document generation (.pptx/.docx), "
+        "Use inline code only for short throwaway snippets (a quick calc / one-line probe) — these run from a temp file and leave no trace.\n"
        "Good for: data analysis, batch file ops, document generation (.pptx/.docx), "
        "matplotlib charts, or any task where Python is more natural than chaining tools.\n"
-        "Working directory is the agent's base dir. Files you create persist there.\n"
+        "Working directory is the agent's base dir (task_dir); relative paths resolve against it. "
        "Keep process scripts in scripts/; write deliverables to task_dir root or the SKILL-specified path.\n"
        "Available libs (install with shell pip if missing): "
        "pandas, numpy, matplotlib, python-pptx, python-docx, requests, pypdf."
    )
@ -40,7 +42,7 @@ class RunPythonTool(Tool):
            },
            "script_path": {
                "type": "string",
-                "description": "Path to an existing .py file to execute. Prefer this for non-trivial code.",
+                "description": "Path to an existing .py file to execute (relative to task_dir). Prefer this for non-trivial code; keep such scripts under scripts/.",
            },
            "timeout": {"type": "integer", "default": 120, "description": "Seconds before kill"},
        },