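Stage C Step 3: integrate DockerExecutor into AgentLoop + web lifespan reaper

- core/executor_docker.py adds DockerExecutor: composes HostExecutor + SandboxPool; shell/run_python go through docker exec (setsid + --user 1000:1000 + --workdir); all other tools pass straight through to the host (§7.5 #6 trust-domain split)
- run_python tmp .py lands in <user_root>/.zcbot_tmp/<task_id>/ (a dotfile, so /v1/files filters it naturally); inside the container it maps to /workspace/.zcbot_tmp/...; unlinked after the run
- ZCBOT_SANDBOX_BACKEND=host|docker env switches the backend, default host (zero change for Windows dogfood); on the docker path, an uninitialized pool fails fast instead of degrading silently
- web/app.py lifespan: with the docker backend, startup runs init_pool + shutdown_all to clear orphans + a 60s background reaper (run_in_executor calling the sync reap_idle); shutdown cancels the reaper and does a final cleanup
- pool.py clears Step 2 debt along the way: asyncio.Lock → threading.Lock, ensure becomes synchronous (the main caller is tool calls on BG threads; ephemeral loops would defeat asyncio.Lock across loops)
- Cancel limitation accepted: Popen.kill() only kills the docker CLI client; processes inside the container are left to the 5-minute idle reaper; upgrading to the PGID protocol (§7.5 #3) waits for user feedback to trigger it
- tests/test_executor_docker.py: 11 tests cover the key paths (host passthrough / argv shape / tmp cleanup / timeout / cancel / unknown tool / enable_run_python=False)
- DESIGN.md untouched (pure implementation of the existing §7.5 #5 #6 protocol)
- RUN.md adds a ZCBOT_SANDBOX_BACKEND env section + prerequisites for switching to docker + an integration verification path
- unittest discover 12/12 PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>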
parent f66511ccf8
commit dfac0acfa6
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
> Companion to `DESIGN.md`. This file records only phase status, decision deviations, file counts, and next steps. Each entry is 1-2 sentences: what was done + the key judgment; for details see `git log` / `git diff` / `DESIGN §7.9`.
|
||||||
|
|
||||||
Last updated: 2026-05-26 (Stage C Step 2: Docker per-user container pool + Dockerfile / init.sh / network ensure; code ready, not yet integrated into AgentLoop)
|
Last updated: 2026-05-26 (Stage C Step 3: DockerExecutor integrated into AgentLoop + web lifespan reaper; `ZCBOT_SANDBOX_BACKEND` env switches host/docker)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -15,7 +15,7 @@
|
||||||
| 5 | Eval Suite | ⏸ Not planned | replaced by dogfooding; probe covers health checks |
| 6 | Long-task engineering | 🟡 | task + resume ✅; two-tier memory ✅; context compression not done |
| 7 | Polish | ❌ | Docker sandbox / more skills |
| §7 SaaS | DESIGN §7 roadmap | 🟡 | A event streaming ✅; B complete ✅; D `/v1` JSON API ✅; D' transitional auth + dev SPA ✅; single-active-run lock + cancel ✅; 0004 schema slimming ✅; entry points relocated ✅; real OIDC pending; **C (Executor + docker sandbox) pending: a hard prerequisite for opening to external users; until it is done, dogfood + trusted-colleague allowlist only; for the DoD see the 6-item rollout checklist in DESIGN §7.5**. |
| §7 SaaS | DESIGN §7 roadmap | 🟡 | A event streaming ✅; B complete ✅; D `/v1` JSON API ✅; D' transitional auth + dev SPA ✅; single-active-run lock + cancel ✅; 0004 schema slimming ✅; entry points relocated ✅; real OIDC pending; **C Steps 1-3 ✅ (Executor interface + Docker pool + DockerExecutor integrated into AgentLoop; `ZCBOT_SANDBOX_BACKEND=docker` switches to containers)**; Step 4 egress proxy + Step 3b PGID kill protocol pending; **opening to external users still requires egress proxy + xfs project quota (§7.5 rollout checklist #2 #4)**. |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -23,6 +23,7 @@
|
||||||
|
|
||||||
### 2026-05-26
|
### 2026-05-26
|
||||||
|
|
||||||
|
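- **Stage C Step 3: DockerExecutor integrated into AgentLoop + web lifespan (env `ZCBOT_SANDBOX_BACKEND=host|docker` switches the backend)**: in `core/executor_docker.py`, `DockerExecutor` composes `HostExecutor` + `SandboxPool`, and `call_tool` dispatches by the §7.5 #6 trust domains: `shell` / `run_python` → `pool.ensure(user_id)` for the container name, then `docker exec --user 1000:1000 --workdir /workspace/<wd_name> <container> setsid bash -c <cmd>` for `shell`, or the same prefix plus `-e PYTHONIOENCODING=utf-8 -e PYTHONPATH=/workspace` and `setsid python <script>` for `run_python` (`setsid` wraps the command in its own process group; the §7.5 #3 PGID kill protocol is left for Step 3b); all other tools (read/write/edit/glob/grep/load_skill/web_*/seedream/seedance) pass straight through to the host. **The run_python tmp .py lands on the host at `<user_root>/.zcbot_tmp/<task_id>/<rand>.py`**, which maps to `/workspace/.zcbot_tmp/<task_id>/<rand>.py` inside the container (visible automatically through the bind mount); the leading dot lets the `/v1/files` API filter it naturally (already blocked by `startswith(".")` at `web/app.py:169`). **Cancel limitation accepted**: Popen.kill() kills the docker CLI client, but the server-side process inside the container is not terminated by that (this is how docker exec is designed); the first version relies on the 5-minute idle reaper / `rm -f` on the next `ensure` as the backstop, and the upgrade trigger is "users report cancelling but CPU is still burning". `core/sandbox/__init__.py` exposes the module-level singleton `init_pool` / `get_pool`; `agent_builder._resolve_executor` switches the backend by env, and on the docker path an uninitialized pool fails fast (no silent fallback to host, to prevent the "we thought there was a sandbox but it was running bare" misjudgment); the `web/app.py` lifespan startup hook runs `init_pool(workspace/users)` + `shutdown_all` to clear orphans from the previous process + `asyncio.create_task(_reaper)` (every 60s, `run_in_executor(pool.reap_idle)`), and the shutdown hook cancels the reaper + runs `shutdown_all`. **pool.py debt cleared along the way**: `asyncio.Lock` → `threading.Lock` (the main caller is synchronous tool calls on web BG threads, and an asyncio.Lock would be bypassed by the ephemeral loop that each `asyncio.run` creates; the reaper becomes an async wrapper `loop.run_in_executor(pool.reap_idle)`, and a fully sync pool API is more direct). **Tests**: `tests/test_executor_docker.py` has 11 tests covering host passthrough / shell argv shape / run_python tmp file cleanup / timeout / cancel / unknown tool / caps.enable_run_python=False; `unittest discover -s tests` is **12/12 PASS** (the 1 original test unchanged, plus the 11 new ones). **Zero change for Windows dogfood**: the default is `ZCBOT_SANDBOX_BACKEND=host`, and docker is not touched locally; the docker path only works on the Ubuntu deployment host, and the real-container smoke test still follows the 5 commands in the RUN.md "Sandbox(Stage C,Ubuntu)" section, run on the deployment host. `DESIGN.md` is **untouched** (pure implementation of the existing §7.5 #5 #6 protocol); `RUN.md` adds the `ZCBOT_SANDBOX_BACKEND` env description + the startup prerequisites for switching to the docker backend. Rejected: (a) having DockerExecutor wrap `asyncio.run(pool.ensure)` in an ephemeral loop: an asyncio.Lock is not shared across loops, so the serialization guarantee is lost, and every tool call adds ~5ms of loop create/destroy noise; making the pool synchronous costs less; (b) putting the `run_python` tmp .py inside the working directory: it pollutes the user's view, tmp files interfere when a SKILL teaches the model to "list the working directory with glob", and crash leftovers get mixed with deliverables (recording this in the §7.9 trade-off log will be considered the next time the same issue comes up); (c) a separate host-side bind mount `<workspace>/.sandbox_tmp/<uid>/` mounted in the container as `/tmp_scripts`: one more mount raises complexity, and keeping a single-bind-mount protocol is more direct; (d) degrading to host when the docker backend fails: a missing sandbox means the security model has collapsed, fail-fast matters more than "looks like it is running", and the §7.5 hard protocol says "any missing item means the deployment is incomplete".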
|
||||||
- **Stage C Step 2: Docker per-user containers + iptables blocklist (groundwork for §7.5 #1 + #3, not yet wired into AgentLoop)**: `deploy/sandbox/Dockerfile` (python:3.11-slim + tini as PID 1 + iptables/iproute2/netbase + a non-root user whose uid comes from the `HOST_UID` build-arg + the full requirements.txt installed in the container) + `deploy/sandbox/init.sh` (`set -euo pipefail`; any failing iptables rule fails fast → the container terminates, matching the §7.5 #1 hard protocol "any missing item means Stage C is incomplete"; 6 IPv4 red-line ranges + the ::1 IPv6 loopback, tolerated with graceful degradation + a per-IP DROP for each entry in env `ZCBOT_PG_IPS`; `exec sleep infinity` then waits for `docker exec`). `core/sandbox/network.py` has a single function `ensure_network()`, running `docker network create --internal zcbot-sandbox-net` (no outbound by default + cross-container isolation; when Step 4 adds the proxy, the proxy joins this same network); `core/sandbox/pool.py` has the `SandboxPool` class holding a per-user `asyncio.Lock` + an in-memory `_last_active` dict. The ensure path probes with inspect → running: return directly / exists-but-stopped: `rm -f` and restart (so iptables is re-applied) / missing: `docker run` with the full set of hardening flags (`--read-only --tmpfs /tmp:exec --cap-drop=ALL --cap-add=NET_ADMIN --security-opt=no-new-privileges --pids-limit=256 --memory=2g --cpus=1.0` + bind mount user_root → `/workspace` + label `zcbot.product=sandbox` for bulk cleanup + `--restart=no`); `mark_active` updates the dict / `reap_idle` kills by ttl / `shutdown_all` kills everything with the label (used at app startup to clear orphans from the previous process). Containers are named `zcbot-sandbox-<user_id>` (the standard UUID string with dashes, visually aligned with the mount path `<workspace>/users/<user_id>/`, so `docker ps | grep zcbot-sandbox-` shows the active users directly). **Key decisions**: (a) **docker CLI via subprocess rather than the docker-py SDK**: §7.5 #5 "the interface shape does not leak Docker assumptions" carried down to the implementation layer; subprocess behavior is transparent, there are zero new dependencies, and `docker ps` allows on-the-spot reconciliation; (b) **`docker update --label-add` is unavailable → use an in-memory dict**: Docker 23+ removed runtime label modification, so last_active lives in a Python dict; after an app restart the dict is empty → historical orphans are cleared by `shutdown_all` as the backstop (called from the lifespan startup hook); (c) **the `--internal` network takes effect from Step 2**: the iptables OUTPUT rules serve as defense in depth (the network layer already blocks outbound, and iptables still adds the rules per the protocol); when Step 4 adds the proxy, the proxy container joins `zcbot-sandbox-net`, and an iptables ACCEPT exception + a default DROP implement "deny by default + only via the proxy"; (d) **the NET_ADMIN cap is kept so PID 1 can run iptables as root**: the container holds NET_ADMIN for its whole lifetime, but PID 1 `sleep infinity` takes no external input, and `docker exec` sessions are pinned to non-root by `--user 1000:1000` with an empty cap_effective, which is equivalent to having no NET_ADMIN. The Step 3 DockerExecutor must hard-code --user 1000 so the root path is never opened (enforced by code review). **Explicitly out of scope for Step 2**: ① AgentLoop integration (`agent_builder.py` is untouched; the pool is an isolated module and is only plugged in at Step 3) ② moving shell/run_python into the container ③ egress proxy (Step 4) ④ the background reaper task (added together with the web lifespan wiring in Step 3). **Verification**: the full `from core.sandbox import ...` import + constructor pass; `SandboxPool(user_root_base=Path(...), pg_ips='10.x,172.x')` has the correct fields; `unittest discover` is 1/1 PASS. Real-container verification runs on Ubuntu (the RUN.md "Sandbox(Stage C,Ubuntu)" section lists 5 smoke commands: build / iptables ranges / non-root uid / read-only / teardown). `DESIGN.md` is untouched (pure implementation of the existing §7.5 #1 #3 protocol); `RUN.md` adds the "Sandbox(Stage C,Ubuntu)" deployment section (image build / sandbox env / 5 verification commands / when to upgrade to xfs project quota) + 2 troubleshooting entries (uid mismatch EACCES / missing NET_ADMIN). Rejected: (a) naming containers sha256(uid)[:12] + reverse lookup by label: every exec pays an extra `docker ps --filter` round-trip, readability suffers, and the privacy gain is 0; (b) per-task containers: DESIGN §7.5 already locked the per-user shared mental model (multiple tasks of one user share materials), so it is not reopened; (c) using a docker `init container` pattern for iptables: Docker has no native support (that is k8s), syncing through compose v2 adds complexity, and the NET_ADMIN + non-root exec pattern is more direct; (d) wiring into AgentLoop immediately in Step 2: it could not be dogfooded once wired (local Windows has no docker) and would instead pollute the host path; the pool is committed in isolation and wired in at Step 3.
|
||||||
- **Stage C Step 1: Executor interface skeleton + HostExecutor in-process backend (§7.5 #5 landed)**: `core/executor.py` adds the `Executor` ABC + `ExecCtx` (user_id/task_id/working_dir/cancel_check) + `ToolResult` (content/exit_code); `core/executor_host.py` adds `HostExecutor` wrapping the original tools dict, with `call_tool` routing internally to the matching `Tool.execute` and folding the three error kinds (unknown / TypeError / raised exception) into `[Error] ...` content, distinguished by exit_code. `AgentLoop.__init__` now takes `executor` instead of the `tools` dict and gains a `working_dir` parameter; `_stream_llm` builds the LLM tools field from `executor.schemas()`; `_execute_tool_call` becomes a single `executor.call_tool(name, args, ctx)`, dropping the original three error emits (unknown/TypeError/Exception are now folded into ToolResult by the executor, leaving a single emit). After `agent_builder.py` assembles the tools dict it wraps it in `HostExecutor(tools)` and passes that to `AgentLoop`. **The interface shape is deliberately backend-agnostic**: it exposes no Docker assumptions such as `docker exec` / `docker cp`, so switching to the docker backend in Step 3 needs zero changes in `AgentLoop`, only replacing `HostExecutor` in `agent_builder.py` with `DockerExecutor(host_tools=..., docker_tools={shell, run_python})`. **Zero behavior change**: the sanity import passes and `unittest discover -s tests` is 1/1 PASS. `DESIGN.md` is untouched (pure implementation of the existing §7.5 #5 protocol, no architectural drift); `RUN.md` is untouched (no new env / CLI changes; the `ZCBOT_SANDBOX_BACKEND` env is added when Step 3 introduces the docker backend). Rejected: (a) skipping the Executor abstraction and writing `if backend=='docker'` directly in `shell.py/run_python.py`: it violates §7.5 #5, and a future move to gVisor/Firecracker would scatter changes across the tool layer; (b) giving Executor an `exec(cmd, ctx)` primitive instead of a `call_tool(name, args, ctx)` dispatcher: it does not match the DESIGN signature, and host tools (read/web_*/seedream) do not have "command" semantics; (c) passing a `cancel_check` callable instead of rebuilding ExecCtx: cancel_check is currently assigned through a setter after build, so a cached ctx would point at a stale value, and building an ExecCtx per call is cheap because it is a dataclass.
|
||||||
- **REVISIONS.md revision-log mechanism (covers the three deliverable-type skills proposal/patent/ppt)**: `<task_dir>/REVISIONS.md` serves as a compact changelog of a deliverable's iteration. The task conversation history is a coarse running record (finding last week's change among 50 messages means scrolling), whereas REVISIONS is the list of substantive decisions settled jointly by the user and the LLM (5 lines are enough to reconstruct "why was this chapter written this way last week"), complementing the role of the spec: **spec = constitution (sets the tone once), REVISIONS = implementation log (appended at each sticking point)**. Each of the three SKILL.md files gains (a) an extra drafting step, "append one line after the user confirms a substantive change", and (b) a standalone "## 修订日志" (revision log) subsection (a when-to-record / when-not-to-record table + format conventions + an example + the procedure). The criteria for a "substantive change" are tailored per domain: proposal = technical route / assessment metrics / innovation points / sub-topic breakdown / key citations / budget structure; patent = distinguishing technical features / key parameters / formulas / embodiments / sections; ppt = layout / primary color / pages / icons / copy highlights. Shared principles: do not record the first draft / do not record typo-level tweaks / do not record the model's own back-and-forth edits; when unsure, lean toward not recording, so the file does not become a running log. The chosen format is **single-line bullets appended in reverse order** (time first, then file:section location, then what changed and why), inserted with edit right after the header comment (not appended at the end, so the newest entry is read first). Rejected: (a) a soft constraint in the system prompt: it would impose an irrelevant constraint on non-deliverable skills such as coding/research/documents/imagegen/videogen; (b) a new `record_revision` tool: during development, having the LLM append with edit directly is enough, a tool adds call overhead to every small change, and it can be promoted to a tool later if the LLM turns out to miss many entries; (c) splitting into one file per deliverable (`<topic>.revisions.md`): a single file reads well and keeps one timeline across deliverables. `DESIGN.md` is untouched (no architectural change); `RUN.md` is untouched (no CLI/env change).
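The Step 3 entry above replaces `asyncio.Lock` with `threading.Lock` because tool calls reach the pool from background threads. A minimal sketch of that idea follows, assuming a hypothetical `PerUserLocks` helper and a `FakePool` whose "container start" is only a list append; neither name appears in the commit, and the real `SandboxPool.ensure` is not shown in this diff.

```python
import threading
from typing import Dict, List, Set


class PerUserLocks:
    """Hypothetical helper: one threading.Lock per user, created on demand."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, threading.Lock] = {}

    def lock_for(self, user_id: str) -> threading.Lock:
        # The guard makes lock creation itself race-free across threads.
        with self._guard:
            return self._locks.setdefault(user_id, threading.Lock())


class FakePool:
    """Stand-in for a container pool; no docker involved."""

    def __init__(self) -> None:
        self._locks = PerUserLocks()
        self._running: Set[str] = set()
        self.starts: List[str] = []

    def ensure(self, user_id: str) -> str:
        # Serialize per user, so concurrent tool calls start at most one container.
        with self._locks.lock_for(user_id):
            if user_id not in self._running:
                self.starts.append(user_id)  # stands in for `docker run`
                self._running.add(user_id)
            return f"zcbot-sandbox-{user_id}"


def demo() -> List[str]:
    pool = FakePool()
    threads = [threading.Thread(target=pool.ensure, args=("u1",)) for _ in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.starts
```

Because a `threading.Lock` is not bound to any event loop, the same lock object protects every caller regardless of which thread, or which short-lived loop, the call comes from.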
|
||||||
|
|
|
||||||
35
RUN.md
35
RUN.md
|
|
@ -256,8 +256,14 @@ sudo journalctl -u zcbot -n 50 # 看新进程起没起干
|
||||||
## Sandbox(Stage C,Ubuntu)
|
## Sandbox(Stage C,Ubuntu)
|
||||||
|
|
||||||
> Must be completed before opening to external users. It can be skipped during the current dogfood + trusted-colleague allowlist phase: the default backend is host,
> `shell` / `run_python` still run via subprocess (not isolated). Once Step 3 wires in DockerExecutor, set
> `ZCBOT_SANDBOX_BACKEND=docker` to enable it.
|
> `shell` / `run_python` still run via subprocess (not isolated). Step 3 has wired in DockerExecutor:
> `ZCBOT_SANDBOX_BACKEND=docker` switches to container execution; `host` (default) is kept for local Windows / colleague dogfood.
|
||||||
|
>
|
||||||
|
> Prerequisites for enabling the docker backend:
> 1. The deployment host runs a docker daemon and the zcbot user is in the `docker` group
> 2. The `zcbot-sandbox:latest` image has been built (with `HOST_UID/GID` aligned)
> 3. `.env` contains at least `ZCBOT_PG_IPS=<actual PG IP>` (§7.5 #1: PG gets its own explicit block)
> 4. A lifespan startup failure fails fast (`RuntimeError: sandbox init failed`) instead of silently falling back to host
|
||||||
|
|
||||||
### 镜像构建
|
### 镜像构建
|
||||||
|
|
||||||
|
|
@ -284,6 +290,14 @@ sudo -u zcbot docker network create --internal zcbot-sandbox-net
|
||||||
### Sandbox 相关 env(.env 加)
|
### Sandbox 相关 env(.env 加)
|
||||||
|
|
||||||
```
|
```
|
||||||
|
# Backend selection (default host):
# host = shell/run_python run as host subprocesses (local Windows / dogfood)
# docker = shell/run_python run via docker exec in a per-user container (deployment host / external users)
# ZCBOT_SANDBOX_BACKEND=docker

# Exec user inside the container (default 1000:1000; keep in sync with the Dockerfile HOST_UID/HOST_GID build-args)
# ZCBOT_SANDBOX_EXEC_USER=1000:1000

# Container image tag (default zcbot-sandbox:latest)
# ZCBOT_SANDBOX_IMAGE=zcbot-sandbox:latest
# Container runtime (runsc for gVisor, kata for Firecracker; default runc)
|
||||||
|
|
@ -295,10 +309,23 @@ sudo -u zcbot docker network create --internal zcbot-sandbox-net
|
||||||
ZCBOT_PG_IPS=10.1.2.3,10.1.2.4
|
ZCBOT_PG_IPS=10.1.2.3,10.1.2.4
|
||||||
```
|
```
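`ZCBOT_PG_IPS` is a comma-separated list, and a malformed entry is cheaper to catch before any container starts. The sketch below shows one way to split and validate such a value on the Python side; `parse_pg_ips` is a hypothetical helper, not something this commit defines, and how `init.sh` itself parses the variable is not shown in this diff.

```python
import ipaddress
from typing import List


def parse_pg_ips(raw: str) -> List[str]:
    """Split a comma-separated ZCBOT_PG_IPS value and validate each entry.

    Hypothetical helper: raises ValueError on anything that is not an IP address.
    """
    ips: List[str] = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing commas and stray spaces
        ips.append(str(ipaddress.ip_address(part)))
    return ips
```

Passing the example value from the env block, `parse_pg_ips("10.1.2.3,10.1.2.4")`, returns the two addresses as a list, while an entry such as `10.1.2.x` raises `ValueError`.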
|
||||||
|
|
||||||
### Verification (the parts Step 2 alone can verify)
|
### Verification
|
||||||
|
|
||||||
|
After Step 3, integration verification is recommended (start web with the docker backend, then send `shell` / `run_python` messages from the dev SPA):

```bash
# Start web with the docker backend (.env already sets PG_IPS / SANDBOX_BACKEND=docker)
ZCBOT_SANDBOX_BACKEND=docker .venv/bin/python main.py web

# After any shell / run_python message is triggered, the container should be up
sudo -u zcbot docker ps --filter label=zcbot.product=sandbox
# Expect zcbot-sandbox-<your-uid> with STATUS = Up ...
# After 5 minutes with no new messages, the reaper removes it automatically
```
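The automatic removal mentioned above comes from a background reaper started in the web lifespan, which calls the synchronous `reap_idle` through `run_in_executor`. A minimal sketch of that pattern, assuming a `FakePool` stand-in and the hypothetical names `reaper` and `run_for` (the actual `web/app.py` code is not part of the visible diff):

```python
import asyncio


class FakePool:
    """Stand-in for the real pool; only counts reap_idle calls."""

    def __init__(self) -> None:
        self.reaped = 0

    def reap_idle(self) -> None:
        self.reaped += 1


async def reaper(pool: FakePool, interval: float) -> None:
    loop = asyncio.get_running_loop()
    while True:
        await asyncio.sleep(interval)
        # reap_idle is synchronous, so run it in the default thread pool
        # instead of blocking the event loop.
        await loop.run_in_executor(None, pool.reap_idle)


async def run_for(pool: FakePool, interval: float, duration: float) -> None:
    task = asyncio.create_task(reaper(pool, interval))  # startup hook
    try:
        await asyncio.sleep(duration)                   # the app serves requests here
    finally:
        task.cancel()                                   # shutdown hook
        try:
            await task
        except asyncio.CancelledError:
            pass


def demo() -> int:
    pool = FakePool()
    asyncio.run(run_for(pool, interval=0.01, duration=0.2))
    return pool.reaped
```

In the commit the interval is 60s; the short interval here only keeps the sketch quick to run.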
|
||||||
|
|
||||||
|
You can also start a test container directly to verify the hardening on its own (no dependency on the web process):
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Start a test container (plain docker run, bypassing the pool; the pool is only used once Step 3 wires it in)
|
|
||||||
USER_ID=00000000-0000-0000-0000-000000000001
|
USER_ID=00000000-0000-0000-0000-000000000001
|
||||||
sudo -u zcbot docker run -d \
|
sudo -u zcbot docker run -d \
|
||||||
--name zcbot-sandbox-$USER_ID \
|
--name zcbot-sandbox-$USER_ID \
|
||||||
|
|
|
||||||
|
|
@ -26,6 +26,7 @@ import yaml
|
||||||
from rich.console import Console
|
from rich.console import Console
|
||||||
|
|
||||||
from core.capabilities import ModelCapabilities
|
from core.capabilities import ModelCapabilities
|
||||||
|
from core.executor_docker import DockerExecutor
|
||||||
from core.executor_host import HostExecutor
|
from core.executor_host import HostExecutor
|
||||||
from core.llm import LLM
|
from core.llm import LLM
|
||||||
from core.loop import AgentLoop
|
from core.loop import AgentLoop
|
||||||
|
|
@ -53,6 +54,39 @@ def load_config() -> dict:
|
||||||
return yaml.safe_load((ROOT / "config" / "agent.yaml").read_text(encoding="utf-8")) or {}
|
return yaml.safe_load((ROOT / "config" / "agent.yaml").read_text(encoding="utf-8")) or {}
|
||||||
|
|
||||||
|
|
||||||
|
def _resolve_executor(
    host: HostExecutor,
    user_id: UUID,
    user_root_path: Path,
    working_dir_path: Path,
):
    """Select the Executor backend (§7.5 #5).

    Builds a DockerExecutor when env `ZCBOT_SANDBOX_BACKEND=docker`; any other value,
    or an unset variable, selects host.
    The docker path requires that the web lifespan has already called
    `core.sandbox.init_pool`. If the pool is still None, this raises RuntimeError
    rather than silently falling back to host.
    """
    import os
    if os.getenv("ZCBOT_SANDBOX_BACKEND", "host").lower() != "docker":
        return host
    from core.sandbox import get_pool
    pool = get_pool()
    if pool is None:
        # The lifespan did not initialize the pool. Dying early is safer than degrading
        # silently (once external users are admitted, it avoids believing code runs in the
        # sandbox when it actually runs on the host). The web entry point already fails
        # fast at startup; this is an extra guard.
        raise RuntimeError(
            "ZCBOT_SANDBOX_BACKEND=docker but sandbox pool not initialized; "
            "check web lifespan init_pool() / docker daemon availability"
        )
    return DockerExecutor(
        host=host,
        pool=pool,
        user_id=user_id,
        user_root=user_root_path,
        working_dir=working_dir_path,
    )
|
||||||
|
|
||||||
|
|
||||||
def resolve_workspace(workspace: Optional[str], cfg: Optional[dict] = None) -> Path:
|
def resolve_workspace(workspace: Optional[str], cfg: Optional[dict] = None) -> Path:
|
||||||
cfg = cfg or load_config()
|
cfg = cfg or load_config()
|
||||||
p = Path(workspace) if workspace else ROOT / cfg.get("workspace_dir", "workspace")
|
p = Path(workspace) if workspace else ROOT / cfg.get("workspace_dir", "workspace")
|
||||||
|
|
@ -439,9 +473,11 @@ def build_agent(
|
||||||
tools[ws.name] = ws
|
tools[ws.name] = ws
|
||||||
|
|
||||||
sink = ConsoleEventSink(console) if console else None
|
sink = ConsoleEventSink(console) if console else None
|
||||||
    # §7.5 #5 Executor abstraction: this step uses the host backend (in-process) everywhere; once the Step 3 docker backend
    # lands, `ZCBOT_SANDBOX_BACKEND=docker` dispatches shell/run_python to the container.
    executor = HostExecutor(tools)
|
    # §7.5 #5/#6 Executor abstraction: env `ZCBOT_SANDBOX_BACKEND=host|docker` switches the backend.
    # host (default) = fully in-process, used for local dogfood / Windows; docker = shell/run_python
    # are dispatched to the per-user container (other tools stay on host). The docker path requires that the lifespan has already run `init_pool`.
|
||||||
|
host_executor = HostExecutor(tools)
|
||||||
|
executor = _resolve_executor(host_executor, uid, ur_path, working_dir_path)
|
||||||
agent = AgentLoop(
|
agent = AgentLoop(
|
||||||
llm, executor, session, caps,
|
llm, executor, session, caps,
|
||||||
user_id=uid, working_dir=working_dir_path, sink=sink,
|
user_id=uid, working_dir=working_dir_path, sink=sink,
|
||||||
|
|
|
||||||
|
|
@ -0,0 +1,239 @@
|
||||||
"""DockerExecutor: `shell` / `run_python` go through docker exec; everything else runs in-process (§7.5 #6).

Backend split (§7.5 #6 trust domains):
- host in-process: `read/write/edit/glob/grep/load_skill/web_*/seedream/seedance`.
  These already hold credentials on the host (Bocha key / ARK key) or go through
  `paths.py::resolve_user_path` validation (the user-rooted security boundary already
  exists); moving them into the container gains nothing and costs ~200ms of exec overhead × N calls
- container exec: `shell` / `run_python`, which execute arbitrary model-generated code and must be isolated in a container

Container admission (per call):
1. `pool.ensure(user_id)`: get or start the `zcbot-sandbox-<uid>` container (already serialized by the per-user lock)
2. `docker exec --user 1000:1000 --workdir /workspace/<wd_name> <c> setsid bash -c '<cmd>'`
3. On timeout: kill the docker CLI client (Popen.kill())
4. On completion: `pool.mark_active(user_id)` refreshes the idle timer

The run_python tmp .py lands on the host at `<user_root>/.zcbot_tmp/<task_id>/<rand>.py`
(visible automatically in the container at `/workspace/.zcbot_tmp/<task_id>/` through the
bind mount) and is unlinked after execution. The leading dot lets the `/v1/files` API filter
it naturally (`web/app.py:169` startswith(".")), so the user's view is not polluted.

Cancel limitation (accepted for the first version):
- After the docker exec client disconnects, the server-side process inside the container is
  **not** terminated by that; this is how docker is designed
- The first version only kills the docker CLI (Popen.kill()); leftover processes in the
  container are handled by the 5-minute idle reaper / `rm -f` on the next ensure
- Upgrade trigger (§7.5 #3 PGID protocol): users report "cancelled but still burning CPU", or
  processes pile up in the container after repeated cancels → enable the
  "ZCBOT_EXEC_ID env + PGID written to a file + second exec to kill" protocol
"""
|
||||||
|
from __future__ import annotations

import os
import secrets
import subprocess
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
from uuid import UUID

from .executor import ExecCtx, Executor, ToolResult
from .executor_host import HostExecutor
from .sandbox import SandboxPool


CONTAINER_TOOLS = frozenset({"shell", "run_python"})

# Non-root user inside the container: matches the default Dockerfile HOST_UID/HOST_GID build-args.
# If the zcbot account on the deployment host does not have uid 1000, pass HOST_UID at image
# build time and set env `ZCBOT_SANDBOX_EXEC_USER` to match (see the "Sandbox deployment" section of RUN.md).
DEFAULT_EXEC_USER = "1000:1000"

# Host-side tmp script directory (a dotfile under user_root, hidden by the /v1/files API)
TMP_SUBDIR = ".zcbot_tmp"


class DockerExecutor(Executor):
    """Composes HostExecutor with docker exec dispatch for shell/run_python.

    The host backend still provides the schema list and most tool execution; this class
    only takes over when shell/run_python is called, running docker exec in the per-user container.
    """

    def __init__(
        self,
        host: HostExecutor,
        pool: SandboxPool,
        user_id: UUID,
        user_root: Path,
        working_dir: Path,
    ) -> None:
        self.host = host
        self.pool = pool
        self.user_id = user_id
        self.user_root = user_root.resolve()
        self.working_dir = working_dir.resolve()
        # Matching path inside the container: /workspace/<wd_name>
        try:
            wd_rel = self.working_dir.relative_to(self.user_root)
            self.container_workdir = "/workspace/" + wd_rel.as_posix()
        except ValueError:
            # working_dir is not under user_root: defensive fallback, not reached on the normal path
            self.container_workdir = "/workspace"
        self.exec_user = os.getenv("ZCBOT_SANDBOX_EXEC_USER", DEFAULT_EXEC_USER)
|
||||||
|
|
||||||
    # --- Executor interface ---

    def has_tool(self, name: str) -> bool:
        return self.host.has_tool(name)

    def schemas(self) -> List[Dict[str, Any]]:
        return self.host.schemas()

    def call_tool(self, name: str, args: Dict[str, Any], ctx: ExecCtx) -> ToolResult:
        if name not in CONTAINER_TOOLS:
            return self.host.call_tool(name, args, ctx)
        if not self.host.has_tool(name):
            # e.g. with caps.enable_run_python=False the host never registered run_python,
            # so its schema is not exposed either
            return ToolResult(content=f"[Error] unknown tool: {name}", exit_code=2)
        try:
            if name == "shell":
                return self._exec_shell(args, ctx)
            if name == "run_python":
                return self._exec_python(args, ctx)
        except Exception as e:
            return ToolResult(
                content=f"[Error executing {name} via docker] {type(e).__name__}: {e}",
                exit_code=1,
            )
        return ToolResult(content=f"[Error] unhandled container tool: {name}", exit_code=2)

    # --- shell ---

    def _exec_shell(self, args: Dict[str, Any], ctx: ExecCtx) -> ToolResult:
        cmd = args.get("command")
        if not isinstance(cmd, str) or not cmd.strip():
            return ToolResult(
                content="[Error] bad arguments to shell: command must be non-empty string",
                exit_code=2,
            )
        timeout = int(args.get("timeout") or 60)

        container = self.pool.ensure(self.user_id)
        argv = self._docker_exec_argv(container) + ["setsid", "bash", "-c", cmd]
        result = self._run_subprocess(argv, timeout=timeout, ctx=ctx)
        self.pool.mark_active(self.user_id)
        return result

    # --- run_python ---

    def _exec_python(self, args: Dict[str, Any], ctx: ExecCtx) -> ToolResult:
        code = args.get("code")
        if not isinstance(code, str):
            return ToolResult(
                content="[Error] bad arguments to run_python: code must be string",
                exit_code=2,
            )
        timeout = int(args.get("timeout") or 120)

        # The tmp .py lands on the host at `.zcbot_tmp/<task_id>/<rand>.py`;
        # inside the container it maps to /workspace/.zcbot_tmp/<task_id>/<rand>.py
        tmp_root = self.user_root / TMP_SUBDIR / str(ctx.task_id)
        tmp_root.mkdir(parents=True, exist_ok=True)
        rand_name = f"{int(time.time() * 1000)}-{secrets.token_hex(4)}.py"
        host_script = tmp_root / rand_name
        container_script = f"/workspace/{TMP_SUBDIR}/{ctx.task_id}/{rand_name}"
        host_script.write_text(code, encoding="utf-8")

        try:
            container = self.pool.ensure(self.user_id)
            argv = self._docker_exec_argv(
                container,
                extra_env={
                    "PYTHONIOENCODING": "utf-8",
                    "PYTHONPATH": "/workspace",
                },
            ) + ["setsid", "python", container_script]
            result = self._run_subprocess(argv, timeout=timeout, ctx=ctx)
            self.pool.mark_active(self.user_id)
            return result
        finally:
            # Always remove the tmp script, even on timeout, cancel, or error
            try:
                host_script.unlink()
            except OSError:
                pass
|
||||||
|
|
||||||
    # --- helpers ---

    def _docker_exec_argv(
        self, container: str, extra_env: Optional[Dict[str, str]] = None
    ) -> List[str]:
        argv = [
            "docker", "exec",
            "--user", self.exec_user,
            "--workdir", self.container_workdir,
        ]
        env: Dict[str, str] = {}
        if extra_env:
            env.update(extra_env)
        for k, v in env.items():
            argv.extend(["-e", f"{k}={v}"])
        argv.append(container)
        return argv

    def _run_subprocess(
        self, argv: List[str], timeout: int, ctx: ExecCtx
    ) -> ToolResult:
        """Run the docker exec subprocess, polling cooperatively for cancel.

        On cancel or timeout, Popen.kill() kills the docker CLI client; the server-side
        process inside the container is an accepted limitation (see the module docstring).
        """
        cancel_check = ctx.cancel_check
        try:
            proc = subprocess.Popen(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
                text=True,
                encoding="utf-8",
                errors="replace",
            )
        except FileNotFoundError as e:
            return ToolResult(content=f"[Error] docker CLI not found: {e}", exit_code=2)

        start = time.monotonic()
        cancel_hit = False
        timeout_hit = False
        stdout: str = ""
        stderr: str = ""
        while True:
            try:
                # Wake up every 0.5s to check for cancel and timeout
                stdout, stderr = proc.communicate(timeout=0.5)
                break
            except subprocess.TimeoutExpired:
                if cancel_check is not None and cancel_check():
                    cancel_hit = True
                    proc.kill()
                    stdout, stderr = proc.communicate()
                    break
                if time.monotonic() - start > timeout:
                    timeout_hit = True
                    proc.kill()
                    stdout, stderr = proc.communicate()
                    break

        if timeout_hit:
            return ToolResult(
                content=f"[Error] command timed out after {timeout}s",
                exit_code=124,
            )
        if cancel_hit:
            return ToolResult(
                content="[Error] command cancelled by user",
                exit_code=130,
            )

        parts: List[str] = []
        if stdout:
            parts.append(f"[stdout]\n{stdout.rstrip()}")
        if stderr:
            parts.append(f"[stderr]\n{stderr.rstrip()}")
        parts.append(f"[exit {proc.returncode}]")
        return ToolResult(content="\n".join(parts), exit_code=proc.returncode)
|
||||||
|
|
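The polling loop in `_run_subprocess` can be tried on its own. The following is a minimal sketch rather than the project's code: `run_with_cancel`, its `(status, exit_code)` return value, and the `SLEEPER` / `QUICK` commands are hypothetical names introduced here, and a plain Python child process stands in for `docker exec`.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional, Tuple


def run_with_cancel(
    argv: List[str],
    timeout: float,
    cancel_check: Optional[Callable[[], bool]] = None,
    poll_interval: float = 0.1,
) -> Tuple[str, int]:
    """Hypothetical helper: run a child process and return (status, exit_code).

    status is "ok", "timeout" or "cancelled"; 124 and 130 mirror the exit
    codes that _run_subprocess uses for timeout and cancel.
    """
    proc = subprocess.Popen(
        argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    start = time.monotonic()
    while True:
        try:
            # communicate() can be retried after TimeoutExpired without losing output.
            proc.communicate(timeout=poll_interval)
            return "ok", proc.returncode
        except subprocess.TimeoutExpired:
            if cancel_check is not None and cancel_check():
                proc.kill()
                proc.communicate()  # reap the child and drain the pipes
                return "cancelled", 130
            if time.monotonic() - start > timeout:
                proc.kill()
                proc.communicate()
                return "timeout", 124


# Stand-ins for a long-running and a short-running command.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(30)"]
QUICK = [sys.executable, "-c", "print('hi')"]
```

Polling in short slices keeps the worker thread responsive to the cancel flag. As the docstring above notes, killing the local process only stops the client side; in the docker case whatever runs inside the container is left to the idle reaper.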
@ -3,17 +3,48 @@
 Module boundaries:
 - `network.py`: Docker network ensure (`zcbot-sandbox-net`; `--internal` isolates outbound + cross-container traffic)
 - `pool.py`: per-user container lifecycle (ensure / mark_active / reap_idle / shutdown_all)
+- `__init__.py`: module-level singleton (`init_pool` / `get_pool`), so the web lifespan and
+  `agent_builder` share the same pool instance.
+
-Not in this directory: the docker exec calls of the `shell` / `run_python` tools. Those belong to Step 3's
-`core/executor_docker.py`, which calls this module's `pool.ensure(user_id)` to get the container name and then execs.
+Not in this directory: the docker exec calls of the `shell` / `run_python` tools. Those live in `core/executor_docker.py`,
+which calls this module's `pool.ensure(user_id)` to get the container name and then execs.
 """
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Optional

 from .pool import SandboxPool, container_name, setup_pool
 from .network import NETWORK_NAME, ensure_network

+
 __all__ = [
     "SandboxPool",
     "container_name",
     "setup_pool",
     "NETWORK_NAME",
     "ensure_network",
+    "init_pool",
+    "get_pool",
 ]
+
+
+# Module-level singleton. The web lifespan startup hook calls `init_pool(user_root_base)`;
+# `agent_builder` calls `get_pool()` when constructing DockerExecutor to get the same instance.
+# Not initialised → `get_pool()` returns None, and agent_builder must then not take the docker branch.
+_pool: Optional[SandboxPool] = None
+
+
+def init_pool(user_root_base: Path) -> SandboxPool:
+    """Idempotently initialise the module-level pool. Returns the pool instance.
+
+    Called once by the lifespan; ensure_network is idempotent internally as well. Repeated calls return the same instance (no rebuild).
+    """
+    global _pool
+    if _pool is None:
+        _pool = setup_pool(user_root_base)
+    return _pool
+
+
+def get_pool() -> Optional[SandboxPool]:
+    return _pool
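The `init_pool` / `get_pool` pair is an idempotent module-level singleton. A self-contained sketch of that pattern, under stated assumptions: `Pool` is a hypothetical stand-in for `SandboxPool`, and `require_pool` is a hypothetical fail-fast accessor that is not part of the diff.

```python
from typing import Optional


class Pool:
    """Hypothetical stand-in for SandboxPool."""

    def __init__(self, base: str) -> None:
        self.base = base


_pool: Optional[Pool] = None


def init_pool(base: str) -> Pool:
    """Create the pool on the first call; later calls return the same instance."""
    global _pool
    if _pool is None:
        _pool = Pool(base)
    return _pool


def get_pool() -> Optional[Pool]:
    """Return the shared pool, or None if init_pool has not run yet."""
    return _pool


def require_pool() -> Pool:
    """Hypothetical accessor: raise instead of silently falling back."""
    pool = get_pool()
    if pool is None:
        raise RuntimeError("sandbox pool not initialised")
    return pool
```

Note that a second `init_pool` call ignores its argument and returns the instance built by the first call, which is what "repeated calls return the same instance" implies.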
@ -5,27 +5,30 @@
 out of the workspace directory).

 Lifecycle:
-- `ensure(user_id)`: serialized by a per-user `asyncio.Lock` → probed with `docker inspect` → if already running,
-  return immediately; if exists-but-stopped, `rm -f` first and restart (so that iptables rules are re-applied); if absent, `docker run`
+- `ensure(user_id)`: serialized by a per-user `threading.Lock` → probed with `docker inspect` →
+  if already running, return immediately; if exists-but-stopped, `rm -f` first and restart (so that iptables rules are re-applied);
+  if absent, `docker run`
 - `mark_active(user_id)`: after an exec, update the in-memory `_last_active[uid]=now` (docker labels
   cannot be modified at runtime: Docker 23+ removed support for `docker update --label-add`)
 - `reap_idle()`: periodic task; scans the `_last_active` dict and runs `docker rm -f` on containers idle longer than `idle_ttl`
 - `shutdown_all()`: at app startup, sweeps orphans left by the previous process (`docker ps --filter label=zcbot.product=sandbox`)

+The API is fully synchronous: the main callers of ensure are AgentLoop / DockerExecutor, which run inside a web BG thread
+and are naturally synchronous; the reaper runs in the uvicorn main loop and wraps this class's sync methods with `run_in_executor`.
+threading.Lock works across threads, whereas asyncio.Lock protection is bypassed when ephemeral loops are created / destroyed.
+
 Idempotency:
 - Repeated ensure calls cost one daemon round-trip of < 100ms (a plain `docker inspect`); the per-user lock
   prevents two concurrent `docker run --name` calls for the same user from hitting "Conflict" (docker itself would reject the second, but locking
   up front is cleaner)
 - The reaper only kills containers recorded in the dict. After a restart the dict is empty → historical orphans are not killed (that case is covered by
   `shutdown_all` at startup)
-
-Step 2 scope: pool / lifecycle only. Tools (shell / run_python) are wired in at Step 3.
 """
 from __future__ import annotations

-import asyncio
 import os
 import subprocess
+import threading
 import time
 from pathlib import Path
 from typing import Dict, List, Optional
@ -97,17 +100,19 @@ class SandboxPool:
             os.getenv("ZCBOT_SANDBOX_IDLE_TTL", str(DEFAULT_IDLE_TTL_SECONDS))
         )
         self.pg_ips = pg_ips if pg_ips is not None else os.getenv("ZCBOT_PG_IPS", "")
-        self._locks: Dict[UUID, asyncio.Lock] = {}
+        self._dict_lock = threading.Lock()  # guards dict-level races on _locks / _last_active
+        self._locks: Dict[UUID, threading.Lock] = {}
         self._last_active: Dict[UUID, int] = {}

-    def _lock_for(self, user_id: UUID) -> asyncio.Lock:
-        if user_id not in self._locks:
-            self._locks[user_id] = asyncio.Lock()
-        return self._locks[user_id]
+    def _lock_for(self, user_id: UUID) -> threading.Lock:
+        with self._dict_lock:
+            if user_id not in self._locks:
+                self._locks[user_id] = threading.Lock()
+            return self._locks[user_id]

-    async def ensure(self, user_id: UUID) -> str:
-        """Return the container name; create-or-reuse is atomic."""
-        async with self._lock_for(user_id):
+    def ensure(self, user_id: UUID) -> str:
+        """Return the container name; create-or-reuse is atomic. Blocks synchronously; the main caller, AgentLoop, already runs in a BG thread."""
+        with self._lock_for(user_id):
             name = container_name(user_id)
             if _container_running(name):
                 self._last_active[user_id] = _now()
@ -118,7 +123,7 @@ class SandboxPool:
                     ["docker", "rm", "-f", name],
                     capture_output=True, check=False,
                 )
-            await asyncio.to_thread(self._docker_run, user_id, name)
+            self._docker_run(user_id, name)
             self._last_active[user_id] = _now()
             return name

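The locking scheme above (one dict-level lock guarding a table of per-user locks) can be exercised without Docker. A minimal sketch under assumptions: `KeyedLocks` and `create_once` are hypothetical names, and the `existing` set stands in for container state that the real code reads via `docker inspect`.

```python
import threading
from typing import Dict, List, Set


class KeyedLocks:
    """Hypothetical per-key lock registry, mirroring the _lock_for pattern."""

    def __init__(self) -> None:
        self._dict_lock = threading.Lock()  # guards the dict itself
        self._locks: Dict[str, threading.Lock] = {}

    def lock_for(self, key: str) -> threading.Lock:
        with self._dict_lock:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]


def create_once(
    registry: KeyedLocks, key: str, created: List[str], existing: Set[str]
) -> None:
    """Create-or-reuse under the per-key lock."""
    with registry.lock_for(key):
        if key in existing:   # stands in for "container already running"
            return
        created.append(key)   # stands in for `docker run`
        existing.add(key)


registry = KeyedLocks()
created: List[str] = []
existing: Set[str] = set()
threads = [
    threading.Thread(target=create_once, args=(registry, "user-1", created, existing))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight threads race on the same key, but only the first one to take the per-key lock performs the create step; the rest see the existing entry and return.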
@ -0,0 +1,285 @@
"""Unit tests for DockerExecutor.

subprocess is mocked (the real `docker exec` runs are checked by a smoke test on the deployment machine; RUN.md has the 5 commands).
Key paths covered:
- Trust-domain dispatch: host tools pass straight through / container tools go through docker exec
- argv shape: --user / --workdir / setsid / bash -c / python <script>
- tmp .py: written to `.zcbot_tmp/<task_id>/` on the host side, unlinked after execution, nothing left behind
- timeout / cancel: Popen.kill() as the fallback
- schemas() / has_tool() are forwarded to the host
"""
from __future__ import annotations

import sys
import tempfile
import unittest
from pathlib import Path
from unittest.mock import MagicMock, patch
from uuid import uuid4

sys.path.insert(0, str(Path(__file__).resolve().parents[1]))

from core.executor import ExecCtx, ToolResult
from core.executor_docker import DockerExecutor, TMP_SUBDIR
from core.executor_host import HostExecutor


class FakePool:
    """Stand-in for SandboxPool: ensure returns a fixed container name, mark_active records calls."""

    def __init__(self):
        self.ensure_calls = []
        self.mark_active_calls = []

    def ensure(self, user_id):
        name = f"zcbot-sandbox-{user_id}"
        self.ensure_calls.append(user_id)
        return name

    def mark_active(self, user_id):
        self.mark_active_calls.append(user_id)


class FakeTool:
    """Stand-in for tools.base.Tool: execute returns a string; schema exposes the name plus empty parameters."""

    def __init__(self, name, output="ok"):
        self.name = name
        self._output = output
        self.execute_calls = []

    @property
    def schema(self):
        return {"type": "function", "function": {"name": self.name}}

    def execute(self, **kwargs):
        self.execute_calls.append(kwargs)
        return self._output


def make_executor(tools_dict=None):
    """Build a DockerExecutor + FakePool + tmp user_root. Returns (executor, pool, tmp_dir)."""
    tmp = tempfile.mkdtemp()
    user_root = Path(tmp) / "users" / "u1"
    user_root.mkdir(parents=True)
    working_dir = user_root / "demo"
    working_dir.mkdir()

    if tools_dict is None:
        tools_dict = {
            "read": FakeTool("read", "READ_OUT"),
            "shell": FakeTool("shell"),  # the host shell must not be called
            "run_python": FakeTool("run_python"),
        }
    host = HostExecutor(tools_dict)
    pool = FakePool()
    executor = DockerExecutor(
        host=host,
        pool=pool,
        user_id=uuid4(),
        user_root=user_root,
        working_dir=working_dir,
    )
    return executor, pool, Path(tmp)


def make_ctx(executor):
    return ExecCtx(
        user_id=executor.user_id,
        task_id=uuid4(),
        working_dir=executor.working_dir,
        cancel_check=None,
    )


class TestHostPassthrough(unittest.TestCase):
    """Non-container tools pass straight through to the host backend, without touching pool / subprocess."""

    def test_read_passthrough_to_host(self):
        executor, pool, _ = make_executor()
        ctx = make_ctx(executor)
        result = executor.call_tool("read", {"file": "x"}, ctx)
        self.assertEqual(result.content, "READ_OUT")
        self.assertEqual(result.exit_code, 0)
        self.assertEqual(pool.ensure_calls, [])
        self.assertEqual(pool.mark_active_calls, [])

    def test_schemas_and_has_tool_from_host(self):
        executor, _, _ = make_executor()
        names = [s["function"]["name"] for s in executor.schemas()]
        self.assertIn("read", names)
        self.assertIn("shell", names)
        self.assertTrue(executor.has_tool("shell"))
        self.assertFalse(executor.has_tool("nope"))


class TestShellExec(unittest.TestCase):
    """shell calls go through a docker exec subprocess with the correct argv shape."""

    def test_shell_invokes_docker_exec(self):
        executor, pool, _ = make_executor()
        ctx = make_ctx(executor)

        proc = MagicMock()
        proc.communicate.return_value = ("hello\n", "")
        proc.returncode = 0

        with patch("core.executor_docker.subprocess.Popen", return_value=proc) as popen:
            result = executor.call_tool("shell", {"command": "echo hello"}, ctx)

        self.assertIn("[stdout]\nhello", result.content)
        self.assertIn("[exit 0]", result.content)
        self.assertEqual(result.exit_code, 0)

        argv = popen.call_args[0][0]
        self.assertEqual(argv[:2], ["docker", "exec"])
        self.assertIn("--user", argv)
        self.assertIn("--workdir", argv)
        # workdir should be /workspace/demo (working_dir relative to user_root)
        self.assertEqual(argv[argv.index("--workdir") + 1], "/workspace/demo")
        # container name = zcbot-sandbox-<uid>
        container_idx = argv.index(f"zcbot-sandbox-{executor.user_id}")
        # setsid bash -c must appear immediately after the container name
        self.assertEqual(argv[container_idx + 1:], ["setsid", "bash", "-c", "echo hello"])

        self.assertEqual(pool.ensure_calls, [executor.user_id])
        self.assertEqual(pool.mark_active_calls, [executor.user_id])

    def test_shell_bad_args(self):
        executor, _, _ = make_executor()
        ctx = make_ctx(executor)
        result = executor.call_tool("shell", {"command": ""}, ctx)
        self.assertIn("[Error]", result.content)
        self.assertEqual(result.exit_code, 2)

    def test_shell_timeout(self):
        executor, pool, _ = make_executor()
        ctx = make_ctx(executor)

        import subprocess as real_subprocess
        proc = MagicMock()
        # The first communicate raises TimeoutExpired; the second (after kill) returns the remaining output
        proc.communicate.side_effect = [
            real_subprocess.TimeoutExpired(cmd="docker", timeout=0.5),
            ("", "killed\n"),
        ]
        proc.returncode = -9

        with patch("core.executor_docker.subprocess.Popen", return_value=proc), \
             patch("core.executor_docker.time.monotonic", side_effect=[0, 100]):
            result = executor.call_tool("shell", {"command": "sleep 9999", "timeout": 1}, ctx)

        self.assertIn("timed out after 1s", result.content)
        self.assertEqual(result.exit_code, 124)
        proc.kill.assert_called_once()

    def test_shell_cancel(self):
        executor, _, _ = make_executor()
        ctx = ExecCtx(
            user_id=executor.user_id,
            task_id=uuid4(),
            working_dir=executor.working_dir,
            cancel_check=lambda: True,  # cancel immediately
        )

        import subprocess as real_subprocess
        proc = MagicMock()
        proc.communicate.side_effect = [
            real_subprocess.TimeoutExpired(cmd="docker", timeout=0.5),
            ("", ""),
        ]
        proc.returncode = -15

        with patch("core.executor_docker.subprocess.Popen", return_value=proc):
            result = executor.call_tool("shell", {"command": "sleep 9999"}, ctx)

        self.assertIn("cancelled by user", result.content)
        self.assertEqual(result.exit_code, 130)
        proc.kill.assert_called_once()


class TestRunPython(unittest.TestCase):
    """run_python: the tmp .py lands in user_root/.zcbot_tmp/<task_id>/ and is unlinked after the run."""

    def test_run_python_tmp_script(self):
        executor, pool, tmp_root = make_executor()
        ctx = make_ctx(executor)

        proc = MagicMock()
        proc.communicate.return_value = ("42\n", "")
        proc.returncode = 0

        captured_argv = []

        def _popen(argv, **kwargs):
            captured_argv.append(argv)
            return proc

        with patch("core.executor_docker.subprocess.Popen", side_effect=_popen):
            result = executor.call_tool(
                "run_python", {"code": "print(42)"}, ctx
            )

        self.assertIn("[stdout]\n42", result.content)
        self.assertEqual(result.exit_code, 0)

        argv = captured_argv[0]
        # Tail shape: setsid python /workspace/.zcbot_tmp/<task_id>/<rand>.py
        self.assertEqual(argv[-3], "setsid")
        self.assertEqual(argv[-2], "python")
        self.assertTrue(argv[-1].startswith(f"/workspace/{TMP_SUBDIR}/{ctx.task_id}/"))
        self.assertTrue(argv[-1].endswith(".py"))
        # PYTHONIOENCODING / PYTHONPATH are injected
        env_kvs = [argv[i + 1] for i, a in enumerate(argv) if a == "-e"]
        self.assertIn("PYTHONIOENCODING=utf-8", env_kvs)
        self.assertIn("PYTHONPATH=/workspace", env_kvs)

        # The host-side tmp file has been unlinked (the directory may remain; that is fine, it is re-created by mkdir when the container is ensured)
        tmp_subroot = executor.user_root / TMP_SUBDIR / str(ctx.task_id)
        leftover = list(tmp_subroot.glob("*.py")) if tmp_subroot.exists() else []
        self.assertEqual(leftover, [], f"tmp .py not cleaned up: {leftover}")

    def test_run_python_bad_code_type(self):
        executor, _, _ = make_executor()
        ctx = make_ctx(executor)
        result = executor.call_tool("run_python", {"code": 123}, ctx)
        self.assertIn("[Error]", result.content)
        self.assertEqual(result.exit_code, 2)

    def test_run_python_cleans_tmp_on_exception(self):
        """The tmp .py must still be cleaned up when Popen raises (finally as the fallback)."""
        executor, _, _ = make_executor()
        ctx = make_ctx(executor)

        with patch(
            "core.executor_docker.subprocess.Popen",
            side_effect=RuntimeError("boom"),
        ):
            result = executor.call_tool("run_python", {"code": "x"}, ctx)

        self.assertIn("[Error executing run_python via docker]", result.content)
        self.assertEqual(result.exit_code, 1)
        tmp_subroot = executor.user_root / TMP_SUBDIR / str(ctx.task_id)
        leftover = list(tmp_subroot.glob("*.py")) if tmp_subroot.exists() else []
        self.assertEqual(leftover, [])


class TestUnknownTool(unittest.TestCase):
    def test_unknown_tool_goes_to_host(self):
        executor, _, _ = make_executor(tools_dict={})  # empty host → no tools at all
        ctx = make_ctx(executor)
        result = executor.call_tool("nope", {}, ctx)
        self.assertIn("unknown tool", result.content)
        self.assertEqual(result.exit_code, 2)

    def test_container_tool_not_registered_on_host(self):
        """caps.enable_run_python=False: run_python is not registered on the host, so docker must reject it too."""
        executor, _, _ = make_executor(tools_dict={"read": FakeTool("read")})
        ctx = make_ctx(executor)
        result = executor.call_tool("run_python", {"code": "x"}, ctx)
        self.assertIn("unknown tool", result.content)
        self.assertEqual(result.exit_code, 2)


if __name__ == "__main__":
    unittest.main()
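The tests above expect a host `working_dir` of `<user_root>/demo` to appear as `/workspace/demo` inside the container. One way to express that mapping is sketched below; `to_container_path` and `CONTAINER_ROOT` are hypothetical names, and the assumption that `user_root` is mounted at `/workspace` is taken from the argv assertions in the tests, not from the executor source.

```python
from pathlib import Path, PurePosixPath

# Assumption: the user's root directory is mounted at /workspace in the container.
CONTAINER_ROOT = PurePosixPath("/workspace")


def to_container_path(host_path: Path, user_root: Path) -> str:
    """Hypothetical helper: map a host path under user_root to its container path.

    Raises ValueError when host_path is not inside user_root, so a path that
    escapes the mounted directory never reaches `docker exec --workdir`.
    """
    rel = host_path.resolve().relative_to(user_root.resolve())
    return str(CONTAINER_ROOT.joinpath(*rel.parts))


root = Path("/srv/users/u1")  # illustrative host path only
workdir = to_container_path(root / "demo", root)
script = to_container_path(root / ".zcbot_tmp" / "t1" / "a.py", root)
```

Resolving both paths before calling `relative_to` also rejects `..` segments that would climb out of `user_root`.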
54
web/app.py
@ -481,7 +481,7 @@ def create_app() -> FastAPI:
     async def lifespan(app: FastAPI):
         broker.bind_loop(asyncio.get_running_loop())
         # The skill registry is scanned once at startup: the filesystem is static and does not change at runtime; /v1/skills reads it directly
-        from core.agent_builder import load_config
+        from core.agent_builder import load_config, resolve_workspace
         from core.paths import ROOT
         from core.skills import SkillRegistry
         _cfg = load_config()
@ -500,7 +500,59 @@ def create_app() -> FastAPI:
             )
             if result.rowcount:
                 print(f"[startup] reaped {result.rowcount} stale active run(s)")
+
+        # Sandbox pool (§7.5): enabled only when ZCBOT_SANDBOX_BACKEND=docker.
+        # Startup hooks: ① init_pool (creates the docker network + pool instance) ② shutdown_all sweeps
+        # orphans from the previous process (zcbot-sandbox-* containers it left behind; the in-memory _last_active is empty,
+        # so remove them all and start fresh) ③ a background reaper task that runs reap_idle every 60s.
+        sandbox_backend = os.getenv("ZCBOT_SANDBOX_BACKEND", "host").lower()
+        sandbox_reaper_task = None
+        if sandbox_backend == "docker":
+            from core.sandbox import init_pool
+            workspace = resolve_workspace(None, _cfg)
+            try:
+                pool = init_pool(workspace / "users")
+                removed = pool.shutdown_all()
+                if removed:
+                    print(f"[startup] swept {len(removed)} stale sandbox container(s)")
+
+                async def _reaper() -> None:
+                    loop = asyncio.get_running_loop()
+                    while True:
+                        try:
+                            await asyncio.sleep(60)
+                            removed = await loop.run_in_executor(None, pool.reap_idle)
+                            if removed:
+                                print(f"[reaper] reaped {len(removed)} idle sandbox container(s)")
+                        except asyncio.CancelledError:
+                            raise
+                        except Exception as e:
+                            print(f"[reaper] error: {type(e).__name__}: {e}")
+
+                sandbox_reaper_task = asyncio.create_task(_reaper(), name="sandbox-reaper")
+                app.state.sandbox_pool = pool
+            except Exception as e:
+                # ensure_network / docker CLI unavailable → fail fast. Stage C protocol: any missing
+                # hardening means the deployment is incomplete; do not fall back to host (otherwise one would believe a sandbox is in place while actually running bare).
+                raise RuntimeError(
+                    f"ZCBOT_SANDBOX_BACKEND=docker but sandbox init failed: {e}"
+                )
+        try:
-        yield
+            yield
+        finally:
+            if sandbox_reaper_task is not None:
+                sandbox_reaper_task.cancel()
+                try:
+                    await sandbox_reaper_task
+                except (asyncio.CancelledError, Exception):
+                    pass
+            if sandbox_backend == "docker":
+                pool = getattr(app.state, "sandbox_pool", None)
+                if pool is not None:
+                    try:
+                        pool.shutdown_all()
+                    except Exception as e:
+                        print(f"[shutdown] sandbox shutdown_all error: {type(e).__name__}: {e}")

     app = FastAPI(
         title="zcbot api",
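The reaper in the lifespan hook combines three things: a sleep loop on the event loop, a blocking call pushed to the default thread pool with `run_in_executor`, and a cancel-then-await at shutdown. A small sketch of that shape, with hypothetical names (`periodic`, `demo`, `CALL_COUNT`) and a much shorter interval than the 60s used above so that it finishes quickly:

```python
import asyncio
from typing import Callable, List


async def periodic(interval: float, sync_fn: Callable[[], object]) -> None:
    """Run a blocking function every `interval` seconds off the event loop thread."""
    loop = asyncio.get_running_loop()
    while True:
        try:
            await asyncio.sleep(interval)
            await loop.run_in_executor(None, sync_fn)  # None = default thread pool
        except asyncio.CancelledError:
            raise  # let cancellation end the task
        except Exception as e:
            # Any other failure is logged and the loop keeps going.
            print(f"[periodic] error: {type(e).__name__}: {e}")


async def demo() -> int:
    calls: List[int] = []
    task = asyncio.create_task(periodic(0.01, lambda: calls.append(1)))
    await asyncio.sleep(0.1)   # stands in for the application's lifetime
    task.cancel()              # shutdown: cancel, then wait for the task to finish
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(calls)


CALL_COUNT = asyncio.run(demo())
```

Re-raising `CancelledError` before the broad `except Exception` keeps the intent explicit: errors from the blocking call are survivable, cancellation is not.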