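RUN.md: add Stage C full deployment cheat sheet + day-to-day ops commands

Collect the deploy / ops / verification commands used after wrapping up Stage C in one place,
so the next deploy or troubleshooting session doesn't have to rebuild them from memory by digging
through PROGRESS / git log:

- Full deployment cheat sheet: full rebuild (all 5 build-args) / switching without a rebuild /
  migration / the lines the startup log should show
- Day-to-day ops / verification: docker ps for active containers / inspect env and mounts /
  DNS + network checks / iptables rules / uid alignment / resource usage / docker logs / forced rm
- Disk quota: query user_disk_usage with psql / ad-hoc Python to force a scan
- Host-side location of the DNS resolv.conf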

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
caoqianming 2026-05-27 12:25:07 +08:00
parent 4ff5b0c171
commit 30b1d40204
1 changed file with 104 additions and 0 deletions

RUN.md

@@ -387,6 +387,110 @@ sudo -u zcbot docker rm -f zcbot-sandbox-$USER_ID
After Step 4 introduces the egress proxy, the full set of 5 red-team cases (metadata / loopback / cross-user /
leftover nohup processes / 403 outside the allowlist) moves into `tests/test_sandbox_redteam.py` and runs automatically.

### Stage C full deployment cheat sheet

First deploy, or after changing init.sh / Dockerfile / requirements.txt → rebuild the image:
```bash
cd /home/lighthouse/zcbot
docker build -t zcbot-sandbox:latest -f deploy/sandbox/Dockerfile \
  --build-arg HOST_UID=$(id -u) --build-arg HOST_GID=$(id -g) \
  --build-arg APT_MIRROR=http://mirrors.cloud.tencent.com \
  --build-arg PIP_INDEX_URL=https://mirrors.cloud.tencent.com/pypi/simple/ \
  --build-arg NPM_REGISTRY=https://mirrors.cloud.tencent.com/npm/ \
  .
```
Only changed host-side docker run flags / pool.py (Dockerfile / init.sh untouched) → no rebuild needed;
remove the old containers and restart the web service:
```bash
docker rm -f $(docker ps -aq -f label=zcbot.product=sandbox) 2>/dev/null
systemctl restart zcbot
```
Run migrations (after adding or changing db/migrations/versions/* or models.py):
```bash
.venv/bin/python main.py db upgrade head
```
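The three cases above (rebuild, no-rebuild restart, migration) can be turned into a checklist generated from the changed files, e.g. the output of `git diff --name-only`. This is a minimal sketch under stated assumptions, not part of the repo: `deploy_steps` is a hypothetical helper, files are matched by name because the repo paths of init.sh / requirements.txt / pool.py are not listed here, and restarting the web service after a migration is an assumption.

```python
from pathlib import PurePosixPath

# Hypothetical helper mirroring the three cases above. Files are matched by name
# (assumption): the repo paths of init.sh / requirements.txt / pool.py are not listed here.
REBUILD_NAMES = {"Dockerfile", "init.sh", "requirements.txt"}
MIGRATIONS_PREFIX = "db/migrations/versions/"


def deploy_steps(changed_paths: list[str]) -> list[str]:
    """Map repo-relative changed paths (e.g. from `git diff --name-only`) to deploy steps."""
    names = {PurePosixPath(p).name for p in changed_paths}
    rebuild = bool(names & REBUILD_NAMES)
    migrate = "models.py" in names or any(p.startswith(MIGRATIONS_PREFIX) for p in changed_paths)
    # a new image or new host-side run flags only apply to freshly created containers
    recycle = rebuild or "pool.py" in names

    steps = []
    if rebuild:
        steps.append("rebuild image")
    if migrate:
        steps.append("db upgrade head")
    if recycle:
        steps.append("rm sandbox containers")
    if recycle or migrate:
        # assumption: a migration ships with web code changes, so restart too
        steps.append("restart zcbot")
    return steps
```

The step order follows the sections above: build first so the restarted service can recreate containers from the new image, and migrate before the restart so the web process starts against the upgraded schema.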
Restart the web service with the docker backend enabled (make sure `ZCBOT_SANDBOX_BACKEND=docker` is set
in `.env` or in the systemd unit):
```bash
systemctl restart zcbot
journalctl -u zcbot -e --no-pager | tail -30   # check the startup log
# Expected:
# [startup] [warn] fs quota: ext4 on ... (the warn is expected during the dogfood phase)
# [startup] swept N stale sandbox container(s)
# [disk_scanner] initial scan: N user(s)
# INFO:     Uvicorn running on 0.0.0.0:8765
```
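Instead of eyeballing the tail, the four expected lines can be checked with regular expressions. A sketch: the patterns are derived from the sample lines above (with N being a number) and assume the log wording stays the same. The fs quota pattern only requires an `fs quota:` message, because the sample shows the ext4 warn that is specific to the dogfood setup. `missing_startup_lines` is a hypothetical helper.

```python
import re

# Patterns derived from the sample startup lines above; adjust if the messages change.
EXPECTED = {
    "fs quota check": re.compile(r"\[startup\].*fs quota:"),
    "stale container sweep": re.compile(r"\[startup\] swept \d+ stale sandbox container"),
    "initial disk scan": re.compile(r"\[disk_scanner\] initial scan: \d+ user"),
    "uvicorn listening": re.compile(r"Uvicorn running on \S+"),
}


def missing_startup_lines(log_text: str) -> list[str]:
    """Return the names of expected startup lines that do not appear in log_text."""
    return [name for name, pattern in EXPECTED.items() if not pattern.search(log_text)]
```

For example, pipe `journalctl -u zcbot -n 200 --no-pager` into a script that calls `missing_startup_lines(sys.stdin.read())`. An empty list means all four lines appeared.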

### Day-to-day ops / verification
```bash
# Active sandbox containers
docker ps --filter label=zcbot.product=sandbox \
  --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'
# How a container was started (its docker run config: env + bind mounts)
SBC=$(docker ps -q -f label=zcbot.product=sandbox | head -1)
docker inspect $SBC --format '{{json .Config.Env}}' | python3 -m json.tool
docker inspect $SBC --format '{{json .HostConfig.Binds}}'
# DNS + network check inside the container (expect nameserver 8.8.8.8 / 114.114.114.114)
docker exec $SBC cat /etc/resolv.conf
docker exec $SBC getent hosts www.baidu.com
docker exec $SBC curl -sI -m 5 https://www.baidu.com | head -1
# iptables rules (expect DROP for the blocked ranges + an ACCEPT exception for 127.0.0.11:53)
docker exec $SBC iptables -L OUTPUT -n --line-numbers
# The zcbot user in the container matches the host (uids should be identical)
docker exec $SBC id
id    # run on the host
# Container resource usage
docker stats --no-stream $SBC
# This session's sandbox startup stderr (the [init] ... lines from init.sh)
docker logs $SBC
# Kill one user's container (forces the next ensure to recreate it; handy for switching images after a hotfix)
docker rm -f zcbot-sandbox-<user-uuid>
# Remove all sandbox containers (reaper fallback; does not affect the web process)
docker rm -f $(docker ps -aq -f label=zcbot.product=sandbox) 2>/dev/null
```
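iptables walks the OUTPUT chain top-down, and the first rule with a terminating target (ACCEPT, DROP) wins. An ACCEPT exception for 127.0.0.11:53 that sits below a DROP covering the same address therefore never matches. The sketch below checks that ordering on the text printed by `iptables -L OUTPUT -n --line-numbers`. It assumes the usual `num target prot opt source destination [extra]` layout. `dns_exception_precedes_drop` is a hypothetical helper, and the DROP ranges in any sample you feed it are illustrative, not the project's actual blocked ranges.

```python
import ipaddress

DNS_IP = ipaddress.ip_address("127.0.0.11")


def _networks_and_rest(tokens):
    """Split rule tokens into the first two address columns (source, destination) and the rest."""
    nets, rest = [], []
    for tok in tokens:
        if len(nets) < 2:
            try:
                nets.append(ipaddress.ip_network(tok, strict=False))
            except ValueError:
                pass  # prot / opt columns such as "udp", "all", "--"
            continue
        rest.append(tok)
    return nets, rest


def dns_exception_precedes_drop(listing: str) -> bool:
    """True if an ACCEPT for 127.0.0.11 dpt:53 comes before every DROP covering 127.0.0.11."""
    accept_at = drop_at = None
    for line in listing.splitlines():
        fields = line.split()
        # rule lines start with the rule number; skip "Chain ..." and header lines
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        num, target = int(fields[0]), fields[1]
        nets, rest = _networks_and_rest(fields[2:])
        if len(nets) < 2 or DNS_IP not in nets[1]:
            continue  # destination does not cover 127.0.0.11
        if target == "ACCEPT" and "dpt:53" in rest and accept_at is None:
            accept_at = num
        elif target == "DROP" and drop_at is None:
            drop_at = num
    # first match wins: the exception must exist and sit above any covering DROP
    return accept_at is not None and (drop_at is None or accept_at < drop_at)
```

Feed it the output of the `docker exec $SBC iptables -L OUTPUT -n --line-numbers` command above. `False` means either no DNS exception exists or a DROP shadows it.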
Disk quota usage (table from migration 0008):
```bash
# Current usage for every user
psql "$ZCBOT_DB_URL" -c "
SELECT user_id, bytes_used/1024/1024 AS mb, file_count, scanned_at
FROM user_disk_usage ORDER BY bytes_used DESC;"
# Force a scan for one user (when the yaml interval is too long to wait for)
.venv/bin/python -c "
from pathlib import Path
from uuid import UUID
from core.storage.disk_quota import scan_user_dir, upsert_user_usage
uid = UUID('<your-user-uuid>')
b, c = scan_user_dir(Path('/home/lighthouse/zcbot/workspace/users') / str(uid))
upsert_user_usage(uid, b, c)
print(f'{b/1024/1024:.1f} MB / {c} files')"
```
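To cross-check a `user_disk_usage` row without touching the database, a standalone walk over the user directory gives a rough second opinion. This is not `core.storage.disk_quota.scan_user_dir`, whose exact counting rules are not shown here. `approx_scan` is a hypothetical name, and counting only regular files while skipping symlinks are assumptions.

```python
import os
import stat
from pathlib import Path


def approx_scan(user_dir: Path) -> tuple[int, int]:
    """Return (total bytes, file count) over regular files under user_dir."""
    total_bytes = file_count = 0
    # os.walk does not descend into symlinked directories by default
    for root, _dirs, files in os.walk(user_dir):
        for name in files:
            try:
                st = os.lstat(os.path.join(root, name))
            except FileNotFoundError:
                continue  # removed by the sandbox between listing and stat
            if stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, FIFOs
                total_bytes += st.st_size
                file_count += 1
    return total_bytes, file_count
```

For example, call `approx_scan(Path('/home/lighthouse/zcbot/workspace/users') / '<your-user-uuid>')` and compare it with that user's `bytes_used` / `file_count` from the query above. A large gap suggests either a stale row or different counting rules in the real scanner.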
Location of the DNS resolv.conf (host side):
```bash
# Generated by SandboxPool on the first _docker_run; overwritten every time a container is recreated
cat /home/lighthouse/zcbot/workspace/.sandbox/resolv.conf
```
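To check the generated file without exec-ing into a container, its `nameserver` lines can be compared with the 8.8.8.8 / 114.114.114.114 expected by the in-container DNS check. A sketch: `nameservers` and `missing_nameservers` are hypothetical helpers, not part of SandboxPool, and the expected list is taken from that check.

```python
# Expected upstream resolvers, from the in-container DNS check in the ops block.
EXPECTED_NAMESERVERS = ("8.8.8.8", "114.114.114.114")


def nameservers(resolv_text: str) -> list[str]:
    """Return nameserver addresses in file order, skipping comment lines."""
    servers = []
    for line in resolv_text.splitlines():
        if line.lstrip().startswith(("#", ";")):
            continue  # resolv.conf comment lines
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def missing_nameservers(resolv_text: str) -> list[str]:
    """Return expected resolvers that are absent from the file."""
    present = set(nameservers(resolv_text))
    return [ns for ns in EXPECTED_NAMESERVERS if ns not in present]
```

For example, `missing_nameservers(Path('/home/lighthouse/zcbot/workspace/.sandbox/resolv.conf').read_text())` (with `from pathlib import Path`) should return an empty list.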

### Pre-deployment checks
Run this once before switching to `ZCBOT_SANDBOX_BACKEND=docker`: