feat: integrate Doubao Seedance 2.0 Fast video generation (text-to-video) + videogen skill

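- tools/seedance.py: async submit to /contents/generations/tasks → poll every 5s → once succeeded, download the mp4 + meta.json into <wd>/videos/; failed/cancelled tasks are not billed; cancel_check is evaluated between polls so the user's stop button takes effect
- config/media/doubao.yaml: expand video.seedance_2_fast (¥37/Mtok text-to-video / ¥22/Mtok image-to-video); checked against the token formula, 720p 5s = ¥4.00 matches the source data exactly
- core/storage/usage.py: record_video_usage with kind=video; the units jsonb column snapshots resolution/duration/ratio/fps/tokens/unit price
- core/agent_builder.py: build_agent gains video_variant + cancel_check parameters; cancel_check must be passed at build time (the SeedanceTool constructor holds it for polling)
- web/app.py: GET /v1/video_models + MessageRequest.video_model + pass-through
- web/static/dev.html: a third dropdown in the top bar (next to the image one) + state.videoModels/videoModel
- skills/videogen/SKILL.md: six-dimension diagnosis (motion + camera replace imagegen's lighting dimension); the BLOCKING confirmation gate is stricter than imagegen's (¥4 vs ¥0.22), and each clip takes 30-90s to render
- prompts/system/general_v1.md: add seedance trigger guidance (parallel to seedream)

Phase 1 is t2v (text-to-video) only; fast tops out at 720p. The end-to-end API smoke run passed for path, auth and error parsing; the request body schema can only be verified with a real render once the user enables the model in the Volcengine Ark console.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>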
2026-05-22 09:30:54 +08:00 · 2026-05-22 09:30:54 +08:00 · 7ff58c488e
parent 9a26e85da2
commit 7ff58c488e
11 changed files with 974 additions and 31 deletions
--- a/PROGRESS.md
+++ b/PROGRESS.md
@@ -2,7 +2,7 @@

 > 配合 `DESIGN.md`。本文件只记 phase 状态、决策偏差、文件量、下一步。每条 1-2 句:做了啥 + 关键判断;细节查 `git log` / `git diff` / `DESIGN §7.9`。

-最后更新:2026-05-21(sandbox 阻塞地位写进 DESIGN + 黑名单不再加强)
+最后更新:2026-05-22(豆包 Seedance 2.0 Fast 视频生成接入 + videogen skill)

 ---

@@ -10,7 +10,7 @@

 | Phase | 标题 | 状态 | 备注 |
 |---|---|---|---|
-| 1-3 | 骨架 + Skill + run_python | ✅ | 三个 skill;CoreCoder 唯一匹配 edit;敏感 env 过滤 |
+| 1-3 | 骨架 + Skill + run_python | ✅ | 多 skill(coding/proposal/ppt/research/documents/imagegen/videogen);CoreCoder 唯一匹配 edit;敏感 env 过滤 |
 | 4 | 演化性能力 | 🟡 | Model Profile + Probing ✅;版本化 prompt 未做 |
 | 5 | Eval Suite | ⏸ 不做 | dogfooding 替代,probe 覆盖健康检查 |
 | 6 | 长任务工程化 | 🟡 | task + 恢复 ✅;双层记忆 ✅;context 压缩未做 |
@@ -21,6 +21,10 @@

 ## 已完成关键能力

+### 2026-05-22
+
+- **豆包 Seedance 2.0 Fast 视频生成接入(文生视频)+ videogen skill**:`config/media/doubao.yaml` 展开 video 段(`seedance_2_fast`:¥37/Mtok 文生 / ¥22/Mtok 图生,实测档位 480p 5s ¥1.86 / 720p 5s ¥4.00 — 由 token 公式 `(in+out)×W×H×fps/1024` 反推校验通过);`tools/seedance.py` 走 ark POST `/contents/generations/tasks` → 5s 间隔轮询 → succeeded 后 download mp4 + .meta.json 落 `<wd>/videos/<ts>-<rand>.mp4`,失败/cancel 不计费;`core/storage/usage.py::record_video_usage` 多态 units snapshot(resolution/duration/ratio/fps/tokens/单价);`build_agent` 加 `video_variant` + `cancel_check` 形参 — cancel_check 必须在 build 阶段传(SeedanceTool ctor 持有用于轮询期间响应停止按钮,改了原"build 后赋 agent.cancel_check"的延迟绑定,web 入口同步迁移);`web/app.py` 加 `_list_video_variants` / `_resolve_video_model` / `GET /v1/video_models` / `MessageRequest.video_model` / `OptimizePromptRequest.video_model`;`dev.html` 顶栏第三下拉 + `state.videoModels/videoModel` + 发消息一起 POST。前端 chip / inline `<video>` / `extractMediaBanner` / `_categorize` 在前期工作里已为 seedance 留好脚手架,几乎不动。`skills/videogen/SKILL.md` 六维诊断把 imagegen 的"光线"换成"运动+镜头"两维(运动必填,否则应该走 seedream 而非 seedance —— 差 18 倍价钱);BLOCKING 门槛比 imagegen 更严(¥4 vs ¥0.22)且要等 30-90s,贴 prompt+参数+预计花费+预计等待四件套等明确确认。`general_v1.md` 加 seedance 触发指引(平行 seedream)。phase 1 仅 t2v,**不支持 i2v**(skill 明示告诉用户)。fast 上限 720p,1080p+ 留给 pro variant(yaml 当前未配)。否决:(a) progress 事件流化(需要给 tool 加 sink 注入,phase 1 用 `run_status=running` 够了);(b) 远端 cgt-task DELETE(Volcengine 无明确 API,best-effort 不动);(c) i2v phase 1 拉进来(要图片转 URL + UI 选已有图,延后)。
+
 ### 2026-05-21

 - **dev.html primary button hover 文字消失修复(`.primary:hover` 加 `background: var(--accent)`)**:`button:hover:not(:disabled)` 与 `button.primary:hover` 特异性同为 (0,2,1) 平手按源码序后者赢,但后者只声明了 `filter: brightness(1.08)` 没声明 `background`,导致 `background` fallback 到前者的 `var(--hover)` 浅灰,而 `color` 仍是 `.primary` 的白 —— 白字浅灰底视觉消失。修法守住 background = accent,brightness filter 在红底上正常提亮。"+ 新建任务" / "发送" 两个 primary 按钮 hover 体验回归。
--- a/RUN.md
+++ b/RUN.md
@@ -2,7 +2,7 @@

 > 怎么把 zcbot 跑起来。env / 常用命令 / 故障兜底。设计看 `DESIGN.md`,进度看 `PROGRESS.md`。

-最后更新:2026-05-21(新增 documents skill 的 env:`DOCUMENT_SEARCH_API_KEY` / `DOCUMENT_SEARCH_URL`)
+最后更新:2026-05-22(seedance 视频生成接入 — 同 ARK_API_KEY,新增 videogen skill)

 ---

@@ -14,7 +14,8 @@
  DEEPSEEK_API_KEY=sk-...
  # 用 GLM 的话再加一条;国际站 z.ai 用 ZAI_API_KEY,国内站 bigmodel.cn 用 ZHIPUAI_API_KEY(对应 config/models/glm.yaml 的 api_key_env 字段)
  ZHIPUAI_API_KEY=...
-  # 豆包(火山方舟)图像/视频生成:可选。设了就挂上 seedream tool(0.22 元/张);未设 tool 不出现
+  # 豆包(火山方舟)图像/视频生成:可选。设了同时挂 seedream tool(0.22 元/张)与 seedance tool
+  # (Seedance 2.0 Fast,文生视频,480p 4s 约 ¥1.5 ~ 720p 15s ¥12+,异步等 30-90s);未设两个 tool 都不出现
  ARK_API_KEY=...
  # documents skill(内部知识库 document_search API):可选。设了 documents skill 才能用,未设调用立即抛 RuntimeError
  DOCUMENT_SEARCH_API_KEY=...
--- a/config/media/doubao.yaml
+++ b/config/media/doubao.yaml
@@ -1,12 +1,14 @@
 # 豆包(火山方舟 Ark)媒体生成模型档案。
 #
-# 价格表 last_updated: 2026-05-20
+# 价格表 last_updated: 2026-05-22
 #   源: https://www.volcengine.com/docs/82379/1544106
+#       https://github.com/ArcReel/ArcReel(费用参考表)
 #   豆包调价时手动更新本文件 + 重启 web。历史 usage_events 自带 snapshot 不受影响
-#   (record_image_usage 把 price_cny_per_image 写进 units jsonb 列)。
+#   (record_image_usage / record_video_usage 把单价 snapshot 写进 units jsonb 列)。
 #
 # 接入方式:走 ark 原生 HTTP(litellm 不覆盖图像/视频),core/ark_client.py 封装统一调用。
-# image (seedream) 同步返 URL;video (seedance) 异步 task + polling — 本期仅落地 image。
+# image (seedream) 同步返 URL;video (seedance) 异步 task + polling,产物落
+# <wd>/videos/<ts>-<rand>.mp4。

 ark_api_key_env: ARK_API_KEY
 ark_base_url: https://ark.cn-beijing.volces.com/api/v3
@@ -23,11 +25,35 @@ image:
    default_search: false           # web search 额外加价 ~¥0.05/张;默认关
    request_timeout_s: 60           # 出图慢于此判超时

-# video (seedance) 待 Phase 2:
-# video:
-#   seedance_2:
-#     model_id: doubao-seedance-2-0-260128
-#     ...
-#   seedance_2_fast:
-#     model_id: doubao-seedance-2-0-fast-260128
-#     ...
+video:
+  seedance_2_fast:
+    model_id: doubao-seedance-2-0-fast-260128
+    display_name: 豆包 Seedance 2.0 Fast
+    # 异步任务:POST /contents/generations/tasks 拿 cgt-xxx → 轮询 GET
+    # /contents/generations/tasks/<id> 直到 status=succeeded → 取 content.video_url
+    # (24h 内有效,本 tool 立刻 download 到本地)。
+    endpoint_submit: /contents/generations/tasks
+    endpoint_poll: /contents/generations/tasks   # 实际路径 = base + "/{cgt_id}"
+
+    # 计费(per-token,token = (in_dur+out_dur) × W × H × fps / 1024):
+    #   文生视频(无视频输入,本期主力路径): ¥37 / 百万 tokens
+    #   图生视频(有视频输入,phase 2): ¥22 / 百万 tokens
+    # 实测档位(fast, 5s, 文生视频, 24fps,源: ArcReel 费用参考表):
+    #   480p 16:9 → ¥1.86
+    #   720p 16:9 → ¥4.00
+    # tool 内部按 W×H×duration×fps/1024 估算 tokens × 单价 → cost_cny。响应里若带 usage
+    # 字段则覆盖估算(待豆包接口实际返回字段校准)。
+    price_cny_per_mtoken_text2video: 37.0
+    price_cny_per_mtoken_video2video: 22.0
+    fps: 24                        # token 估算用;豆包当前 24fps 固定
+
+    # 支持参数(POST body 字段)
+    default_resolution: 720p       # fast 上限,可选 480p / 720p
+    default_ratio: "16:9"          # 16:9 / 9:16 / 1:1 / 4:3 / 3:4 / 21:9 / adaptive
+    default_duration: 5            # 4-15s
+    default_watermark: false
+
+    # 轮询参数
+    request_timeout_s: 60          # submit POST 超时(异步,只是提交)
+    poll_interval_s: 5             # 单次 GET 间隔(秒);典型 30-90s 出片
+    poll_timeout_s: 600            # 总等待上限(10min)→ 超时返 [Error]
--- a/core/agent_builder.py
+++ b/core/agent_builder.py
@@ -19,7 +19,7 @@ from __future__ import annotations

 from datetime import datetime
 from pathlib import Path
-from typing import Optional, Tuple
+from typing import Callable, Optional, Tuple
 from uuid import UUID, uuid4

 import yaml
@@ -37,6 +37,7 @@ from core.storage import check_no_subtask
 from core.task import TaskState
 from tools.fs import EditTool, GlobTool, GrepTool, ReadTool, WriteTool
 from tools.run_python import RunPythonTool
+from tools.seedance import SeedanceTool
 from tools.seedream import SeedreamTool
 from tools.shell import ShellTool
 from tools.skill_tool import LoadSkillTool
@@ -220,6 +221,8 @@ def build_agent(
    name: Optional[str] = None,
    working_dir: Optional[str] = None,
    image_variant: str = "",
+    video_variant: str = "",
+    cancel_check: Optional[Callable[[], bool]] = None,
 ) -> Tuple[AgentLoop, Session, str, TaskState, Path]:
    """返回 (agent, session, task_id_str, task_state, working_dir_path)。

@@ -384,8 +387,39 @@ def build_agent(
            )
            tools[seedream_tool.name] = seedream_tool

+        # 视频 variant 选择(同上 image_variant 范式):video_variant 由 caller 传,
+        # 空 → 取 yaml 第一个 video variant。本 run 的 SeedanceTool 锁定该 variant。
+        # cancel_check 是 web 入口构造的 `lambda: broker.is_cancelled(task_id)` —— 轮询
+        # 期间(典型 30-90s)拿来响应用户停止按钮;远端 cgt 任务无 cancel API,best-effort 不动远端
+        video_cfg = (ark_cfg.raw.get("video") or {})
+        v_chosen_key, v_chosen_cfg = "", None
+        if video_variant:
+            v = video_cfg.get(video_variant)
+            if isinstance(v, dict):
+                v_chosen_key, v_chosen_cfg = video_variant, v
+        if v_chosen_cfg is None:
+            for variant_key, variant_cfg in video_cfg.items():
+                if isinstance(variant_cfg, dict):
+                    v_chosen_key, v_chosen_cfg = variant_key, variant_cfg
+                    break
+        if v_chosen_cfg is not None:
+            seedance_tool = SeedanceTool(
+                ark_cfg=ark_cfg,
+                video_variant_cfg=v_chosen_cfg,
+                variant_key=v_chosen_key,
+                working_dir=working_dir_path,
+                task_id=task_id,
+                user_id=uid,
+                base_dir=tool_base,
+                user_root=ur_path,
+                cancel_check=cancel_check,
+            )
+            tools[seedance_tool.name] = seedance_tool
+
    sink = ConsoleEventSink(console) if console else None
    agent = AgentLoop(llm, tools, session, caps, user_id=uid, sink=sink)
+    if cancel_check is not None:
+        agent.cancel_check = cancel_check
    return agent, session, sid, task_state, working_dir_path
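Why `cancel_check` has to exist at build time is easiest to see in the loop shape. The stand-alone sketch below mirrors the order of checks in `tools/seedance.py` (cancel, deadline, sleep, fetch, terminal test); `poll_until_terminal`, `FakeClock` and the injected `sleep`/`clock` parameters are hypothetical helpers added so the loop can be exercised without network access or real waiting:

```python
import time
from typing import Callable, Optional

# Terminal task states as listed in the tools/seedance.py docstring.
TERMINAL_STATES = {"succeeded", "failed", "expired", "cancelled"}


def poll_until_terminal(
    fetch: Callable[[], dict],
    *,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    cancel_check: Optional[Callable[[], bool]] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> tuple[str, dict]:
    """Poll fetch() until a terminal status, a local cancel, or the deadline.

    Returns (outcome, last_response); outcome is a terminal status,
    "local_cancel" or "timeout". cancel_check runs before every sleep, so a
    stop request is noticed within roughly one poll interval.
    """
    deadline = clock() + timeout_s
    last: dict = {}
    while True:
        if cancel_check is not None and cancel_check():
            return "local_cancel", last
        if clock() > deadline:
            return "timeout", last
        sleep(interval_s)
        last = fetch()
        status = str(last.get("status") or "").lower()
        if status in TERMINAL_STATES:
            return status, last


class FakeClock:
    """Deterministic clock for tests: sleep() only advances the time."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds
```

Injecting the clock keeps the 10-minute timeout testable instantly. As in the tool, a local cancel only stops waiting; it does not cancel the remote cgt task.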


--- a/core/storage/usage.py
+++ b/core/storage/usage.py
@@ -130,3 +130,63 @@ def record_image_usage(
            cost_cny=cost_cny,
        ))
    return cost_cny
+
+
+def record_video_usage(
+    *,
+    task_id: UUID,
+    user_id: UUID,
+    model_profile: str,
+    resolution: str,
+    ratio: str,
+    duration_s: int,
+    fps: int,
+    width: int,
+    height: int,
+    tokens: int,
+    price_cny_per_mtoken: float,
+    has_video_input: bool = False,
+    watermark: bool = False,
+    extra_units: Optional[Mapping[str, Any]] = None,
+) -> Decimal:
+    """记一次视频生成:写 usage_events(kind=video)。
+
+    成本算法:`cost_cny = tokens / 1_000_000 * price_cny_per_mtoken`。tokens 由 caller
+    传入(响应里若有官方 usage 字段就用,否则按公式 `(in_dur+out_dur)*W*H*fps/1024`
+    估算),`price_cny_per_mtoken` 同步 snapshot 进 units jsonb,日后调价不影响历史对账。
+
+    `model_profile` 形如 `"doubao.seedance_2_fast"`(family.variant 风格)。
+    `has_video_input=True` 表示图生/视频编辑路径(单价 22 元/Mtok);False = 文生视频(37 元/Mtok)。
+    caller 自行根据请求 body 是否含 image_url/video_url 传对应单价 + flag。
+
+    **失败任务不要走这里** —— Volcengine 失败不计费,失败的 tool 调用直接返 [Error] 不写 usage。
+    """
+    price = Decimal(str(price_cny_per_mtoken))
+    tok = Decimal(str(int(tokens)))
+    cost_cny = (price * tok / Decimal("1000000")).quantize(Decimal("0.000001"))
+    units: dict[str, Any] = {
+        "resolution": resolution,
+        "ratio": ratio,
+        "duration_s": int(duration_s),
+        "fps": int(fps),
+        "width": int(width),
+        "height": int(height),
+        "tokens": int(tokens),
+        "price_cny_per_mtoken": float(price_cny_per_mtoken),
+        "has_video_input": bool(has_video_input),
+        "watermark": bool(watermark),
+    }
+    if extra_units:
+        units.update(extra_units)
+
+    with session_scope() as s:
+        s.add(UsageEvent(
+            user_id=user_id,
+            task_id=task_id,
+            message_id=None,
+            kind="video",
+            model_profile=model_profile,
+            units=units,
+            cost_cny=cost_cny,
+        ))
+    return cost_cny
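A minimal sketch of the cost arithmetic in `record_video_usage`, isolated from the database (the helper name `video_cost_cny` is illustrative). Going through `str()` before `Decimal` keeps binary float noise out of the stored amount, and `quantize` fixes six decimal places:

```python
from decimal import Decimal


def video_cost_cny(tokens: int, price_cny_per_mtoken: float) -> Decimal:
    """cost = tokens / 1e6 * unit price, rounded to 6 decimal places."""
    price = Decimal(str(price_cny_per_mtoken))  # "37.0" as written, not its binary expansion
    tok = Decimal(str(int(tokens)))
    return (price * tok / Decimal("1000000")).quantize(Decimal("0.000001"))


if __name__ == "__main__":
    print(video_cost_cny(108_000, 37.0))  # prints 3.996000 (720p 16:9 5s, text-to-video)
    print(Decimal(0.1), Decimal(str(0.1)))  # why the str() hop matters
```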
--- a/prompts/system/general_v1.md
+++ b/prompts/system/general_v1.md
@@ -11,6 +11,9 @@
 - `seedream` —— 豆包图像生成。产物自动落 `<task_dir>/figures/`。每次 **¥0.22**(联网 `search=true` 加 ¥0.05)。
  - **调用前必须先 `load_skill('imagegen')`** —— skill 里有「何时该用 / 该不该用 mermaid 替代 / 用户描述模糊度诊断 / 一次性追问范式 / prompt 装配 / 失败解药」全套引导。**不要拿用户原话直接当 prompt 调 tool** —— 容易烧 ¥0.22 在错的方向上。
  - 兜底硬约束(即使没 load skill 也守):用户没主动要图就别装饰性生成;同一目的不满意**不要连发**,先口头校准 prompt 再调。
+- `seedance` —— 豆包视频生成(Seedance 2.0 Fast)。异步任务,**等 30-90s 出片**;产物自动落 `<task_dir>/videos/`。每次 **约 ¥1.5 起**(480p 4s)~ **¥12+**(720p 15s),是生图的约 7 倍(480p 4s)到 50 倍以上(720p 15s)。触发词:视频 / 动画 / 动起来 / 做个 video / 镜头 / 短片 / 演示视频 / 动效。
+  - **调用前必须先 `load_skill('videogen')`** —— skill 里有「6 维诊断(含运动维必填)/ seedream/mermaid 反向选型 / prompt 装配 / 参数取舍(时长/分辨率/比例直接决定钱)/ 失败解药」全套引导。视频比图贵 10 倍且 90s 等待,绝对不要拿用户原话当 prompt 直接调。
+  - 兜底硬约束:用户没主动要视频就别装饰性生成(比生图更严重的红线);同一目的不满意**绝不连发**(1 次错 = ¥4+60s,连发 2 次 = ¥8+2min);phase 1 仅文生视频,**不支持** image-to-video / video-to-video。

 ## Skill 机制
 你启动时只看到下方 skill 的"名字 + 描述"。Skill 是**可选辅助** —— 任务明确落在
--- a/scripts/smoke_seedance.py
+++ b/scripts/smoke_seedance.py
@@ -0,0 +1,121 @@
+"""Smoke: 豆包 Seedance 视频生成 tool 端到端走通。
+
+跑法: .venv/Scripts/python.exe scripts/smoke_seedance.py
+依赖 .env 里 ARK_API_KEY / ZCBOT_DB_URL。**会真的调豆包 API,产生约 ¥1.5 (480p 4s) 费用 + 等 30-90s**。
+
+校验:
+  1. ArkConfig.load() 拿到 cfg + video.seedance_2_fast 存在
+  2. SeedanceTool.execute(prompt=..., resolution='480p', duration=4) 返回 [seedance ...] 文案
+  3. videos/<ts>-<rand>.mp4 落盘且大于 0 字节
+  4. 同名 .meta.json 存在 + 含 prompt/model_id/cost_cny/tokens/cgt_id 字段
+  5. usage_events 多出一行 kind="video",单价 + 分辨率 + 时长 snapshot 在 units jsonb
+"""
+from __future__ import annotations
+
+import json
+import os
+import sys
+import uuid
+from pathlib import Path
+
+ROOT = Path(__file__).resolve().parent.parent
+sys.path.insert(0, str(ROOT))
+
+env_file = ROOT / ".env"
+if env_file.exists():
+    for line in env_file.read_text(encoding="utf-8").splitlines():
+        line = line.strip()
+        if not line or line.startswith("#") or "=" not in line:
+            continue
+        k, _, v = line.partition("=")
+        os.environ.setdefault(k.strip(), v.strip())
+
+from sqlalchemy import text
+
+from core.ark_client import ArkConfig
+from core.storage import session_scope
+from core.storage.models import Task, User
+from tools.seedance import SeedanceTool
+
+
+def main() -> int:
+    cfg = ArkConfig.load()
+    if cfg is None:
+        print("[SKIP] ARK_API_KEY 未设(或 config/media/doubao.yaml 缺失),无法测真接口")
+        return 0
+
+    video_cfg = (cfg.raw.get("video") or {})
+    if not video_cfg:
+        print("[SKIP] doubao.yaml 无 video 段")
+        return 0
+    variant_key, variant_cfg = next(iter(video_cfg.items()))
+    print(f"[setup] variant={variant_key} model={variant_cfg.get('model_id')}")
+    print(f"        price_text2video=¥{variant_cfg.get('price_cny_per_mtoken_text2video')}/Mtok")
+
+    uid = uuid.uuid4()
+    tid = uuid.uuid4()
+    ws_user = ROOT / "workspace" / "users" / str(uid)
+    wd = ws_user / "smoke_seedance"
+    wd.mkdir(parents=True, exist_ok=True)
+    # 拆两个事务:models 未定 relationship,UOW 不知 User→Task FK 依赖,同一事务里 Task 会先插炸 FK
+    with session_scope() as s:
+        s.add(User(user_id=uid))
+    with session_scope() as s:
+        s.add(Task(task_id=tid, user_id=uid, name="smoke_seedance", working_dir=str(wd)))
+
+    tool = SeedanceTool(
+        ark_cfg=cfg,
+        video_variant_cfg=variant_cfg,
+        variant_key=variant_key,
+        working_dir=wd,
+        task_id=tid,
+        user_id=uid,
+        base_dir=wd,
+        user_root=ws_user,
+    )
+
+    # 选最便宜档位省钱:480p / 4s / 16:9 → ~¥1.5
+    prompt = "一只橙色的小猫从窗台跳下来,水彩风格,镜头平视跟随"
+    print(f"[call] prompt={prompt!r} resolution=480p duration=4")
+    print("       (异步任务,等 30-90s 出片,先 submit 后轮询)")
+    result = tool.execute(prompt=prompt, resolution="480p", duration=4, ratio="16:9")
+    print(f"[tool result]\n{result}\n")
+
+    if result.startswith("[Error]") or result.startswith("[Cancelled]"):
+        print("[FAIL] tool 返回 error/cancelled")
+        return 2
+
+    mp4s = list((wd / "videos").glob("*.mp4"))
+    assert len(mp4s) == 1, f"videos/*.mp4 应当 1 个,实际 {len(mp4s)}"
+    mp4 = mp4s[0]
+    assert mp4.stat().st_size > 0, f"{mp4} 大小为 0"
+    print(f"[OK] mp4 落盘 {mp4.name} ({mp4.stat().st_size} bytes)")
+
+    meta_path = mp4.with_suffix(".meta.json")
+    assert meta_path.exists(), f"meta 文件不存在 {meta_path}"
+    meta = json.loads(meta_path.read_text(encoding="utf-8"))
+    for k in ("prompt", "model_id", "resolution", "duration_s", "tokens", "cost_cny", "cgt_id", "ts"):
+        assert k in meta, f"meta 缺字段 {k}"
+    print(f"[OK] meta 字段齐全: {list(meta.keys())}")
+    print(f"     tokens={meta['tokens']} cost=¥{meta['cost_cny']} cgt={meta['cgt_id']}")
+
+    with session_scope() as s:
+        rows = s.execute(text(
+            "SELECT kind, model_profile, units, cost_cny FROM usage_events "
+            "WHERE task_id = :tid"
+        ), {"tid": str(tid)}).all()
+    assert len(rows) == 1, f"usage_events 行数应 1,实际 {len(rows)}"
+    row = rows[0]
+    assert row.kind == "video", f"kind 应 video,实际 {row.kind}"
+    assert row.model_profile == f"doubao.{variant_key}", f"model_profile = {row.model_profile}"
+    for k in ("resolution", "duration_s", "tokens", "price_cny_per_mtoken"):
+        assert k in row.units, f"units 缺 {k} snapshot"
+    print(f"[OK] usage_events: kind={row.kind} model={row.model_profile} cost_cny={row.cost_cny}")
+    print(f"     units snapshot: {row.units}")
+
+    print("\n[PASS] smoke_seedance 全部通过")
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
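The smoke script's inline `.env` loader can be pulled out into a pure function for testing. The sketch below keeps the same rules (skip blank lines, comments and lines without `=`, split on the first `=`, never override variables already set); `load_dotenv_text` is a hypothetical name, and like the original it does not strip quotes or handle `export` prefixes:

```python
from typing import MutableMapping


def load_dotenv_text(text: str, environ: MutableMapping[str, str]) -> None:
    """Apply KEY=VALUE lines from text to environ without overriding existing keys."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        environ.setdefault(key.strip(), value.strip())


# In the smoke script this would be called as:
#   load_dotenv_text((ROOT / ".env").read_text(encoding="utf-8"), os.environ)
```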
--- a/skills/videogen/SKILL.md
+++ b/skills/videogen/SKILL.md
@@ -0,0 +1,243 @@
+---
+name: videogen
+description: 用豆包 Seedance 2.0 Fast 生视频(`seedance` tool)。**任何生视频任务调 tool 前必须 load 本 skill**。触发词:视频 / 动画 / 动起来 / 做个 video / 做段视频 / 出段视频 / 生成视频 / mov / mp4 / 短片 / 镜头 / 运动镜头 / 演示视频 / 动效。核心:视频比图贵 10 倍(¥1.86-¥4+ / 段),且要等 30-90s,问清楚再画 + 强制确认。
+---
+
+# Videogen
+
+把"我想要个视频"变成一段能用的视频。流程:
+
+**诊断模糊度(6 维) → 一次性给推断 + 待确认项 → 用户拍板维度 → 装配最终 prompt + 参数 → ⛔ 把 prompt+参数完整贴给用户看 + 问改不改 → 用户明确确认后 → 调 `seedance`**
+
+每次 `seedance` 调用 **约 ¥1.5 起**(480p 4s)→ **¥4.00**(720p 5s)→ **¥12+**(720p 15s),且要等 **30-90 秒**才出片。**贵 + 慢 = 试错代价高,必须问清楚再调**。
+
+## ⛔ 调 tool 前的强制门(铁律,比 imagegen 更严格)
+
+**任何情况下,在 `seedance` tool call 发出去之前,都必须先把最终装配好的 prompt(包括 resolution / ratio / duration / watermark 参数)用对话消息明文展示给用户,然后问"这样画?要改什么?"并 ⛔ BLOCKING 等用户明确回复。**
+
+- 用户回复 "可以" / "OK" / "就这样" / "对" / "嗯" / "画吧" / "出片吧" / 简单确认 → 可以调 tool
+- 用户回复 "把 X 改成 Y" / "时长改 8s" / "镜头换成 Y" → 改完后**再次贴最终 prompt + 再次等确认**
+- 用户**沉默 / 长时间不回 / 追问别的事** → 不算确认,**继续等**,不要自作主张
+- 用户回 "看起来不错" / "差不多" / 模棱两可 → **主动追问一句"这就开烧 ¥X?"**,拿到明确"是"再调
+
+**为什么比 imagegen 还严**:视频单价 ¥4 起,比图贵 10 倍以上;一次失败相当于 18 张错图;且要等 30-90s,用户改方向的等待代价也很高。**装配 prompt 不等于授权调用** —— 装配是模型脑内运算,授权要落到用户的"嗯,画吧"上。
+
+## 何时用本 skill
+
+- 用户**明确说**"做个视频" / "出段视频" / "动画" / "动起来" / "镜头扫过" / "演示视频"
+- 用户原本要 ppt / 海报 / 申报书,**主动**问能不能配段视频(此时介绍价格 + 时长引导决策)
+- 用户拿到 seedream 生的静态图后说"想动起来" / "加点运动" — 注意 phase 1 只支持 t2v 文生视频,**不支持**从已有图生视频(i2v),要告诉用户这点
+
+## 何时不走本 skill
+
+- 用户**没主动要视频**(别为"丰富回复"装饰性生视频 —— 比图更严重的浪费红线)
+- 用户要的是**流程/结构动效**(节点-箭头-步骤逐次出现)→ 这是 ppt 动画 / mermaid + ppt 转场的事,不是视频
+- 用户要的是**实拍素材** / **已有视频剪辑** → seedance 是 AI 生成,不是素材库;告诉用户走 unsplash / pexels 等
+- 用户**有具体参考视频说"按这个改"** → phase 1 不支持 i2v / v2v,告诉用户先用文字描述
+
+## 关键岔路:seedream vs seedance vs mermaid
+
+**默认倾向 seedream / mermaid**(静态够用就别上动态):
+- 用户要的是**单张图**(封面 / 配图 / 示意图)→ seedream
+- 用户要的是**结构/流程**(节点关系 / 时序 / 架构)→ mermaid
+
+**反向选 seedance**(满足任一):
+- 用户**明确说**视频/动画/动起来/做个 video(而非"画一张" / "做个图")
+- 内容本身**离不开运动**:水流 / 火焰 / 旋转 / 镜头扫过场景 / 角色动作
+- ppt 引子页要**视觉冲击 + 时间维度**(2-3s 短动画开场)
+- 用户先做了静态图,**明确**说"想动起来" / "加运动感"
+
+**模糊时主动问一句**:
+
+> 你这是想要 **一张静态图**(seedream,¥0.22,3-5 秒出图),还是 **一段短视频**(seedance,¥4 起,等 30-90s)?静态图够的话省钱省时间。
+
+## 诊断模糊度 — 六维清单(运动是新增的必填维)
+
+| 维度 | 缺失信号 |
+|---|---|
+| **主体** What | "做个混凝土视频" → 视频里有什么?试块 / 楼板 / 工程现场 / 微观裂纹? |
+| **运动** Motion ⭐ | **视频特有,必问** — 主体在做什么?(浇筑 / 凝固 / 裂开 / 旋转 / 镜头扫过)。**没运动 = 应该走 seedream**。 |
+| **场景** Where | 工地 / 实验室 / 抽象空间 / 极简白底? |
+| **镜头** Camera | 镜头怎么动?(固定 / 跟随 / 推近 / 拉远 / 环绕 / 俯视下降)。AI 视频对镜头描述很敏感,不写默认随机晃动 |
+| **风格** Style | 写实摄影 / 工业 CG / 扁平 2D 动画 / 水墨 / 赛博朋克 / 学术示意? |
+| **时长 + 分辨率 + 比例** | **时长**:4-15s,默认 5s(短=便宜)。**分辨率**:fast 仅 480p/720p,默认 720p。**比例**:ppt 16:9 / 短视频 9:16 / 头像 1:1。 |
+
+**评估规则**:
+- 6 维填齐 / 缺 ≤1 → 可以直接装配 prompt 调用
+- **运动维不能省** —— 没运动就劝退到 seedream(¥0.22 vs ¥4,差 18 倍)
+- 缺 2 维及以上 → **先问再画**
+- 时长、分辨率、比例三选一缺即追问;**用户没说时长时不要悄悄套用 5s 默认**,要问"4s 短演示 / 5s 默认 / 8s 中等 / 12s+ 长镜头?",时长直接决定钱:`cost ≈ ¥4 × duration/5 × resolution_factor`
+
+## 一次性给推断 + 待确认项(不要一个个问)
+
+模糊时一次摆出推断,让用户改或确认:
+
+> 你说"做个混凝土浇筑的视频",我打算这样画:
+> - **主体**:工地上正在浇筑混凝土的楼板
+> - **运动**:混凝土从泵车软管流出注入模板,工人手持振动棒来回插入
+> - **场景**:工地中景,远处有塔吊
+> - **镜头**:固定俯视 + 缓慢推近模板中心
+> - **风格**:写实工程纪录片
+> - **时长 / 分辨率 / 比例**:5s / 720p / 16:9(ppt 用) — ¥4.00,等 30-90s
+>
+> 这样画可以吗?或者告诉我:想突出什么?(浇筑工艺细节 / 振动棒动作 / 大场面气势 / 慢动作)
+
+## 装配 prompt — 用户拍板维度后
+
+把前 5 维拼成自然中文 prompt 文本(运动 / 镜头是新增重点),**时长 / 分辨率 / 比例走参数**:
+
+```
+工地上正在浇筑混凝土的楼板,混凝土从泵车软管流出注入模板,
+工人手持振动棒来回插入,固定俯视镜头缓慢推近模板中心,
+写实工程纪录片风格,正午阳光
+```
+
+要点:
+- **运动描述具体** —— "流动 / 旋转 / 推近 / 倾倒",不要"动起来"这种无信息词
+- **镜头单独成句** —— "固定俯视 / 缓慢推近 / 环绕一周 / 跟随主体平移";不写默认镜头随机晃动
+- **不要堆形容词** —— 模型理解动作 > 形容词
+- **不要写否定** —— "no shaking, not blurry" 反向起效;Seedance 不支持 negative prompt
+- 主体放最前,镜头 / 风格放后
+
+## ⛔ 调 tool 前再过一道:把最终 prompt + 参数贴给用户
+
+装配完后**立即**用对话消息把最终结果贴出来 + 等用户明确确认。**装配 ≠ 授权调用**。
+
+格式建议(代码块包起来,清晰可读):
+
+> 我准备调 seedance,参数如下:
+>
+> ````
+> prompt: 工地上正在浇筑混凝土的楼板,混凝土从泵车软管流出注入模板,
+>         工人手持振动棒来回插入,固定俯视镜头缓慢推近模板中心,
+>         写实工程纪录片风格,正午阳光
+> resolution: 720p
+> ratio: 16:9(ppt 横版)
+> duration: 5 秒
+> watermark: false
+> 预计花费: ¥4.00
+> 预计等待: 30-90 秒
+> ````
+>
+> 这样开烧?要改什么?(改 prompt 文字 / 改时长 / 换比例 / 降到 480p 省钱)
+
+然后 ⛔ **BLOCKING:等用户明确回复**。
+
+**为什么强调** —— ¥4 一段视频,用户没机会在调用前看清楚 prompt 的话,生成后只能事后看视频反推哪句话错了,改一次又是 ¥4 + 1.5 分钟。一次对话往返(免费)避免一次错片(¥4 + 90s 等待),是最划算的强制门。
+
+## 参数取舍
+
+### `resolution`(分辨率)
+
+| 选项 | 像素 (16:9) | 适用场景 | 与 720p 价格比 |
+|---|---|---|---|
+| `480p` | 854×480 | 短演示 / 邮件附件 / 低带宽预览 / **省钱档** | ~0.47x(便宜一半) |
+| `720p`(默认) | 1280×720 | ppt 嵌入 / 公众号 / 一般演示 | 1.0x(¥4 / 5s) |
+| `1080p` | 1920×1080 | **fast 不支持**,会报 [Error];仅 pro 可用 | — |
+
+**默认 720p 是合理的"够用挡"**;**预算紧 / 试效果**用 480p;**fast 上限就是 720p**,用户要 1080p+ 要切到 pro variant(yaml 里没配 pro 时直接告诉用户)。
+
+### `duration`(时长)
+
+| 时长 | 适用 | 720p 16:9 估价 |
+|---|---|---|
+| 4s | 最短,纯展示 / 转场 / loop | ~¥3.20 |
+| 5s(默认) | 标准短视频 / ppt 引子 | ~¥4.00 |
+| 8s | 中等叙事 / 完整动作 | ~¥6.40 |
+| 12s | 长镜头 / 复杂场景 | ~¥9.60 |
+| 15s | 上限 / 完整 plot | ~¥12.00 |
+
+**线性扩展**(每秒 ~¥0.8 @ 720p);**短=便宜=快**,无明确叙事需求默认 5s。**4s 是最低,< 4 报错**。
+
+### `ratio`(比例)
+
+| 比例 | 适用 |
+|---|---|
+| `16:9`(默认) | ppt / 横版演示 / 一般用途 |
+| `9:16` | 短视频 / 抖音 / 小红书 / 手机海报 |
+| `1:1` | 社交头像 / Instagram |
+| `4:3` | 老电视 / 复古 |
+| `3:4` | 杂志竖版 |
+| `21:9` | 电影超宽 / Banner |
+| `adaptive` | 模型自己选 |
+
+**默认 16:9** 适合 90% ppt / 申报 / 公众号场景。**比例错了后期裁切很难还原构图** —— 不问用途默认 16:9 也比默认 1:1 强(seedream 那边有 1:1 误区,这里别犯)。
+
+### `watermark`
+
+| 默认 | 何时改 |
+|---|---|
+| `false` | 默认无水印(申报 / ppt / 客户交付都不该带);仅当用户明确说"加水印"才传 `true` |
+
+## 调用范式
+
+**前置条件**:用户已经看过最终 prompt + 所有参数,明确回复"可以" / "OK" / "出片吧" 之类。**没看到这个确认就不要调**。
+
+`seedance` 是 tool 不是 skill 函数,**直接调,不要 `run_python` 包一层**:
+
+```
+seedance(
+    prompt="工地上正在浇筑混凝土的楼板,混凝土从泵车软管流出注入模板,工人手持振动棒来回插入,固定俯视镜头缓慢推近模板中心,写实工程纪录片风格,正午阳光",
+    resolution="720p",     # 可省,走默认
+    ratio="16:9",          # 可省
+    duration=5,            # 可省
+    watermark=false,       # 可省
+)
+```
+
+**调用是同步阻塞 30-90s** —— tool 内部 submit 后轮询直到 succeeded,期间 LLM 卡住。这不是 bug,告诉用户"提交了,等 30-90 秒"再耐心等返回。
+
+返回串首行是 `[seedance] model=... · resolution=... · ratio=... · duration=Xs · cost=¥... · elapsed=...s` —— 原样保留给用户(SPA 会 parse 挂徽章)。第二行 `saved: <相对路径>` 是产物路径,告诉用户。
+
+产物自动落 `<task_dir>/videos/<时间戳>-<rand>.mp4` + 同名 `.meta.json`(prompt / 参数 / cost / tokens / cgt_id 全 snapshot)。
+
+## 失败 / 不满意后怎么办
+
+**不要原 prompt 重发**!那是 ¥4 一发的浪费。
+
+| 现象 | 原因 | 解药 |
+|---|---|---|
+| 主体动作不对 | 运动描述太抽象("动起来"→ 无信息) | 用具体动词:"从软管流出 / 来回插入 / 缓慢上升" |
+| 镜头不稳 / 随机晃动 | 没指定镜头 | 加"固定镜头" / "缓慢推近" / "环绕一周";尤其要明确镜头是否固定 |
+| 主体被切掉 | 比例与构图不匹配 | 改 ratio 或在 prompt 里写"全景 / 中景 / 主体居中" |
+| 风格不对 | 风格词位置太靠后 / 缺少 | 风格词放 prompt 前 1/3:"写实工程纪录片风格,xxx" |
+| 时长不够讲完动作 | duration 太短 | 4s 适合单一动作,多步骤动作至少 8s |
+| `[Error] seedance API` 报错(传 1080p 时) | fast 不支持 1080p+ | 降到 720p 重试;或换 pro variant(yaml 加配置) |
+| `[Error] seedance 轮询超时` | 队列拥堵 / 服务异常 | 等几分钟再试;cgt_id 在 24h 内仍可手工 GET 拉结果 |
+| `[Cancelled]` | 用户点了停止 | **告诉用户:远端任务可能仍在跑;Volcengine 失败不计费、成功出片才计费,若远端仍出片可能产生费用** |
+| 出现奇怪文字 | Seedance 文字渲染不稳 | prompt 里不要要求画文字 / 字幕;字幕后期 ppt 加 |
+
+**先口头跟用户对齐改哪一维**,再发新调用 —— 不要"再画一段试试"连续烧钱。
+
+## 产物处理
+
+生视频完成后:
+- 把 `saved: videos/xxx.mp4` 路径告诉用户(SPA 会自动 inline 播放器,点击可全屏)
+- 如果是做 ppt,提醒用户 `python-pptx` 可以用 `slide.shapes.add_movie()` 嵌入 mp4(但导出 pdf 时视频会丢,改截关键帧 + 链接)
+- 用户说"换一段" → 走上节的"对齐改哪一维"流程,**不要默认重发**
+- 用户说"再来几段备选" → **先确认**:"备选 N 段会花 ¥{4 × N} + 等 {N × 60}s,确认?"
+
+## 反模式
+
+- ❌ **没把最终 prompt+参数贴给用户看就直接 tool call** —— 比 imagegen 更严格的铁律(¥4 vs ¥0.22)
+- ❌ **用户回"看起来不错"就当确认调 tool** —— 模棱两可必须追问到明确"是 / OK / 画吧"
+- ❌ **运动维度跳过**(=应该走 seedream 却走 seedance) —— 浪费 18 倍价钱
+- ❌ 用户没主动要视频就装饰性生成
+- ❌ 用户说"做个动画" 直接拿这 4 个字当 prompt 调用 —— 六维清单至少先问 2 轮
+- ❌ 一个个问"主体?""镜头?" —— 一次性摆推断 + 让用户改
+- ❌ **不问时长直接默认 5s** —— 5s 是合理默认但要让用户知道有 4-15s 区间;长视频(>8s)开调用前必须强调成本
+- ❌ **不问分辨率上 720p** —— 试效果阶段先 480p(便宜一半)
+- ❌ 同一目的不满意连续重发 —— 1 次错 = ¥4 + 60s,2 次连发 = ¥8 + 2min,先校准 prompt 再调
+- ❌ prompt 写"动起来 / 有动感" —— Seedance 不靠形容词,要具体动词
+- ❌ prompt 里写否定 "no shaking, not blurry" —— Seedance 不支持 negative,反向起效
+- ❌ 让用户在 seedream / seedance 之间默默替他决定 —— 模糊就一句话问明白
+- ❌ phase 1 拿用户已生成的图试图 i2v —— **当前不支持**,明确告诉用户
+- ❌ 用 `run_python` 调 `requests` 裸打豆包 API —— 走 `seedance` tool(已封装异步轮询 + 计费 + 落盘 + meta + cancel)
+
+## 输出
+
+调完告诉用户:
+- 文件相对路径(`videos/xxx.mp4`)
+- 本次成本(¥X.XX,从 banner 抽)
+- 用的 prompt 摘要(主体 / 运动 / 镜头 3 件套)
+- 一句"要换方向 / 调镜头 / 加长时长再来一段吗?"
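The rule of thumb `cost ≈ ¥4 × duration/5 × resolution_factor` and the two pricing tables in the skill can be cross-checked in a few lines. In this sketch the 480p factor is derived from the listed ¥1.86 / ¥4.00 tiers (the table rounds it to ~0.47x); `rough_cost_cny` and `batch_confirmation` are illustrative names, not part of the tool:

```python
# Factor relative to 720p, from the ¥1.86 (480p 5s) and ¥4.00 (720p 5s) reference tiers.
RESOLUTION_FACTOR = {"720p": 1.0, "480p": 1.86 / 4.00}


def rough_cost_cny(duration_s: int, resolution: str = "720p") -> float:
    """Linear estimate: ¥4 per 5s at 720p, scaled by duration and resolution."""
    if not 4 <= duration_s <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    if resolution not in RESOLUTION_FACTOR:
        raise ValueError(f"the fast variant does not support {resolution}")
    return 4.0 * duration_s / 5 * RESOLUTION_FACTOR[resolution]


def batch_confirmation(n: int, duration_s: int = 5, resolution: str = "720p") -> str:
    """Message to show before generating n alternatives (about 60s wait each)."""
    total = n * rough_cost_cny(duration_s, resolution)
    return f"备选 {n} 段会花 ¥{total:.2f} + 等 {n * 60}s,确认?"


if __name__ == "__main__":
    for d in (4, 5, 8, 12, 15):
        print(f"{d:>2}s 720p ≈ ¥{rough_cost_cny(d):.2f}")
```

Under this estimate the cheapest allowed call, 480p 4s, comes to about ¥1.49, which is why the floor is lower than the ¥1.86 quoted for 480p 5s.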
--- a/tools/seedance.py
+++ b/tools/seedance.py
@@ -0,0 +1,342 @@
+"""seedance: 调豆包 Seedance 2.0 Fast 视频生成 API,产物落 working_dir/videos/。
+
+异步任务:
+  1. POST /contents/generations/tasks → 返 `{"id": "cgt-..."}`
+  2. 轮询 GET /contents/generations/tasks/<cgt_id>(默认 5s 间隔,最长 10min)
+     直到 status ∈ {succeeded, failed, expired, cancelled}
+  3. succeeded → 取 content.video_url → download 到本地 + 写 meta + 计 usage_events
+
+模型 ID + 单价 + 默认参数全在 `config/media/doubao.yaml`,本 tool 只装配。
+计费按 token 公式 `(in_dur+out_dur) × W × H × fps / 1024`,文生视频 in_dur=0;
+W×H 由 resolution + ratio 推算(横版 height=resolution_num,竖版 width=resolution_num)。
+
+完成后:
+- 视频落到 `<wd>/videos/<YYYYMMDD-HHMMSS>-<rand6>.mp4`
+- 同名 `.meta.json` 写 prompt / model / 参数 / cost_cny / tokens / cgt_id / ts
+- usage_events 写 kind="video" 一行(单价 + 分辨率 + 时长 snapshot 进 units)
+"""
+from __future__ import annotations
+
+import json
+import secrets
+import time
+from datetime import datetime
+from pathlib import Path
+from typing import Any, Callable, Optional
+from uuid import UUID
+
+from core.ark_client import ArkClient, ArkConfig, ArkError
+from core.storage.usage import record_video_usage
+
+from .base import Tool
+
+
+# resolution → 短边像素;W/H 实际由 ratio 决定(横版短边=H,竖版短边=W)
+_RESOLUTION_TO_SHORT_EDGE: dict[str, int] = {
+    "480p": 480,
+    "720p": 720,
+    "1080p": 1080,   # fast 不支持,留给 pro;tool 不在这层挡,让豆包侧拒绝并返 [Error]
+}
+
+_VALID_RATIOS: set[str] = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"}
+
+
+def _resolve_dimensions(resolution: str, ratio: str) -> tuple[int, int]:
+    """resolution(短边)+ ratio → (width, height) 像素值。
+
+    adaptive 比例下无法预知 W/H,退回正方形按短边算(仅供 token 估算,实际 W/H 由豆包决定)。
+    未知 resolution → 默认 720p。
+    """
+    short = _RESOLUTION_TO_SHORT_EDGE.get(resolution, 720)
+    if ratio == "adaptive" or ratio not in _VALID_RATIOS:
+        return short, short
+    num_str, den_str = ratio.split(":", 1)
+    num, den = int(num_str), int(den_str)
+    if num >= den:   # 横版或方形:height 是短边
+        return round(short * num / den), short
+    # 竖版:width 是短边
+    return short, round(short * den / num)
+
+
+def _estimate_tokens(width: int, height: int, duration_s: int, fps: int, in_dur_s: int = 0) -> int:
+    """火山方舟 token 估算公式:`(in_dur + out_dur) × W × H × fps / 1024`。
+
+    实测校验(fast, 720p 16:9, 5s, 24fps, 文生视频):
+      (0+5) × 1280 × 720 × 24 / 1024 = 108,000 tokens × ¥37/Mtok = ¥3.996 ≈ ¥4.00 ✓
+    """
+    return int((in_dur_s + duration_s) * width * height * fps / 1024)
+
+
+class SeedanceTool(Tool):
+    name = "seedance"
+    description = (
+        "Generate a short video with Doubao Seedance 2.0 Fast and save to working_dir/videos/. "
+        "Use only when the user explicitly asks for a video / 视频 / 动画 / 动起来. "
+        "Async: takes 30-90s to render. Costs ~¥1.86 (480p, 5s) ~ ¥4.00 (720p, 5s); "
+        "longer / higher-resolution scales up. Returns the saved relative path."
+    )
+    parameters = {
+        "type": "object",
+        "properties": {
+            "prompt": {
+                "type": "string",
+                "description": "中文或英文都行,详尽描述画面 + 运动 + 镜头(主体在做什么 / 镜头怎么动 / 场景 / 风格)。",
+            },
+            "resolution": {
+                "type": "string",
+                "description": "Video resolution. fast 版仅支持 '480p' / '720p'(默认 720p);1080p+ 仅 pro 可用。",
+                "enum": ["480p", "720p", "1080p"],
+            },
+            "ratio": {
+                "type": "string",
+                "description": "Aspect ratio. 默认 '16:9'(ppt/横屏配视频)。竖版海报/短视频用 '9:16'。",
+                "enum": ["16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"],
+            },
+            "duration": {
+                "type": "integer",
+                "description": "视频时长(秒),范围 4-15。短=便宜,默认 5。每加 1s 成本约线性上升。",
+                "minimum": 4,
+                "maximum": 15,
+            },
+            "watermark": {
+                "type": "boolean",
+                "description": "是否打豆包水印。默认 false(申报/PPT 场景反需求)。",
+            },
+        },
+        "required": ["prompt"],
+    }
+
+    def __init__(
+        self,
+        *,
+        ark_cfg: ArkConfig,
+        video_variant_cfg: dict,
+        variant_key: str,
+        working_dir: Path,
+        task_id: UUID,
+        user_id: UUID,
+        base_dir: Optional[Path] = None,
+        user_root: Optional[Path] = None,
+        cancel_check: Optional[Callable[[], bool]] = None,
+    ) -> None:
+        super().__init__(base_dir, user_root=user_root)
+        self.ark_cfg = ark_cfg
+        self.cfg = video_variant_cfg
+        self.variant_key = variant_key  # 'seedance_2_fast' → usage_events.model_profile = "doubao.seedance_2_fast"
+        self.working_dir = Path(working_dir)
+        self.task_id = task_id
+        self.user_id = user_id
+        self.cancel_check = cancel_check  # 轮询期间检查是否被 cancel
+
+    def execute(
+        self,
+        prompt: str,
+        resolution: Optional[str] = None,
+        ratio: Optional[str] = None,
+        duration: Optional[int] = None,
+        watermark: Optional[bool] = None,
+    ) -> str:
+        if not (prompt or "").strip():
+            return "[Error] prompt 不能为空"
+
+        cfg = self.cfg
+        model_id = cfg["model_id"]
+        chosen_resolution = resolution or cfg.get("default_resolution", "720p")
+        chosen_ratio = ratio or cfg.get("default_ratio", "16:9")
+        chosen_duration = int(duration) if duration is not None else int(cfg.get("default_duration", 5))
+        chosen_watermark = bool(cfg.get("default_watermark", False)) if watermark is None else bool(watermark)
+        fps = int(cfg.get("fps", 24))
+
+        submit_timeout = float(cfg.get("request_timeout_s", 60))
+        poll_interval = float(cfg.get("poll_interval_s", 5))
+        poll_timeout = float(cfg.get("poll_timeout_s", 600))
+        price_t2v = float(cfg.get("price_cny_per_mtoken_text2video", 37.0))
+
+        submit_endpoint = cfg.get("endpoint_submit", "/contents/generations/tasks")
+        poll_endpoint_base = cfg.get("endpoint_poll", "/contents/generations/tasks")
+
+        body: dict[str, Any] = {
+            "model": model_id,
+            "content": [{"type": "text", "text": prompt}],
+            "ratio": chosen_ratio,
+            "resolution": chosen_resolution,
+            "duration": chosen_duration,
+            "watermark": chosen_watermark,
+        }
+
+        t0 = time.monotonic()
+        try:
+            with ArkClient(self.ark_cfg, timeout_s=submit_timeout) as client:
+                # 1. submit
+                submit_resp = client.post_json(submit_endpoint, body, timeout_s=submit_timeout)
+                cgt_id = self._extract_task_id(submit_resp)
+                if not cgt_id:
+                    return f"[Error] seedance submit 响应缺 task id: {json.dumps(submit_resp, ensure_ascii=False)[:300]}"
+
+                # 2. poll
+                poll_url = f"{poll_endpoint_base}/{cgt_id}"
+                deadline = time.monotonic() + poll_timeout
+                last_status = ""
+                final_resp: dict = {}
+                while True:
+                    if self.cancel_check is not None and self.cancel_check():
+                        return (
+                            f"[Cancelled] seedance task {cgt_id} 用户取消(远端任务可能仍在跑;"
+                            f"Volcengine 仅成功出片才计费,若远端仍出片可能产生 ~¥{self._rough_cost(chosen_resolution, chosen_ratio, chosen_duration, fps, price_t2v):.2f})"
+                        )
+                    if time.monotonic() > deadline:
+                        return (
+                            f"[Error] seedance 轮询超时(>{poll_timeout:.0f}s),最后 status={last_status!r},"
+                            f"cgt_id={cgt_id}(24h 内可手工 GET {poll_url} 拉结果)"
+                        )
+                    time.sleep(poll_interval)
+                    poll_resp = client.get_json(poll_url, timeout_s=submit_timeout)
+                    last_status = str(poll_resp.get("status") or "").lower()
+                    final_resp = poll_resp
+                    if last_status in ("succeeded", "failed", "expired", "cancelled"):
+                        break
+
+                if last_status != "succeeded":
+                    err = final_resp.get("error") if isinstance(final_resp.get("error"), dict) else {}
+                    msg = err.get("message") or final_resp.get("message") or "(无错误描述)"
+                    return f"[Error] seedance task {cgt_id} 终态 status={last_status},msg={msg}"
+
+                video_url = self._extract_video_url(final_resp)
+                if not video_url:
+                    return f"[Error] seedance succeeded 但响应缺 video url: {json.dumps(final_resp, ensure_ascii=False)[:400]}"
+
+                # 3. download
+                ts = datetime.now().strftime("%Y%m%d-%H%M%S")
+                short = secrets.token_hex(3)
+                videos_dir = self.working_dir / "videos"
+                videos_dir.mkdir(parents=True, exist_ok=True)
+                dest_mp4 = videos_dir / f"{ts}-{short}.mp4"
+                client.download(video_url, dest_mp4, timeout_s=300.0)
+        except ArkError as e:
+            return f"[Error] seedance API: {e}"
+
+        elapsed = time.monotonic() - t0
+
+        # tokens / cost: prefer the usage field in the response (if Doubao returns one), otherwise estimate from the formula
+        width, height = _resolve_dimensions(chosen_resolution, chosen_ratio)
+        tokens_estimated = _estimate_tokens(width, height, chosen_duration, fps, in_dur_s=0)
+        tokens_actual = self._extract_tokens(final_resp) or tokens_estimated
+        cost_cny = tokens_actual * price_t2v / 1_000_000.0
+
+        meta = {
+            "prompt": prompt,
+            "model_id": model_id,
+            "resolution": chosen_resolution,
+            "ratio": chosen_ratio,
+            "width": width,
+            "height": height,
+            "duration_s": chosen_duration,
+            "fps": fps,
+            "watermark": chosen_watermark,
+            "tokens": tokens_actual,
+            "tokens_estimated": tokens_estimated,
+            "price_cny_per_mtoken": price_t2v,
+            "cost_cny": round(cost_cny, 4),
+            "elapsed_s": round(elapsed, 1),
+            "cgt_id": cgt_id,
+            "ts": datetime.now().isoformat(timespec="seconds"),
+        }
+        meta_path = dest_mp4.with_suffix(".meta.json")
+        meta_path.write_text(json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8")
+
+        try:
+            record_video_usage(
+                task_id=self.task_id,
+                user_id=self.user_id,
+                model_profile=f"doubao.{self.variant_key}",
+                resolution=chosen_resolution,
+                ratio=chosen_ratio,
+                duration_s=chosen_duration,
+                fps=fps,
+                width=width,
+                height=height,
+                tokens=tokens_actual,
+                price_cny_per_mtoken=price_t2v,
+                has_video_input=False,   # phase 1 is t2v only; once i2v is wired up, derive this from the request body
+                watermark=chosen_watermark,
+                extra_units={"cgt_id": cgt_id, "elapsed_s": round(elapsed, 1)},
+            )
+        except Exception as e:
+            print(f"[seedance] record_video_usage failed: {type(e).__name__}: {e}", flush=True)
+
+        disp = self._display(dest_mp4)
+        # Banner protocol matches seedream: the first line is `[tool] key=value · key=value ...`.
+        # The frontend's extractMediaBanner already whitelists seedance and regex-extracts key=value pairs into badges.
+        return (
+            f"[seedance] model={model_id} · resolution={chosen_resolution} · ratio={chosen_ratio} · "
+            f"duration={chosen_duration}s · cost=¥{cost_cny:.2f} · elapsed={elapsed:.1f}s\n"
+            f"saved: {disp}\n"
+            f"prompt={prompt!r}\n"
+            f"watermark={chosen_watermark} cgt_id={cgt_id}"
+        )
+
+    @staticmethod
+    def _rough_cost(resolution: str, ratio: str, duration_s: int, fps: int, price_per_mtoken: float) -> float:
+        w, h = _resolve_dimensions(resolution, ratio)
+        tokens = _estimate_tokens(w, h, duration_s, fps)
+        return tokens * price_per_mtoken / 1_000_000.0
+
+    @staticmethod
+    def _extract_task_id(resp: dict) -> str:
+        """Extract the cgt-xxx task id from the submit response. Tolerates several known shapes."""
+        for k in ("id", "task_id", "request_id"):
+            v = resp.get(k)
+            if isinstance(v, str) and v:
+                return v
+        data = resp.get("data")
+        if isinstance(data, dict):
+            for k in ("id", "task_id"):
+                v = data.get(k)
+                if isinstance(v, str) and v:
+                    return v
+        return ""
+
+    @staticmethod
+    def _extract_video_url(resp: dict) -> str:
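+        """Extract the video url from a succeeded response. Known paths:
+        - {"content": {"video_url": "..."}} (Volcengine Ark standard)
+        - {"data": {"video_url": "..."}} or {"output": {"video_url": "..."}} (some proxies)
+        - fallback: recursive search for the first http(s) `video_url`, or a `url` ending in .mp4/.webm/.mov
+        """
+        for key in ("content", "data", "output"):
+            sub = resp.get(key)
+            if isinstance(sub, dict):
+                v = sub.get("video_url")
+                if isinstance(v, str) and v.startswith("http"):
+                    return v
+                # Proxy shapes that embed JSON in output[0].content[0].text (rare) are not parsed here.
+        # Recursive fallback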
+        def _find_url(o: Any) -> Optional[str]:
+            if isinstance(o, dict):
+                for k, v in o.items():
+                    if k in ("video_url", "url") and isinstance(v, str) and v.startswith("http"):
+                        # video_url is trusted as-is; a bare `url` may be a non-video asset, so require a video extension (query string ignored)
+                        if k == "video_url":
+                            return v
+                        if v.split("?", 1)[0].lower().endswith((".mp4", ".webm", ".mov")):
+                            return v
+                    r = _find_url(v)
+                    if r:
+                        return r
+            elif isinstance(o, list):
+                for x in o:
+                    r = _find_url(x)
+                    if r:
+                        return r
+            return None
+        return _find_url(resp) or ""
+
+    @staticmethod
+    def _extract_tokens(resp: dict) -> Optional[int]:
+        """Use the official usage field if the response has one, preferring total_tokens > completion_tokens > tokens."""
+        usage = resp.get("usage")
+        if isinstance(usage, dict):
+            for k in ("total_tokens", "completion_tokens", "tokens"):
+                v = usage.get(k)
+                if isinstance(v, (int, float)) and v > 0:
+                    return int(v)
+        return None
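The submit → poll → download flow in `SeedanceTool.execute` can be tested without the HTTP client by isolating the poll loop. The sketch below is a minimal illustration under stated assumptions, not the tool itself.

- `poll_task`, `estimate_tokens` and the `get_status` callable are hypothetical stand-ins for the loop around `client.get_json(poll_url)`.
- The token formula `width × height × fps × duration / 1024` is an assumption, as is the mapping of 720p 16:9 to 1280×720. The real `_estimate_tokens` and `_resolve_dimensions` are outside this hunk.
- These values were chosen because they reproduce the ¥4.00 figure that the commit message quotes for 720p / 5s at ¥37/Mtok.

```python
import time
from typing import Callable, Optional

# Terminal states recognised by the tool's poll loop.
TERMINAL_STATES = {"succeeded", "failed", "expired", "cancelled"}


def estimate_tokens(width: int, height: int, duration_s: int, fps: int) -> int:
    # ASSUMPTION (hypothetical helper): tokens = width * height * fps * duration / 1024
    return width * height * fps * duration_s // 1024


def poll_task(
    get_status: Callable[[], dict],
    cancel_check: Optional[Callable[[], bool]] = None,
    poll_interval_s: float = 5.0,
    poll_timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, dict]:
    """Poll until a terminal status and return (outcome, last_response).

    outcome is a terminal status, "local_cancel" (cancel_check fired) or "timeout".
    """
    deadline = time.monotonic() + poll_timeout_s
    last: dict = {}
    while True:
        # Cancel and deadline are checked before each wait, mirroring the tool.
        if cancel_check is not None and cancel_check():
            return "local_cancel", last
        if time.monotonic() > deadline:
            return "timeout", last
        sleep(poll_interval_s)
        last = get_status()
        status = str(last.get("status") or "").lower()
        if status in TERMINAL_STATES:
            return status, last


if __name__ == "__main__":
    # Fake server: two non-terminal polls, then success.
    responses = iter([
        {"status": "queued"},
        {"status": "running"},
        {"status": "succeeded", "content": {"video_url": "https://example.com/v.mp4"}},
    ])
    outcome, resp = poll_task(lambda: next(responses), sleep=lambda s: None)
    print(outcome, resp["content"]["video_url"])

    tokens = estimate_tokens(1280, 720, duration_s=5, fps=24)
    print(tokens, round(tokens * 37.0 / 1_000_000, 2))  # 108000 4.0
```

Injecting `sleep` and `cancel_check` keeps the loop deterministic in tests. As in the diff, cancellation is only observed between polls. The worst-case stop latency is therefore roughly one `poll_interval_s` plus one poll request.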
--- a/web/app.py
+++ b/web/app.py
@ -255,28 +255,33 @@ def _validate_transfer(
 # ─────────────────── BG run + SSE 帧格式 ───────────────────

 def _run_agent_bg(
-    task_id: UUID, user_id: UUID, user_message: str, image_variant: str = "",
+    task_id: UUID, user_id: UUID, user_message: str,
+    image_variant: str = "", video_variant: str = "",
 ) -> None:
    """工作线程:`build_agent(resume=True)` → 装 WebEventSink + cancel_check → `agent.run` → 写 tasks.run_status。

    sink 通过 broker.emit 桥事件回 asyncio loop;agent.run 是 sync,所以在 to_thread 跑。
    user_id 必须从 JWT 那侧透传过来 —— 决定 memory_block 读哪个 per-user 子树。
    cancel_check 桥 broker.is_cancelled,loop 在 stream chunk 间 + 工具调用之间 poll;
-    cancel 延迟 ~ 单 chunk 间隔(100ms 级)。
+    cancel 延迟 ~ 单 chunk 间隔(100ms 级);seedance 轮询间也读这个 cancel_check 用于
+    用户停止按钮(必须在 build_agent 阶段就传进去,因为 SeedanceTool ctor 持有它,
+    不能像以前那样 build_agent 返回后再赋 agent.cancel_check)。
    `ok / cancelled` 收尾直接回 `idle`(不留持久标记);只有 error 是持久终态。

-    image_variant:本 run 用哪个图像 variant 装 seedream(空 → yaml 第一个)。
+    image_variant / video_variant:本 run 用哪个 image/video variant 装 tool(空 → yaml 第一个)。
    随消息 POST 传进来,不入 DB —— UI 下拉的选择就跟在这一条消息上生效。
    """
    from core.agent_builder import build_agent, sync_task_tokens
+    cancel_check = lambda tid=task_id: broker.is_cancelled(tid)
    try:
        broker.emit(task_id, {"type": "run_start"})
        agent, session, sid, task_state, task_dir = build_agent(
            session_id=str(task_id), resume=True, user_id=user_id,
            image_variant=image_variant,
+            video_variant=video_variant,
+            cancel_check=cancel_check,
        )
        agent.sink = WebEventSink(broker, task_id)
-        agent.cancel_check = lambda tid=task_id: broker.is_cancelled(tid)
        agent.run(user_message)
        sync_task_tokens(task_state)
        # cancel 命中或正常完成 → run_status 回 idle(error 才持久)
@ -365,6 +370,36 @@ def _resolve_image_model(variant: str) -> str:
    return name


+def _list_video_variants() -> list[tuple[str, dict]]:
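+    """Scan the `video` section of config/media/doubao.yaml → [(variant_key, variant_cfg), ...].
+
+    Same pattern as _list_image_variants. An empty video section (not launched / commented out), a missing file or unparsable YAML → [], and the UI hides the dropdown.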
+    """
+    from core.paths import ROOT
+    import yaml as _yaml
+
+    p = ROOT / "config" / "media" / "doubao.yaml"
+    if not p.exists():
+        return []
+    try:
+        data = _yaml.safe_load(p.read_text(encoding="utf-8")) or {}
+    except Exception:
+        return []
+    video_cfg = data.get("video") or {}
+    return [(k, v) for k, v in video_cfg.items() if isinstance(v, dict)]
+
+
+def _resolve_video_model(variant: str) -> str:
+    """Validate a video_model variant key (same pattern as _resolve_image_model): empty → "", unknown → HTTP 400."""
+    name = (variant or "").strip()
+    if not name:
+        return ""
+    variants = {k for k, _ in _list_video_variants()}
+    if name not in variants:
+        raise HTTPException(400, f"invalid video_model {name!r}; available: {sorted(variants)}")
+    return name
+
+
 # ────────────────────── Pydantic 请求体 ──────────────────────

 class TaskCreateRequest(BaseModel):
@ -385,18 +420,19 @@ class TaskPatchRequest(BaseModel):

 class MessageRequest(BaseModel):
    content: str
-    # 该条消息触发的生图模型 variant key(config/media/doubao.yaml image 段)。
-    # 空 → seedream tool 走 yaml 第一个 variant(目前 seedream_5);非空 → 本次 run
-    # 用此 variant 装配 SeedreamTool。仅作用于本 run,不入 DB,UI 下拉的选择跟在
-    # 消息 POST body 上。
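+    # Variant key of the image / video generation model this message triggers (image/video section of config/media/doubao.yaml).
+    # Empty → the corresponding tool uses the first yaml variant; non-empty → this run is assembled with that variant.
+    # Applies to this run only and is not stored in the DB; the UI dropdown selection rides on the message POST body.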
    image_model: str = ""
+    video_model: str = ""


 class OptimizePromptRequest(BaseModel):
    text: str
-    # 选择性传当前 UI 选中的生图 variant key,润色 meta-prompt 会把对应模型特性塞进去
-    # (让 LLM 知道下游 tool 偏好,润色出更贴合 seedream/seedance 等的 prompt)。
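+    # Optionally pass the variant keys currently selected in the UI; the polishing meta-prompt injects that model's traits
+    # (so the LLM knows the downstream tool's preferences and produces a prompt better suited to seedream / seedance, etc.).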
    image_model: str = ""
+    video_model: str = ""


 class FileDeleteRequest(BaseModel):
@ -542,7 +578,7 @@ def create_app() -> FastAPI:

        前端顶栏第二个下拉拉这个;空列表 → 没配 image variant,UI 隐藏下拉。
        `is_default` 标第一个 variant(=agent_builder fallback 目标)。开发期不缓存,
-        改 YAML 加新 variant(如 seedance)立即生效。
+        改 YAML 加新 variant 立即生效。
        """
        variants = _list_image_variants()
        out: list[dict] = []
@ -556,6 +592,29 @@ def create_app() -> FastAPI:
            })
        return {"models": out}

+    @app.get("/v1/video_models", tags=["misc"])
+    def list_video_models(user_id: UUID = Depends(require_user)):
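+        """Video generation model list (scans the `video` section of config/media/doubao.yaml).
+
+        Same pattern as /v1/image_models; an empty list → the UI hides the third dropdown. Each entry carries the
+        defaults (resolution / duration / ratio) and the ¥/Mtok prices, so users see the cost magnitude right in the dropdown.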
+        """
+        variants = _list_video_variants()
+        out: list[dict] = []
+        for i, (key, cfg) in enumerate(variants):
+            out.append({
+                "variant": key,
+                "display_name": cfg.get("display_name") or key,
+                "model_id": cfg.get("model_id") or "",
+                "default_resolution": cfg.get("default_resolution"),
+                "default_duration": cfg.get("default_duration"),
+                "default_ratio": cfg.get("default_ratio"),
+                "price_cny_per_mtoken_text2video": cfg.get("price_cny_per_mtoken_text2video"),
+                "price_cny_per_mtoken_video2video": cfg.get("price_cny_per_mtoken_video2video"),
+                "is_default": i == 0,
+            })
+        return {"models": out}
+
    # ───────────── Auth ─────────────

    @app.post("/v1/auth/login", tags=["auth"])
@ -1003,12 +1062,13 @@ def create_app() -> FastAPI:
                    run_status="running", run_error=None,
                )
            )
-        # image_model 在 POST 时校验,避免 BG 线程里抛在 sink 之外难追;空串透传不查 yaml。
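+        # Validate image_model / video_model at POST time: an error raised in the BG thread lands outside the sink and is hard to trace. Empty strings pass through without reading the yaml.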
        image_variant = _resolve_image_model(body.image_model)
+        video_variant = _resolve_video_model(body.video_model)
        broker.start(tid)  # 清上一轮 done 标记,新订阅者才能看到流式
        # commit 后 lock 释放;BG 线程接管(sink 通过 broker 把 event 桥回 asyncio loop)
        asyncio.create_task(asyncio.to_thread(
-            _run_agent_bg, tid, user_id, content, image_variant,
+            _run_agent_bg, tid, user_id, content, image_variant, video_variant,
        ))
        return {"events_url": f"/v1/tasks/{tid}/events"}

@ -1149,7 +1209,7 @@ def create_app() -> FastAPI:
        except (FileNotFoundError, ValueError) as e:
            raise HTTPException(500, f"invalid task model_profile {chosen_profile!r}: {e}")

-        # 收集下游 tool 上下文:对话模型 display_name + 当前选中生图 variant 元数据
+        # Gather downstream tool context: chat model display_name + metadata of the selected image/video variants
        chat_model_display = caps.display_name or chosen_profile
        image_variant_hint = ""
        img_variant = (body.image_model or "").strip()
@ -1163,10 +1223,23 @@ def create_app() -> FastAPI:
                        f"擅长写实/插画/构图描述)。若用户意图涉及画面/封面/插图,"
                        f"润色后的文本要给出适合该模型的画面细节(主体/风格/光线/构图)。"
                    )
+        video_variant_hint = ""
+        vid_variant = (body.video_model or "").strip()
+        if vid_variant:
+            for k, v in _list_video_variants():
+                if k == vid_variant:
+                    name = v.get("display_name") or k
+                    res = v.get("default_resolution") or "720p"
+                    dur = v.get("default_duration") or 5
+                    video_variant_hint = (
+                        f"\n下游生视频工具:{name}(默认 {res} / {dur}s / {v.get('default_ratio') or '16:9'})。"
+                        f"若用户意图涉及视频/动画/动起来,润色后的文本要补全 "
+                        f"主体在做什么(运动)+ 镜头怎么动 + 场景 + 风格,而非静态画面描述。"
+                    )

        meta_prompt = (
            f"你的任务是润色用户输入的草稿,使之成为一个清晰、完整、可执行的 prompt。\n"
-            f"当前对话模型:{chat_model_display}。{image_variant_hint}\n\n"
+            f"当前对话模型:{chat_model_display}。{image_variant_hint}{video_variant_hint}\n\n"
            f"规则:\n"
            f"1. 只输出润色后的文本本身,不要任何解释、前后缀、引号、markdown 代码块包裹\n"
            f"2. 保留用户原始语言(中文/英文)\n"
@ -1215,6 +1288,7 @@ def create_app() -> FastAPI:
                        "tokens_out": completion_tokens,
                        "usd_to_cny": float(USD_TO_CNY),
                        "image_model_hint": img_variant or "",
+                        "video_model_hint": vid_variant or "",
                    },
                    cost_cny=cost_cny,
                ))
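The variant plumbing added to `web/app.py` has three parts: list the `video` section, validate `video_model`, and expose `/v1/video_models`. All three reduce to a few pure functions.

Below is a minimal sketch that assumes the YAML has already been parsed into a dict. Some details are stand-ins rather than the real code:

- `list_video_variants`, `resolve_video_model` and `video_models_payload` are hypothetical names.
- `ValueError` stands in for FastAPI's `HTTPException(400, ...)`.
- `<model-id>` is a placeholder.

```python
from typing import Any


def list_video_variants(doc: dict[str, Any]) -> list[tuple[str, dict]]:
    # Keep only dict-valued entries of the `video` section, as _list_video_variants does.
    video_cfg = doc.get("video") or {}
    return [(k, v) for k, v in video_cfg.items() if isinstance(v, dict)]


def resolve_video_model(doc: dict[str, Any], variant: str) -> str:
    # Empty → "" (the tool falls back to the first yaml variant); unknown → error.
    name = (variant or "").strip()
    if not name:
        return ""
    keys = {k for k, _ in list_video_variants(doc)}
    if name not in keys:
        raise ValueError(f"invalid video_model {name!r}; available: {sorted(keys)}")
    return name


def video_models_payload(doc: dict[str, Any]) -> dict[str, list[dict]]:
    # Shape of the GET /v1/video_models response; the first variant is the default.
    return {"models": [
        {
            "variant": k,
            "display_name": v.get("display_name") or k,
            "default_resolution": v.get("default_resolution"),
            "default_ratio": v.get("default_ratio"),
            "is_default": i == 0,
        }
        for i, (k, v) in enumerate(list_video_variants(doc))
    ]}


if __name__ == "__main__":
    doc = {"video": {
        "seedance_2_fast": {"model_id": "<model-id>", "display_name": "Seedance 2.0 Fast",
                            "default_resolution": "720p", "default_ratio": "16:9"},
        "note": "non-dict entries are ignored",
    }}
    print(video_models_payload(doc))
    print(repr(resolve_video_model(doc, "  ")))  # ''
    try:
        resolve_video_model(doc, "seedance_pro")
    except ValueError as e:
        print(e)
```

The diff's comment explains why validation stays in the POST handler rather than the background thread. A bad key then surfaces as an HTTP 400 instead of an error outside the SSE sink.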
--- a/web/static/dev.html
+++ b/web/static/dev.html
@ -785,6 +785,9 @@ const state = {
  // 当前选中的图像生成 variant key(per-session,不入 DB);默认 = imageModels[0].variant
  // (=yaml 第一个 = agent_builder fallback)。下次 send 消息时随 POST body 带给 backend。
  imageModel: "",
+  // Video generation model list (GET /v1/video_models) + currently selected variant; same pattern as imageModels.
+  videoModels: [],
+  videoModel: "",
  // 润色按钮进行中标记:防止双击,同时让 syncOptimizeBtn 在 in-flight 期间不覆盖
  // disabled 状态(否则用户键入 input 会把按钮从"润色中"误启回 enabled)
  optimizing: false,
@ -1138,6 +1141,17 @@ async function loadModels() {
    state.imageModels = [];
    state.imageModel = "";
  }
+  try {
+    const data = await api("GET", "/v1/video_models");
+    state.videoModels = data.models || [];
+    if (!state.videoModels.some(m => m.variant === state.videoModel)) {
+      const def = state.videoModels.find(m => m.is_default) || state.videoModels[0];
+      state.videoModel = def ? def.variant : "";
+    }
+  } catch (e) {
+    state.videoModels = [];
+    state.videoModel = "";
+  }
 }

 // loadTaskList:默认 reset(filters/refresh/写操作后),append=true 由 sentinel observer 触发
@ -1380,11 +1394,14 @@ function renderChatMeta() {
    <span class="spacer"></span>
    ${renderModelDropdown(t)}
    ${renderImageModelDropdown()}
+    ${renderVideoModelDropdown()}
  `;
  const sel = $("chat-model-sel");
  if (sel) sel.onchange = onChangeModel;
  const imgSel = $("chat-image-model-sel");
  if (imgSel) imgSel.onchange = onChangeImageModel;
+  const vidSel = $("chat-video-model-sel");
+  if (vidSel) vidSel.onchange = onChangeVideoModel;
  const active = t.status === "active";
  $("chat-form").style.display = active ? "flex" : "none";
  syncOptimizeBtn();
@ -1425,6 +1442,22 @@ function onChangeImageModel(ev) {
  $("chat-hint").textContent = `生图模型 → ${ev.target.options[ev.target.selectedIndex].text}`;
 }

+function renderVideoModelDropdown() {
+  // Same as renderImageModelDropdown: empty videoModels → render nothing. When the yaml has no video
+  // section / /v1/video_models returns empty, the dropdown is absent and the seedance tool is not in the schema either.
+  if (!state.videoModels || state.videoModels.length === 0) return "";
+  const cur = state.videoModel || "";
+  const opts = state.videoModels.map(m =>
+    `<option value="${escapeHtml(m.variant)}" ${m.variant === cur ? "selected" : ""}>${escapeHtml(m.display_name)}</option>`
+  ).join("");
+  return `<span class="muted small" style="display:inline-flex;align-items:center;gap:4px;">生视频 <select id="chat-video-model-sel" class="small" style="width:auto;padding:1px 4px;font-size:12px;" title="下一条消息触发生视频时使用的模型(本地选择,不入库)">${opts}</select></span>`;
+}
+
+function onChangeVideoModel(ev) {
+  state.videoModel = ev.target.value || "";
+  $("chat-hint").textContent = `生视频模型 → ${ev.target.options[ev.target.selectedIndex].text}`;
+}
+
 async function onChangeModel(ev) {
  const sel = ev.target;
  const newProfile = sel.value;
@ -1607,6 +1640,7 @@ async function optimizePrompt() {
    const r = await api("POST", `/v1/tasks/${state.taskId}/optimize_prompt`, {
      text: ta.value,  // 不 trim — 后端再 strip;保留尾部 newline 让用户感受不变
      image_model: state.imageModel || "",
+      video_model: state.videoModel || "",
    });
    const optimized = (r.optimized || "").trim();
    if (!optimized) throw new Error("空结果");
@ -1677,6 +1711,7 @@ async function sendMessage() {
    const r = await api("POST", `/v1/tasks/${state.taskId}/messages`, {
      content,
      image_model: state.imageModel || "",
+      video_model: state.videoModel || "",
    });
    $("chat-input").value = "";
    syncOptimizeBtn();