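feat: integrate Doubao Seedance 2.0 Fast video generation (text-to-video) + videogen skill

- tools/seedance.py: async submit to /contents/generations/tasks → poll every 5s → on succeeded, download the mp4 + meta.json into <wd>/videos/; failed/cancelled runs are not billed; cancel_check is checked between polls so the user's stop button takes effect
- config/media/doubao.yaml: expand video.seedance_2_fast (¥37/Mtok text-to-video / ¥22/Mtok image-to-video; the token formula gives ¥4.00 for 720p 5s, matching the source data exactly)
- core/storage/usage.py: record_video_usage, kind=video, units jsonb snapshots resolution/duration/ratio/fps/tokens/unit price
- core/agent_builder.py: build_agent gains video_variant + cancel_check parameters; cancel_check must be passed at build time (the SeedanceTool ctor holds it for polling)
- web/app.py: GET /v1/video_models + MessageRequest.video_model + pass-through
- web/static/dev.html: third dropdown in the top bar (next to image) + state.videoModels/videoModel
- skills/videogen/SKILL.md: six-dimension diagnosis (motion + camera replace imagegen's lighting); BLOCKING gate stricter than imagegen's (¥4 vs ¥0.22) + 30-90s wait for the clip
- prompts/system/general_v1.md: add seedance trigger guidance (parallel to seedream)

Phase 1 is t2v (text-to-video) only; fast tops out at 720p. The API end-to-end smoke was run: path/auth/error parsing all pass; the body schema can only be verified once the user enables the model in the Volcengine Ark console and a real clip is produced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>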
This commit is contained in:
parent
9a26e85da2
commit
7ff58c488e
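@@ -2,7 +2,7 @@

 > Companion to `DESIGN.md`. This file records only phase status, deviations from decisions, file volume, and next steps. Each entry is 1-2 sentences: what was done + the key judgment call; for details see `git log` / `git diff` / `DESIGN §7.9`.

-Last updated: 2026-05-21 (sandbox's blocking status written into DESIGN + the blacklist is no longer being tightened)
+Last updated: 2026-05-22 (Doubao Seedance 2.0 Fast video generation integrated + videogen skill)

 ---
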
@@ -10,7 +10,7 @@

 | Phase | Title | Status | Notes |
 |---|---|---|---|
-| 1-3 | Skeleton + Skill + run_python | ✅ | Three skills; CoreCoder unique-match edit; sensitive env filtering |
+| 1-3 | Skeleton + Skill + run_python | ✅ | Multiple skills (coding/proposal/ppt/research/documents/imagegen/videogen); CoreCoder unique-match edit; sensitive env filtering |
 | 4 | Evolvability | 🟡 | Model Profile + Probing ✅; versioned prompts not done |
 | 5 | Eval Suite | ⏸ Not doing | Replaced by dogfooding; probes cover health checks |
 | 6 | Long-task engineering | 🟡 | task + resume ✅; two-tier memory ✅; context compression not done |
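@@ -21,6 +21,10 @@

 ## Key capabilities completed

+### 2026-05-22
+
+- **Doubao Seedance 2.0 Fast video generation integrated (text-to-video) + videogen skill**: `config/media/doubao.yaml` expands the video section (`seedance_2_fast`: ¥37/Mtok text-to-video / ¥22/Mtok image-to-video; measured tiers 480p 5s ¥1.86 / 720p 5s ¥4.00, cross-checked against the token formula `(in+out)×W×H×fps/1024`); `tools/seedance.py` calls ark POST `/contents/generations/tasks` → polls at 5s intervals → on succeeded, downloads the mp4 + .meta.json into `<wd>/videos/<ts>-<rand>.mp4`; failed/cancelled runs are not billed; `core/storage/usage.py::record_video_usage` writes a polymorphic units snapshot (resolution/duration/ratio/fps/tokens/unit price); `build_agent` gains `video_variant` + `cancel_check` parameters. cancel_check must be passed at build time (the SeedanceTool ctor holds it so the stop button works during polling; this replaces the former late binding of "assign agent.cancel_check after build", and the web entry point was migrated accordingly); `web/app.py` adds `_list_video_variants` / `_resolve_video_model` / `GET /v1/video_models` / `MessageRequest.video_model` / `OptimizePromptRequest.video_model`; `dev.html` gets a third top-bar dropdown + `state.videoModels/videoModel`, POSTed along with each message. The front-end chip / inline `<video>` / `extractMediaBanner` / `_categorize` scaffolding for seedance was already laid down in earlier work and is almost untouched. The six-dimension diagnosis in `skills/videogen/SKILL.md` swaps imagegen's "lighting" for two dimensions, "motion + camera" (motion is mandatory; without it the request should go to seedream rather than seedance, an 18x price difference); the BLOCKING gate is stricter than imagegen's (¥4 vs ¥0.22) and comes with a 30-90s wait: post the prompt + parameters + estimated cost + estimated wait, then wait for explicit confirmation. `general_v1.md` adds seedance trigger guidance (parallel to seedream). Phase 1 is t2v only, **no i2v** (the skill tells the user so explicitly). fast tops out at 720p; 1080p+ is left to the pro variant (not configured in the yaml yet). Rejected: (a) streaming progress events (requires injecting a sink into the tool; `run_status=running` is enough for phase 1); (b) remote cgt-task DELETE (Volcengine has no clearly documented API; best-effort, left alone); (c) pulling i2v into phase 1 (needs image-to-URL conversion + a UI for picking an existing image; deferred).
+
 ### 2026-05-21

 - **dev.html primary button hover text disappearing, fixed (`.primary:hover` adds `background: var(--accent)`)**: `button:hover:not(:disabled)` and `button.primary:hover` have the same specificity (0,2,1), so source order breaks the tie and the latter wins, but it only declares `filter: brightness(1.08)` and no `background`, so `background` falls back to the former's light-gray `var(--hover)` while `color` stays `.primary`'s white: white text on light gray, visually gone. The fix pins background = accent, and the brightness filter brightens normally on the red background. Hover on the two primary buttons, "+ 新建任务" (new task) / "发送" (send), is back to normal.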
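The submit → poll → download flow described in the 2026-05-22 entry can be sketched roughly as below. This is a minimal illustration, not the code in `tools/seedance.py`: `poll_until_done`, `fetch_status`, and the injected `sleep` / `clock` callables are hypothetical names, and only the terminal statuses and the 5s / 600s defaults are taken from the commit.

```python
import time
from typing import Callable, Optional

# Terminal statuses as listed in the tools/seedance.py docstring.
TERMINAL = {"succeeded", "failed", "expired", "cancelled"}


def poll_until_done(
    fetch_status: Callable[[], str],
    cancel_check: Optional[Callable[[], bool]] = None,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until a terminal status, a local cancel, or the overall timeout."""
    deadline = clock() + timeout_s
    while True:
        # Check the stop button between polls; the remote task is left alone.
        if cancel_check is not None and cancel_check():
            return "local_cancelled"
        status = fetch_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a caller would pass a closure that GETs the task and returns its `status` field.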
RUN.md
@@ -2,7 +2,7 @@

 > How to get zcbot running: env / common commands / troubleshooting fallbacks. For design see `DESIGN.md`; for progress see `PROGRESS.md`.

-Last updated: 2026-05-21 (added env for the documents skill: `DOCUMENT_SEARCH_API_KEY` / `DOCUMENT_SEARCH_URL`)
+Last updated: 2026-05-22 (seedance video generation integrated; same ARK_API_KEY, new videogen skill)

 ---

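@@ -14,7 +14,8 @@
 DEEPSEEK_API_KEY=sk-...
 # To use GLM, add one more entry; the international site z.ai uses ZAI_API_KEY, the mainland China site bigmodel.cn uses ZHIPUAI_API_KEY (matching the api_key_env field in config/models/glm.yaml)
 ZHIPUAI_API_KEY=...
-# Doubao (Volcengine Ark) image/video generation: optional. When set, the seedream tool is mounted (¥0.22 per image); when unset, the tool does not appear
+# Doubao (Volcengine Ark) image/video generation: optional. When set, both the seedream tool (¥0.22 per image) and the seedance tool are mounted
+# (Seedance 2.0 Fast, text-to-video, from about ¥1.5 at 480p 4s (¥1.86 at 480p 5s) to ¥12+ at 720p 15s, async with a 30-90s wait); when unset, neither tool appears
 ARK_API_KEY=...
 # documents skill (internal knowledge-base document_search API): optional. The documents skill works only when this is set; when unset, calls raise RuntimeError immediately
 DOCUMENT_SEARCH_API_KEY=...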
@@ -1,12 +1,14 @@
 # Doubao (Volcengine Ark) media-generation model profile.
 #
-# Price table last_updated: 2026-05-20
+# Price table last_updated: 2026-05-22
 # Source: https://www.volcengine.com/docs/82379/1544106
+#         https://github.com/ArcReel/ArcReel (cost reference table)
 # When Doubao changes prices, update this file by hand + restart web. Historical usage_events carry their own snapshot and are unaffected
-# (record_image_usage writes price_cny_per_image into the units jsonb column).
+# (record_image_usage / record_video_usage write the unit-price snapshot into the units jsonb column).
 #
 # Integration: goes through ark's native HTTP (litellm does not cover image/video); core/ark_client.py wraps the calls uniformly.
-# image (seedream) returns a URL synchronously; video (seedance) is an async task + polling. Only image is implemented in this phase.
+# image (seedream) returns a URL synchronously; video (seedance) is an async task + polling, with output written to
+# <wd>/videos/<ts>-<rand>.mp4.

 ark_api_key_env: ARK_API_KEY
 ark_base_url: https://ark.cn-beijing.volces.com/api/v3
@@ -23,11 +25,35 @@ image:
     default_search: false          # web search adds ~¥0.05 per image; off by default
     request_timeout_s: 60          # image generation slower than this counts as a timeout

-# video (seedance) pending Phase 2:
-# video:
-#   seedance_2:
-#     model_id: doubao-seedance-2-0-260128
-#     ...
-#   seedance_2_fast:
-#     model_id: doubao-seedance-2-0-fast-260128
-#     ...
+video:
+  seedance_2_fast:
+    model_id: doubao-seedance-2-0-fast-260128
+    display_name: 豆包 Seedance 2.0 Fast
+    # Async task: POST /contents/generations/tasks returns cgt-xxx → poll GET
+    # /contents/generations/tasks/<id> until status=succeeded → read content.video_url
+    # (valid for 24h; this tool downloads it to local disk immediately).
+    endpoint_submit: /contents/generations/tasks
+    endpoint_poll: /contents/generations/tasks   # actual path = base + "/{cgt_id}"
+
+    # Billing (per-token, tokens = (in_dur+out_dur) × W × H × fps / 1024):
+    #   text-to-video (no video input, the main path this phase): ¥37 / million tokens
+    #   image-to-video (with video input, phase 2):               ¥22 / million tokens
+    # Measured tiers (fast, 5s, text-to-video, 24fps; source: ArcReel cost reference table):
+    #   480p 16:9 → ¥1.86
+    #   720p 16:9 → ¥4.00
+    # The tool estimates tokens as W×H×duration×fps/1024, times the unit price → cost_cny. If the response carries a usage
+    # field, it overrides the estimate (to be calibrated against the fields the Doubao API actually returns).
+    price_cny_per_mtoken_text2video: 37.0
+    price_cny_per_mtoken_video2video: 22.0
+    fps: 24                          # used for token estimation; Doubao is currently fixed at 24fps
+
+    # Supported parameters (POST body fields)
+    default_resolution: 720p         # fast's ceiling; options are 480p / 720p
+    default_ratio: "16:9"            # 16:9 / 9:16 / 1:1 / 4:3 / 3:4 / 21:9 / adaptive
+    default_duration: 5              # 4-15s
+    default_watermark: false
+
+    # Polling parameters
+    request_timeout_s: 60            # submit POST timeout (async, submission only)
+    poll_interval_s: 5               # interval between GETs (seconds); a clip typically takes 30-90s
+    poll_timeout_s: 600              # overall wait ceiling (10min) → on timeout returns [Error]
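As a quick check of the billing comment in this yaml, the sketch below reproduces the 720p figure from the token formula. `estimate_tokens` and `estimate_cost_cny` are hypothetical helper names, and the 1280×720 frame size for 720p 16:9 is an assumption; the formula and the ¥37/Mtok price are the ones quoted in the comments.

```python
def estimate_tokens(width: int, height: int, duration_s: int,
                    fps: int = 24, in_dur_s: int = 0) -> int:
    """tokens = (in_dur + out_dur) × W × H × fps / 1024, truncated to an integer."""
    return (in_dur_s + duration_s) * width * height * fps // 1024


def estimate_cost_cny(tokens: int, price_per_mtoken: float = 37.0) -> float:
    """cost = tokens / 1,000,000 × unit price (CNY per million tokens)."""
    return tokens * price_per_mtoken / 1_000_000


tokens = estimate_tokens(1280, 720, duration_s=5)
print(tokens, round(estimate_cost_cny(tokens), 2))
```

For 720p 16:9 at 5s this yields 108,000 tokens and roughly ¥4.00, in line with the measured tier. The same formula lands somewhat below the ¥1.86 quoted for 480p, so that tier is better treated as reference data than as a derived value.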
@@ -19,7 +19,7 @@ from __future__ import annotations

 from datetime import datetime
 from pathlib import Path
-from typing import Optional, Tuple
+from typing import Callable, Optional, Tuple
 from uuid import UUID, uuid4

 import yaml
@@ -37,6 +37,7 @@ from core.storage import check_no_subtask
 from core.task import TaskState
 from tools.fs import EditTool, GlobTool, GrepTool, ReadTool, WriteTool
 from tools.run_python import RunPythonTool
+from tools.seedance import SeedanceTool
 from tools.seedream import SeedreamTool
 from tools.shell import ShellTool
 from tools.skill_tool import LoadSkillTool
@@ -220,6 +221,8 @@ def build_agent(
     name: Optional[str] = None,
     working_dir: Optional[str] = None,
     image_variant: str = "",
+    video_variant: str = "",
+    cancel_check: Optional[Callable[[], bool]] = None,
 ) -> Tuple[AgentLoop, Session, str, TaskState, Path]:
     """Returns (agent, session, task_id_str, task_state, working_dir_path).

@@ -384,8 +387,39 @@ def build_agent(
             )
             tools[seedream_tool.name] = seedream_tool

+        # Video variant selection (same pattern as image_variant above): video_variant comes from the caller;
+        # empty → take the first video variant in the yaml. This run's SeedanceTool is locked to that variant.
+        # cancel_check is the `lambda: broker.is_cancelled(task_id)` built by the web entry point. It is used
+        # during polling (typically 30-90s) to honor the user's stop button; the remote cgt task has no cancel API, so the remote side is left alone (best-effort)
+        video_cfg = (ark_cfg.raw.get("video") or {})
+        v_chosen_key, v_chosen_cfg = "", None
+        if video_variant:
+            v = video_cfg.get(video_variant)
+            if isinstance(v, dict):
+                v_chosen_key, v_chosen_cfg = video_variant, v
+        if v_chosen_cfg is None:
+            for variant_key, variant_cfg in video_cfg.items():
+                if isinstance(variant_cfg, dict):
+                    v_chosen_key, v_chosen_cfg = variant_key, variant_cfg
+                    break
+        if v_chosen_cfg is not None:
+            seedance_tool = SeedanceTool(
+                ark_cfg=ark_cfg,
+                video_variant_cfg=v_chosen_cfg,
+                variant_key=v_chosen_key,
+                working_dir=working_dir_path,
+                task_id=task_id,
+                user_id=uid,
+                base_dir=tool_base,
+                user_root=ur_path,
+                cancel_check=cancel_check,
+            )
+            tools[seedance_tool.name] = seedance_tool
+
     sink = ConsoleEventSink(console) if console else None
     agent = AgentLoop(llm, tools, session, caps, user_id=uid, sink=sink)
+    if cancel_check is not None:
+        agent.cancel_check = cancel_check
     return agent, session, sid, task_state, working_dir_path


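The variant fallback in this hunk (use the requested key if it maps to a dict, otherwise the first dict-valued entry) can be isolated into a small function. A minimal sketch, assuming a plain mapping shaped like the yaml's `video` section; `pick_variant` is a hypothetical name and is not part of the commit.

```python
from typing import Any, Mapping, Optional, Tuple


def pick_variant(
    section: Mapping[str, Any], requested: str = ""
) -> Tuple[str, Optional[dict]]:
    """Return (key, cfg) for the requested variant, else the first dict-valued entry."""
    chosen = section.get(requested) if requested else None
    if isinstance(chosen, dict):
        return requested, chosen
    # Fall back to the first variant in declaration order.
    for key, cfg in section.items():
        if isinstance(cfg, dict):
            return key, cfg
    return "", None
```

Returning `("", None)` for an empty section mirrors the diff's behaviour of simply not registering the tool when no usable variant exists.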
@@ -130,3 +130,63 @@ def record_image_usage(
             cost_cny=cost_cny,
         ))
     return cost_cny
+
+
+def record_video_usage(
+    *,
+    task_id: UUID,
+    user_id: UUID,
+    model_profile: str,
+    resolution: str,
+    ratio: str,
+    duration_s: int,
+    fps: int,
+    width: int,
+    height: int,
+    tokens: int,
+    price_cny_per_mtoken: float,
+    has_video_input: bool = False,
+    watermark: bool = False,
+    extra_units: Optional[Mapping[str, Any]] = None,
+) -> Decimal:
+    """Record one video generation: writes usage_events (kind=video).
+
+    Cost formula: `cost_cny = tokens / 1_000_000 * price_cny_per_mtoken`. tokens is passed in by the
+    caller (the official usage field from the response if present, otherwise estimated with
+    `(in_dur+out_dur)*W*H*fps/1024`); `price_cny_per_mtoken` is snapshotted into the units jsonb too, so later price changes do not affect historical reconciliation.
+
+    `model_profile` looks like `"doubao.seedance_2_fast"` (family.variant style).
+    `has_video_input=True` means the image-to-video / video-editing path (¥22/Mtok); False = text-to-video (¥37/Mtok).
+    The caller picks the matching unit price + flag based on whether the request body contains image_url/video_url.
+
+    **Failed tasks must not go through here**: Volcengine does not bill failures, so a failed tool call returns [Error] directly and writes no usage.
+    """
+    price = Decimal(str(price_cny_per_mtoken))
+    tok = Decimal(str(int(tokens)))
+    cost_cny = (price * tok / Decimal("1000000")).quantize(Decimal("0.000001"))
+    units: dict[str, Any] = {
+        "resolution": resolution,
+        "ratio": ratio,
+        "duration_s": int(duration_s),
+        "fps": int(fps),
+        "width": int(width),
+        "height": int(height),
+        "tokens": int(tokens),
+        "price_cny_per_mtoken": float(price_cny_per_mtoken),
+        "has_video_input": bool(has_video_input),
+        "watermark": bool(watermark),
+    }
+    if extra_units:
+        units.update(extra_units)
+
+    with session_scope() as s:
+        s.add(UsageEvent(
+            user_id=user_id,
+            task_id=task_id,
+            message_id=None,
+            kind="video",
+            model_profile=model_profile,
+            units=units,
+            cost_cny=cost_cny,
+        ))
+    return cost_cny
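The cost arithmetic in `record_video_usage` routes the float price through `str()` before building a `Decimal`, then quantizes to six decimal places. A standalone sketch of just that step, with the database write left out; `video_cost_cny` is a hypothetical name used only for illustration.

```python
from decimal import Decimal


def video_cost_cny(tokens: int, price_cny_per_mtoken: float) -> Decimal:
    """cost = tokens / 1,000,000 × price, fixed to 6 decimal places."""
    # Going through str() avoids carrying the float's binary representation
    # error into the Decimal (Decimal(0.1) is not equal to Decimal("0.1")).
    price = Decimal(str(price_cny_per_mtoken))
    tok = Decimal(int(tokens))
    return (price * tok / Decimal("1000000")).quantize(Decimal("0.000001"))


print(video_cost_cny(108_000, 37.0))
```

Quantizing to a fixed exponent gives every stored cost the same scale, which presumably keeps sums over `usage_events` free of float drift.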
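@@ -11,6 +11,9 @@
 - `seedream`: Doubao image generation. Output lands automatically in `<task_dir>/figures/`. **¥0.22** per call (web-connected `search=true` adds ¥0.05).
   - **You must `load_skill('imagegen')` before calling**: the skill carries the full guidance on "when to use it / whether mermaid should be used instead / diagnosing how vague the user's description is / the ask-everything-in-one-go pattern / prompt assembly / remedies for failures". **Do not pass the user's raw words straight to the tool as the prompt**; that easily burns ¥0.22 in the wrong direction.
   - Fallback hard constraints (hold even if the skill was not loaded): do not generate decorative images the user did not ask for; if a result for the same purpose is unsatisfactory, **do not fire repeatedly**: calibrate the prompt in conversation first, then call.
+- `seedance`: Doubao video generation (Seedance 2.0 Fast). Async task, **30-90s wait for the clip**; output lands automatically in `<task_dir>/videos/`. Each call costs from **about ¥1.5** (480p 4s; **¥1.86** at 480p 5s) to **¥12+** (720p 15s), more than 10x the price of an image. Trigger words: 视频 (video) / 动画 (animation) / 动起来 (make it move) / 做个 video / 镜头 (camera shot) / 短片 (short clip) / 演示视频 (demo video) / 动效 (motion effect).
+  - **You must `load_skill('videogen')` before calling**: the skill carries the full guidance on "6-dimension diagnosis (with motion as a mandatory dimension) / choosing seedream or mermaid instead / prompt assembly / parameter trade-offs (duration/resolution/ratio directly determine the cost) / remedies for failures". Video is 10x the price of an image and comes with a 90s wait; never pass the user's raw words straight to the tool as the prompt.
+  - Fallback hard constraints: do not generate decorative videos the user did not ask for (a more serious red line than with images); if a result for the same purpose is unsatisfactory, **never fire repeatedly** (1 miss = ¥4 + 60s, 2 in a row = ¥8 + 2min); phase 1 is text-to-video only and does **not support** image-to-video / video-to-video.

 ## Skill mechanism
 At startup you see only the "name + description" of the skills below. Skills are an **optional aid**: when the task clearly falls within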
@@ -0,0 +1,121 @@
"""Smoke: end-to-end run of the Doubao Seedance video-generation tool.

How to run: .venv/Scripts/python.exe scripts/smoke_seedance.py
Depends on ARK_API_KEY / ZCBOT_DB_URL in .env. **This really calls the Doubao API: it costs about ¥1.5 (480p 4s) and takes 30-90s.**

Checks:
  1. ArkConfig.load() returns a cfg and video.seedance_2_fast exists
  2. SeedanceTool.execute(prompt=..., resolution='480p', duration=4) returns the [seedance ...] text
  3. videos/<ts>-<rand>.mp4 is written to disk and is larger than 0 bytes
  4. A same-named .meta.json exists and contains the prompt/model_id/cost_cny/tokens/cgt_id fields
  5. usage_events gains one row with kind="video", with unit price + resolution + duration snapshotted in units jsonb
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import os
|
||||
import sys
|
||||
import uuid
|
||||
from pathlib import Path
|
||||
|
||||
ROOT = Path(__file__).resolve().parent.parent
|
||||
sys.path.insert(0, str(ROOT))
|
||||
|
||||
env_file = ROOT / ".env"
|
||||
if env_file.exists():
|
||||
for line in env_file.read_text(encoding="utf-8").splitlines():
|
||||
line = line.strip()
|
||||
if not line or line.startswith("#") or "=" not in line:
|
||||
continue
|
||||
k, _, v = line.partition("=")
|
||||
os.environ.setdefault(k.strip(), v.strip())
|
||||
|
||||
from sqlalchemy import text
|
||||
|
||||
from core.ark_client import ArkConfig
|
||||
from core.storage import session_scope
|
||||
from core.storage.models import Task, User
|
||||
from tools.seedance import SeedanceTool
|
||||
|
||||
|
||||
def main() -> int:
|
||||
cfg = ArkConfig.load()
|
||||
if cfg is None:
|
||||
print("[SKIP] ARK_API_KEY 未设(或 config/media/doubao.yaml 缺失),无法测真接口")
|
||||
return 0
|
||||
|
||||
video_cfg = (cfg.raw.get("video") or {})
|
||||
if not video_cfg:
|
||||
print("[SKIP] doubao.yaml 无 video 段")
|
||||
return 0
|
||||
variant_key, variant_cfg = next(iter(video_cfg.items()))
|
||||
print(f"[setup] variant={variant_key} model={variant_cfg.get('model_id')}")
|
||||
print(f" price_text2video=¥{variant_cfg.get('price_cny_per_mtoken_text2video')}/Mtok")
|
||||
|
||||
uid = uuid.uuid4()
|
||||
tid = uuid.uuid4()
|
||||
ws_user = ROOT / "workspace" / "users" / str(uid)
|
||||
wd = ws_user / "smoke_seedance"
|
||||
wd.mkdir(parents=True, exist_ok=True)
|
||||
# 拆两个事务:models 未定 relationship,UOW 不知 User→Task FK 依赖,同一事务里 Task 会先插炸 FK
|
||||
with session_scope() as s:
|
||||
s.add(User(user_id=uid))
|
||||
with session_scope() as s:
|
||||
s.add(Task(task_id=tid, user_id=uid, name="smoke_seedance", working_dir=str(wd)))
|
||||
|
||||
tool = SeedanceTool(
|
||||
ark_cfg=cfg,
|
||||
video_variant_cfg=variant_cfg,
|
||||
variant_key=variant_key,
|
||||
working_dir=wd,
|
||||
task_id=tid,
|
||||
user_id=uid,
|
||||
base_dir=wd,
|
||||
user_root=ws_user,
|
||||
)
|
||||
|
||||
# 选最便宜档位省钱:480p / 4s / 16:9 → ~¥1.5
|
||||
prompt = "一只橙色的小猫从窗台跳下来,水彩风格,镜头平视跟随"
|
||||
print(f"[call] prompt={prompt!r} resolution=480p duration=4")
|
||||
print(f" (异步任务,等 30-90s 出片,先 submit 后轮询)")
|
||||
result = tool.execute(prompt=prompt, resolution="480p", duration=4, ratio="16:9")
|
||||
print(f"[tool result]\n{result}\n")
|
||||
|
||||
if result.startswith("[Error]") or result.startswith("[Cancelled]"):
|
||||
print(f"[FAIL] tool 返回 error/cancelled")
|
||||
return 2
|
||||
|
||||
mp4s = list((wd / "videos").glob("*.mp4"))
|
||||
assert len(mp4s) == 1, f"videos/*.mp4 应当 1 个,实际 {len(mp4s)}"
|
||||
mp4 = mp4s[0]
|
||||
assert mp4.stat().st_size > 0, f"{mp4} 大小为 0"
|
||||
print(f"[OK] mp4 落盘 {mp4.name} ({mp4.stat().st_size} bytes)")
|
||||
|
||||
meta_path = mp4.with_suffix(".meta.json")
|
||||
assert meta_path.exists(), f"meta 文件不存在 {meta_path}"
|
||||
meta = json.loads(meta_path.read_text(encoding="utf-8"))
|
||||
for k in ("prompt", "model_id", "resolution", "duration_s", "tokens", "cost_cny", "cgt_id", "ts"):
|
||||
assert k in meta, f"meta 缺字段 {k}"
|
||||
print(f"[OK] meta 字段齐全: {list(meta.keys())}")
|
||||
print(f" tokens={meta['tokens']} cost=¥{meta['cost_cny']} cgt={meta['cgt_id']}")
|
||||
|
||||
with session_scope() as s:
|
||||
rows = s.execute(text(
|
||||
"SELECT kind, model_profile, units, cost_cny FROM usage_events "
|
||||
"WHERE task_id = :tid"
|
||||
), {"tid": str(tid)}).all()
|
||||
assert len(rows) == 1, f"usage_events 行数应 1,实际 {len(rows)}"
|
||||
row = rows[0]
|
||||
assert row.kind == "video", f"kind 应 video,实际 {row.kind}"
|
||||
assert row.model_profile == f"doubao.{variant_key}", f"model_profile = {row.model_profile}"
|
||||
for k in ("resolution", "duration_s", "tokens", "price_cny_per_mtoken"):
|
||||
assert k in row.units, f"units 缺 {k} snapshot"
|
||||
print(f"[OK] usage_events: kind={row.kind} model={row.model_profile} cost_cny={row.cost_cny}")
|
||||
print(f" units snapshot: {row.units}")
|
||||
|
||||
print("\n[PASS] smoke_seedance 全部通过")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
|
|
@ -0,0 +1,243 @@
|
|||
---
|
||||
name: videogen
|
||||
description: 用豆包 Seedance 2.0 Fast 生视频(`seedance` tool)。**任何生视频任务调 tool 前必须 load 本 skill**。触发词:视频 / 动画 / 动起来 / 做个 video / 做段视频 / 出段视频 / 生成视频 / mov / mp4 / 短片 / 镜头 / 运动镜头 / 演示视频 / 动效。核心:视频比图贵 10 倍(¥1.86-¥4+ / 段),且要等 30-90s,问清楚再画 + 强制确认。
|
||||
---
|
||||
|
||||
# Videogen
|
||||
|
||||
把"我想要个视频"变成一段能用的视频。流程:
|
||||
|
||||
**诊断模糊度(6 维) → 一次性给推断 + 待确认项 → 用户拍板维度 → 装配最终 prompt + 参数 → ⛔ 把 prompt+参数完整贴给用户看 + 问改不改 → 用户明确确认后 → 调 `seedance`**
|
||||
|
||||
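Each `seedance` call costs **about ¥1.5** (480p 4s) → **¥1.86** (480p 5s) → **¥4.00** (720p 5s) → **¥12+** (720p 15s), and the clip takes **30-90 seconds** to arrive. **Expensive + slow = costly trial and error, so settle the details before calling.**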
|
||||
|
||||
## ⛔ 调 tool 前的强制门(铁律,比 imagegen 更严格)
|
||||
|
||||
**任何情况下,在 `seedance` tool call 发出去之前,都必须先把最终装配好的 prompt(包括 resolution / ratio / duration / watermark 参数)用对话消息明文展示给用户,然后问"这样画?要改什么?"并 ⛔ BLOCKING 等用户明确回复。**
|
||||
|
||||
- 用户回复 "可以" / "OK" / "就这样" / "对" / "嗯" / "画吧" / "出片吧" / 简单确认 → 可以调 tool
|
||||
- 用户回复 "把 X 改成 Y" / "时长改 8s" / "镜头换成 Y" → 改完后**再次贴最终 prompt + 再次等确认**
|
||||
- 用户**沉默 / 长时间不回 / 追问别的事** → 不算确认,**继续等**,不要自作主张
|
||||
- 用户回 "看起来不错" / "差不多" / 模棱两可 → **主动追问一句"这就开烧 ¥X?"**,拿到明确"是"再调
|
||||
|
||||
**为什么比 imagegen 还严**:视频单价 ¥4 起,比图贵 10 倍以上;一次失败相当于 18 张错图;且要等 30-90s,用户改方向的等待代价也很高。**装配 prompt 不等于授权调用** —— 装配是模型脑内运算,授权要落到用户的"嗯,画吧"上。
|
||||
|
||||
## 何时用本 skill
|
||||
|
||||
- 用户**明确说**"做个视频" / "出段视频" / "动画" / "动起来" / "镜头扫过" / "演示视频"
|
||||
- 用户原本要 ppt / 海报 / 申报书,**主动**问能不能配段视频(此时介绍价格 + 时长引导决策)
|
||||
- 用户拿到 seedream 生的静态图后说"想动起来" / "加点运动" — 注意 phase 1 只支持 t2v 文生视频,**不支持**从已有图生视频(i2v),要告诉用户这点
|
||||
|
||||
## 何时不走本 skill
|
||||
|
||||
- 用户**没主动要视频**(别为"丰富回复"装饰性生视频 —— 比图更严重的浪费红线)
|
||||
- 用户要的是**流程/结构动效**(节点-箭头-步骤逐次出现)→ 这是 ppt 动画 / mermaid + ppt 转场的事,不是视频
|
||||
- 用户要的是**实拍素材** / **已有视频剪辑** → seedance 是 AI 生成,不是素材库;告诉用户走 unsplash / pexels 等
|
||||
- 用户**有具体参考视频说"按这个改"** → phase 1 不支持 i2v / v2v,告诉用户先用文字描述
|
||||
|
||||
## 关键岔路:seedream vs seedance vs mermaid
|
||||
|
||||
**默认倾向 seedream / mermaid**(静态够用就别上动态):
|
||||
- 用户要的是**单张图**(封面 / 配图 / 示意图)→ seedream
|
||||
- 用户要的是**结构/流程**(节点关系 / 时序 / 架构)→ mermaid
|
||||
|
||||
**反向选 seedance**(满足任一):
|
||||
- 用户**明确说**视频/动画/动起来/做个 video(而非"画一张" / "做个图")
|
||||
- 内容本身**离不开运动**:水流 / 火焰 / 旋转 / 镜头扫过场景 / 角色动作
|
||||
- The ppt opening page needs **visual impact + a time dimension** (a short 4-5s opening animation; 4s is the minimum duration)
|
||||
- 用户先做了静态图,**明确**说"想动起来" / "加运动感"
|
||||
|
||||
**模糊时主动问一句**:
|
||||
|
||||
> 你这是想要 **一张静态图**(seedream,¥0.22,3-5 秒出图),还是 **一段短视频**(seedance,¥4 起,等 30-90s)?静态图够的话省钱省时间。
|
||||
|
||||
## 诊断模糊度 — 六维清单(运动是新增的必填维)
|
||||
|
||||
| 维度 | 缺失信号 |
|
||||
|---|---|
|
||||
| **主体** What | "做个混凝土视频" → 视频里有什么?试块 / 楼板 / 工程现场 / 微观裂纹? |
|
||||
| **运动** Motion ⭐ | **视频特有,必问** — 主体在做什么?(浇筑 / 凝固 / 裂开 / 旋转 / 镜头扫过)。**没运动 = 应该走 seedream**。 |
|
||||
| **场景** Where | 工地 / 实验室 / 抽象空间 / 极简白底? |
|
||||
| **镜头** Camera | 镜头怎么动?(固定 / 跟随 / 推近 / 拉远 / 环绕 / 俯视下降)。AI 视频对镜头描述很敏感,不写默认随机晃动 |
|
||||
| **风格** Style | 写实摄影 / 工业 CG / 扁平 2D 动画 / 水墨 / 赛博朋克 / 学术示意? |
|
||||
| **时长 + 分辨率 + 比例** | **时长**:4-15s,默认 5s(短=便宜)。**分辨率**:fast 仅 480p/720p,默认 720p。**比例**:ppt 16:9 / 短视频 9:16 / 头像 1:1。 |
|
||||
|
||||
**评估规则**:
|
||||
- 6 维填齐 / 缺 ≤1 → 可以直接装配 prompt 调用
|
||||
- **运动维不能省** —— 没运动就劝退到 seedream(¥0.22 vs ¥4,差 18 倍)
|
||||
- 缺 2 维及以上 → **先问再画**
|
||||
- 时长、分辨率、比例三选一缺即追问;**没说时长就默认 5s 是错的**,要问"4s 短演示 / 5s 默认 / 8s 中等 / 12s+ 长镜头?",时长直接决定钱:`cost ≈ ¥4 × duration/5 × resolution_factor`
|
||||
|
||||
## 一次性给推断 + 待确认项(不要一个个问)
|
||||
|
||||
模糊时一次摆出推断,让用户改或确认:
|
||||
|
||||
> 你说"做个混凝土浇筑的视频",我打算这样画:
|
||||
> - **主体**:工地上正在浇筑混凝土的楼板
|
||||
> - **运动**:混凝土从泵车软管流出注入模板,工人手持振动棒来回插入
|
||||
> - **场景**:工地中景,远处有塔吊
|
||||
> - **镜头**:固定俯视 + 缓慢推近模板中心
|
||||
> - **风格**:写实工程纪录片
|
||||
> - **时长 / 分辨率 / 比例**:5s / 720p / 16:9(ppt 用) — ¥4.00,等 30-90s
|
||||
>
|
||||
> 这样画可以吗?或者告诉我:想突出什么?(浇筑工艺细节 / 振动棒动作 / 大场面气势 / 慢动作)
|
||||
|
||||
## 装配 prompt — 用户拍板维度后
|
||||
|
||||
把前 5 维拼成自然中文 prompt 文本(运动 / 镜头是新增重点),**时长 / 分辨率 / 比例走参数**:
|
||||
|
||||
```
|
||||
工地上正在浇筑混凝土的楼板,混凝土从泵车软管流出注入模板,
|
||||
工人手持振动棒来回插入,固定俯视镜头缓慢推近模板中心,
|
||||
写实工程纪录片风格,正午阳光
|
||||
```
|
||||
|
||||
要点:
|
||||
- **运动描述具体** —— "流动 / 旋转 / 推近 / 倾倒",不要"动起来"这种无信息词
|
||||
- **镜头单独成句** —— "固定俯视 / 缓慢推近 / 环绕一周 / 跟随主体平移";不写默认镜头随机晃动
|
||||
- **不要堆形容词** —— 模型理解动作 > 形容词
|
||||
- **不要写否定** —— "no shaking, not blurry" 反向起效;Seedance 不支持 negative prompt
|
||||
- 主体放最前,镜头 / 风格放后
|
||||
|
||||
## ⛔ 调 tool 前再过一道:把最终 prompt + 参数贴给用户
|
||||
|
||||
装配完后**立即**用对话消息把最终结果贴出来 + 等用户明确确认。**装配 ≠ 授权调用**。
|
||||
|
||||
格式建议(代码块包起来,清晰可读):
|
||||
|
||||
> 我准备调 seedance,参数如下:
|
||||
>
|
||||
> ````
|
||||
> prompt: 工地上正在浇筑混凝土的楼板,混凝土从泵车软管流出注入模板,
|
||||
> 工人手持振动棒来回插入,固定俯视镜头缓慢推近模板中心,
|
||||
> 写实工程纪录片风格,正午阳光
|
||||
> resolution: 720p
|
||||
> ratio: 16:9(ppt 横版)
|
||||
> duration: 5 秒
|
||||
> watermark: false
|
||||
> 预计花费: ¥4.00
|
||||
> 预计等待: 30-90 秒
|
||||
> ````
|
||||
>
|
||||
> 这样开烧?要改什么?(改 prompt 文字 / 改时长 / 换比例 / 降到 480p 省钱)
|
||||
|
||||
然后 ⛔ **BLOCKING:等用户明确回复**。
|
||||
|
||||
**为什么强调** —— ¥4 一段视频,用户没机会在调用前看清楚 prompt 的话,生成后只能事后看视频反推哪句话错了,改一次又是 ¥4 + 1.5 分钟。一次对话往返(免费)避免一次错片(¥4 + 90s 等待),是最划算的强制门。
|
||||
|
||||
## 参数取舍
|
||||
|
||||
### `resolution`(分辨率)
|
||||
|
||||
| 选项 | 像素 (16:9) | 适用场景 | 与 720p 价格比 |
|
||||
|---|---|---|---|
|
||||
| `480p` | 854×480 | 短演示 / 邮件附件 / 低带宽预览 / **省钱档** | ~0.47x(便宜一半) |
|
||||
| `720p`(默认) | 1280×720 | ppt 嵌入 / 公众号 / 一般演示 | 1.0x(¥4 / 5s) |
|
||||
| `1080p` | 1920×1080 | **fast 不支持**,会报 [Error];仅 pro 可用 | — |
|
||||
|
||||
**默认 720p 是合理的"够用挡"**;**预算紧 / 试效果**用 480p;**fast 上限就是 720p**,用户要 1080p+ 要切到 pro variant(yaml 里没配 pro 时直接告诉用户)。
|
||||
|
||||
### `duration`(时长)
|
||||
|
||||
| 时长 | 适用 | 720p 16:9 估价 |
|
||||
|---|---|---|
|
||||
| 4s | 最短,纯展示 / 转场 / loop | ~¥3.20 |
|
||||
| 5s(默认) | 标准短视频 / ppt 引子 | ~¥4.00 |
|
||||
| 8s | 中等叙事 / 完整动作 | ~¥6.40 |
|
||||
| 12s | 长镜头 / 复杂场景 | ~¥9.60 |
|
||||
| 15s | 上限 / 完整 plot | ~¥12.00 |
|
||||
|
||||
**线性扩展**(每秒 ~¥0.8 @ 720p);**短=便宜=快**,无明确叙事需求默认 5s。**4s 是最低,< 4 报错**。
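
The duration table above follows from scaling the ¥4.00 reference point linearly. Below is a small sketch of that rule of thumb (`cost ≈ ¥4 × duration/5 × resolution_factor`). `rough_cost_cny` and the `RESOLUTION_FACTOR` table are hypothetical names, and the 0.47 factor for 480p is the approximate ratio quoted in the resolution table, so the results are estimates rather than billed amounts.

```python
# Price relative to 720p; the 480p value is the approximate ratio from the table.
RESOLUTION_FACTOR = {"480p": 0.47, "720p": 1.0}


def rough_cost_cny(duration_s: int, resolution: str = "720p") -> float:
    """Rule-of-thumb price: ¥4.00 per 5s at 720p, scaled linearly."""
    if not 4 <= duration_s <= 15:
        raise ValueError("duration must be within 4-15 seconds")
    return round(4.0 * duration_s / 5 * RESOLUTION_FACTOR[resolution], 2)


for seconds in (4, 5, 8, 12, 15):
    print(seconds, rough_cost_cny(seconds))
```

Quoting this estimate in the confirmation message lets the user weigh duration against cost before anything is spent.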
|
||||
|
||||
### `ratio`(比例)
|
||||
|
||||
| 比例 | 适用 |
|
||||
|---|---|
|
||||
| `16:9`(默认) | ppt / 横版演示 / 一般用途 |
|
||||
| `9:16` | 短视频 / 抖音 / 小红书 / 手机海报 |
|
||||
| `1:1` | 社交头像 / Instagram |
|
||||
| `4:3` | 老电视 / 复古 |
|
||||
| `3:4` | 杂志竖版 |
|
||||
| `21:9` | 电影超宽 / Banner |
|
||||
| `adaptive` | 模型自己选 |
|
||||
|
||||
**默认 16:9** 适合 90% ppt / 申报 / 公众号场景。**比例错了后期裁切很难还原构图** —— 不问用途默认 16:9 也比默认 1:1 强(seedream 那边有 1:1 误区,这里别犯)。
|
||||
|
||||
### `watermark`
|
||||
|
||||
| 默认 | 何时改 |
|
||||
|---|---|
|
||||
| `false` | 默认无水印(申报 / ppt / 客户交付都不该带);仅当用户明确说"加水印"才传 `true` |
|
||||
|
||||
## 调用范式
|
||||
|
||||
**前置条件**:用户已经看过最终 prompt + 所有参数,明确回复"可以" / "OK" / "出片吧" 之类。**没看到这个确认就不要调**。
|
||||
|
||||
`seedance` 是 tool 不是 skill 函数,**直接调,不要 `run_python` 包一层**:
|
||||
|
||||
```
seedance(
    prompt="工地上正在浇筑混凝土的楼板,混凝土从泵车软管流出注入模板,工人手持振动棒来回插入,固定俯视镜头缓慢推近模板中心,写实工程纪录片风格,正午阳光",
    resolution="720p",    # optional, falls back to the default
    ratio="16:9",         # optional
    duration=5,           # optional
    watermark=false,      # optional
)
```
|
||||
|
||||
**调用是同步阻塞 30-90s** —— tool 内部 submit 后轮询直到 succeeded,期间 LLM 卡住。这不是 bug,告诉用户"提交了,等 30-90 秒"再耐心等返回。
|
||||
|
||||
返回串首行是 `[seedance] model=... · resolution=... · ratio=... · duration=Xs · cost=¥... · elapsed=...s` —— 原样保留给用户(SPA 会 parse 挂徽章)。第二行 `saved: <相对路径>` 是产物路径,告诉用户。
|
||||
|
||||
产物自动落 `<task_dir>/videos/<时间戳>-<rand>.mp4` + 同名 `.meta.json`(prompt / 参数 / cost / tokens / cgt_id 全 snapshot)。
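
For anything that consumes the tool result programmatically, the two-line shape described above (a `[seedance] key=value · ...` banner followed by `saved: <path>`) can be split as in this sketch. `parse_seedance_result` is a hypothetical helper; it assumes fields are separated by `·` exactly as shown, and the sample values in the usage line are made up.

```python
def parse_seedance_result(text: str) -> dict:
    """Split the banner line into key/value pairs and pick up the saved path."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("[seedance]"):
        raise ValueError("not a seedance result")
    fields = {}
    for part in lines[0][len("[seedance]"):].split("·"):
        # partition() splits on the first "=", so values like "16:9" stay intact.
        key, sep, value = part.strip().partition("=")
        if sep:
            fields[key] = value
    for line in lines[1:]:
        if line.startswith("saved:"):
            fields["saved"] = line[len("saved:"):].strip()
    return fields


sample = (
    "[seedance] model=example-model · resolution=720p · ratio=16:9 · "
    "duration=5s · cost=¥4.00 · elapsed=62s\n"
    "saved: videos/example.mp4"
)
print(parse_seedance_result(sample)["cost"])
```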
|
||||
|
||||
## 失败 / 不满意后怎么办
|
||||
|
||||
**不要原 prompt 重发**!那是 ¥4 一发的浪费。
|
||||
|
||||
| 现象 | 原因 | 解药 |
|
||||
|---|---|---|
|
||||
| 主体动作不对 | 运动描述太抽象("动起来"→ 无信息) | 用具体动词:"从软管流出 / 来回插入 / 缓慢上升" |
|
||||
| 镜头不稳 / 随机晃动 | 没指定镜头 | 加"固定镜头" / "缓慢推近" / "环绕一周";尤其要明确镜头是否固定 |
|
||||
| 主体被切掉 | 比例与构图不匹配 | 改 ratio 或在 prompt 里写"全景 / 中景 / 主体居中" |
|
||||
| 风格不对 | 风格词位置太靠后 / 缺少 | 风格词放 prompt 前 1/3:"写实工程纪录片风格,xxx" |
|
||||
| 时长不够讲完动作 | duration 太短 | 4s 适合单一动作,多步骤动作至少 8s |
|
||||
| `[Error] seedance API` 报错(传 1080p 时) | fast 不支持 1080p+ | 降到 720p 重试;或换 pro variant(yaml 加配置) |
|
||||
| `[Error] seedance 轮询超时` | 队列拥堵 / 服务异常 | 等几分钟再试;cgt_id 在 24h 内仍可手工 GET 拉结果 |
|
||||
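| `[Cancelled]` | The user pressed stop | **Tell the user: the remote task may still be running. Volcengine does not bill failed tasks, only successful ones, so if the clip is still produced remotely it may incur a charge** |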
|
||||
| 出现奇怪文字 | Seedance 文字渲染不稳 | prompt 里不要要求画文字 / 字幕;字幕后期 ppt 加 |
|
||||
|
||||
**先口头跟用户对齐改哪一维**,再发新调用 —— 不要"再画一段试试"连续烧钱。
|
||||
|
||||
## 产物处理
|
||||
|
||||
生视频完成后:
|
||||
- 把 `saved: videos/xxx.mp4` 路径告诉用户(SPA 会自动 inline 播放器,点击可全屏)
|
||||
- 如果是做 ppt,提醒用户 `python-pptx` 可以 `add_movie` 嵌入 mp4(但导出 pdf 时视频会丢,改截关键帧 + 链接)
|
||||
- 用户说"换一段" → 走上节的"对齐改哪一维"流程,**不要默认重发**
|
||||
- 用户说"再来几段备选" → **先确认**:"备选 N 段会花 ¥{4 × N} + 等 {N × 60}s,确认?"
|
||||
|
||||
## 反模式
|
||||
|
||||
- ❌ **没把最终 prompt+参数贴给用户看就直接 tool call** —— 比 imagegen 更严格的铁律(¥4 vs ¥0.22)
|
||||
- ❌ **用户回"看起来不错"就当确认调 tool** —— 模棱两可必须追问到明确"是 / OK / 画吧"
|
||||
- ❌ **运动维度跳过**(=应该走 seedream 却走 seedance) —— 浪费 18 倍价钱
|
||||
- ❌ 用户没主动要视频就装饰性生成
|
||||
- ❌ 用户说"做个动画" 直接拿这 4 个字当 prompt 调用 —— 六维清单至少先问 2 轮
|
||||
- ❌ 一个个问"主体?""镜头?" —— 一次性摆推断 + 让用户改
|
||||
- ❌ **不问时长直接默认 5s** —— 5s 是合理默认但要让用户知道有 4-15s 区间;长视频(>8s)开调用前必须强调成本
|
||||
- ❌ **不问分辨率上 720p** —— 试效果阶段先 480p(便宜一半)
|
||||
- ❌ 同一目的不满意连续重发 —— 1 次错 = ¥4 + 60s,2 次连发 = ¥8 + 2min,先校准 prompt 再调
|
||||
- ❌ prompt 写"动起来 / 有动感" —— Seedance 不靠形容词,要具体动词
|
||||
- ❌ prompt 里写否定 "no shaking, not blurry" —— Seedance 不支持 negative,反向起效
|
||||
- ❌ 让用户在 seedream / seedance 之间默默替他决定 —— 模糊就一句话问明白
|
||||
- ❌ phase 1 拿用户已生成的图试图 i2v —— **当前不支持**,明确告诉用户
|
||||
- ❌ 用 `run_python` 调 `requests` 裸打豆包 API —— 走 `seedance` tool(已封装异步轮询 + 计费 + 落盘 + meta + cancel)
|
||||
|
||||
## 输出
|
||||
|
||||
调完告诉用户:
|
||||
- 文件相对路径(`videos/xxx.mp4`)
|
||||
- 本次成本(¥X.XX,从 banner 抽)
|
||||
- 用的 prompt 摘要(主体 / 运动 / 镜头 3 件套)
|
||||
- 一句"要换方向 / 调镜头 / 加长时长再来一段吗?"
|
||||
|
|
@@ -0,0 +1,342 @@
"""seedance: call the Doubao Seedance 2.0 Fast video-generation API; output lands in working_dir/videos/.

Async task:
  1. POST /contents/generations/tasks → returns `{"id": "cgt-..."}`
  2. Poll GET /contents/generations/tasks/<cgt_id> (5s interval by default, 10min at most)
     until status ∈ {succeeded, failed, expired, cancelled}
  3. succeeded → read content.video_url → download locally + write meta + record usage_events

The model ID + unit price + default parameters all live in `config/media/doubao.yaml`; this tool only assembles them.
Billing uses the token formula `(in_dur+out_dur) × W × H × fps / 1024`, with in_dur=0 for text-to-video;
W×H is derived from resolution + ratio (landscape: height=resolution_num; portrait: width=resolution_num).

On completion:
  - the video is written to `<wd>/videos/<YYYYMMDD-HHMMSS>-<rand6>.mp4`
  - a same-named `.meta.json` records prompt / model / parameters / cost_cny / tokens / cgt_id / ts
  - usage_events gets one kind="video" row (unit price + resolution + duration snapshotted into units)
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import secrets
|
||||
import time
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Any, Callable, Optional
|
||||
from uuid import UUID
|
||||
|
||||
from core.ark_client import ArkClient, ArkConfig, ArkError
|
||||
from core.storage.usage import record_video_usage
|
||||
|
||||
from .base import Tool
|
||||
|
||||
|
||||
# resolution → short-edge pixels; the actual W/H depend on ratio (landscape: short edge = H; portrait: short edge = W)
_RESOLUTION_TO_SHORT_EDGE: dict[str, int] = {
    "480p": 480,
    "720p": 720,
    "1080p": 1080,  # not supported by fast, reserved for pro; the tool does not block it here and lets Doubao reject it with [Error]
}

_VALID_RATIOS: set[str] = {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"}


def _resolve_dimensions(resolution: str, ratio: str) -> tuple[int, int]:
    """resolution (short edge) + ratio → (width, height) in pixels.

    With the adaptive ratio W/H cannot be known ahead of time, so fall back to a square on the short edge (for token estimation only; Doubao decides the real W/H).
    Unknown resolution → defaults to 720p.
    """
    short = _RESOLUTION_TO_SHORT_EDGE.get(resolution, 720)
    if ratio == "adaptive" or ratio not in _VALID_RATIOS:
        return short, short
    num_str, den_str = ratio.split(":", 1)
    num, den = int(num_str), int(den_str)
    if num >= den:  # landscape or square: height is the short edge
        return round(short * num / den), short
    # portrait: width is the short edge
    return short, round(short * den / num)


def _estimate_tokens(width: int, height: int, duration_s: int, fps: int, in_dur_s: int = 0) -> int:
    """Volcengine Ark token estimation formula: `(in_dur + out_dur) × W × H × fps / 1024`.

    Measured check (fast, 720p 16:9, 5s, 24fps, text-to-video):
        (0+5) × 1280 × 720 × 24 / 1024 = 108,000 tokens × ¥37/Mtok = ¥3.996 ≈ ¥4.00 ✓
    """
    return int((in_dur_s + duration_s) * width * height * fps / 1024)


class SeedanceTool(Tool):
|
||||
name = "seedance"
|
||||
description = (
|
||||
"Generate a short video with Doubao Seedance 2.0 Fast and save to working_dir/videos/. "
|
||||
"Use only when the user explicitly asks for a video / 视频 / 动画 / 动起来. "
|
||||
"Async: takes 30-90s to render. Costs ~¥1.86 (480p, 5s) ~ ¥4.00 (720p, 5s); "
|
||||
"longer / higher-resolution scales up. Returns the saved relative path."
|
||||
)
|
||||
parameters = {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
"prompt": {
|
||||
"type": "string",
|
||||
"description": "中文或英文都行,详尽描述画面 + 运动 + 镜头(主体在做什么 / 镜头怎么动 / 场景 / 风格)。",
|
||||
},
|
||||
"resolution": {
|
||||
"type": "string",
|
||||
"description": "Video resolution. fast 版仅支持 '480p' / '720p'(默认 720p);1080p+ 仅 pro 可用。",
|
||||
"enum": ["480p", "720p", "1080p"],
|
||||
},
|
||||
"ratio": {
|
||||
"type": "string",
|
||||
"description": "Aspect ratio. 默认 '16:9'(ppt/横屏配视频)。竖版海报/短视频用 '9:16'。",
|
||||
"enum": ["16:9", "9:16", "1:1", "4:3", "3:4", "21:9", "adaptive"],
|
||||
},
|
||||
"duration": {
|
||||
"type": "integer",
|
||||
"description": "视频时长(秒),范围 4-15。短=便宜,默认 5。每加 1s 成本约线性上升。",
|
||||
"minimum": 4,
|
||||
"maximum": 15,
|
||||
},
|
||||
"watermark": {
|
||||
"type": "boolean",
|
||||
"description": "是否打豆包水印。默认 false(申报/PPT 场景反需求)。",
|
||||
},
|
||||
},
|
||||
"required": ["prompt"],
|
||||
}
|
||||
|
||||
    def __init__(
        self,
        *,
        ark_cfg: ArkConfig,
        video_variant_cfg: dict,
        variant_key: str,
        working_dir: Path,
        task_id: UUID,
        user_id: UUID,
        base_dir: Optional[Path] = None,
        user_root: Optional[Path] = None,
        cancel_check: Optional[Callable[[], bool]] = None,
    ) -> None:
        super().__init__(base_dir, user_root=user_root)
        self.ark_cfg = ark_cfg
        self.cfg = video_variant_cfg
        self.variant_key = variant_key  # 'seedance_2_fast' → usage_events.model_profile = "doubao.seedance_2_fast"
        self.working_dir = Path(working_dir)
        self.task_id = task_id
        self.user_id = user_id
        self.cancel_check = cancel_check  # checked between polls to see whether the run was cancelled

    def execute(
        self,
        prompt: str,
        resolution: Optional[str] = None,
        ratio: Optional[str] = None,
        duration: Optional[int] = None,
        watermark: Optional[bool] = None,
    ) -> str:
        if not (prompt or "").strip():
            return "[Error] prompt must not be empty"

        cfg = self.cfg
        model_id = cfg["model_id"]
        chosen_resolution = resolution or cfg.get("default_resolution", "720p")
        chosen_ratio = ratio or cfg.get("default_ratio", "16:9")
        chosen_duration = int(duration) if duration is not None else int(cfg.get("default_duration", 5))
        chosen_watermark = bool(cfg.get("default_watermark", False)) if watermark is None else bool(watermark)
        fps = int(cfg.get("fps", 24))

        submit_timeout = float(cfg.get("request_timeout_s", 60))
        poll_interval = float(cfg.get("poll_interval_s", 5))
        poll_timeout = float(cfg.get("poll_timeout_s", 600))
        price_t2v = float(cfg.get("price_cny_per_mtoken_text2video", 37.0))

        submit_endpoint = cfg.get("endpoint_submit", "/contents/generations/tasks")
        poll_endpoint_base = cfg.get("endpoint_poll", "/contents/generations/tasks")

        body: dict[str, Any] = {
            "model": model_id,
            "content": [{"type": "text", "text": prompt}],
            "ratio": chosen_ratio,
            "resolution": chosen_resolution,
            "duration": chosen_duration,
            "watermark": chosen_watermark,
        }

        t0 = time.monotonic()
        try:
            with ArkClient(self.ark_cfg, timeout_s=submit_timeout) as client:
                # 1. submit
                submit_resp = client.post_json(submit_endpoint, body, timeout_s=submit_timeout)
                cgt_id = self._extract_task_id(submit_resp)
                if not cgt_id:
                    return f"[Error] seedance submit response has no task id: {json.dumps(submit_resp, ensure_ascii=False)[:300]}"

                # 2. poll
                poll_url = f"{poll_endpoint_base}/{cgt_id}"
                deadline = time.monotonic() + poll_timeout
                last_status = ""
                final_resp: dict = {}
                while True:
                    if self.cancel_check is not None and self.cancel_check():
                        return (
                            f"[Cancelled] seedance task {cgt_id} cancelled by user (the remote task may still be running; "
                            f"Volcengine bills only on success, so if it still renders it may cost ~¥{self._rough_cost(chosen_resolution, chosen_ratio, chosen_duration, fps, price_t2v):.2f})"
                        )
                    if time.monotonic() > deadline:
                        return (
                            f"[Error] seedance polling timed out (>{poll_timeout:.0f}s), last status={last_status!r}, "
                            f"cgt_id={cgt_id} (the result can be fetched manually with GET {poll_url} within 24h)"
                        )
                    time.sleep(poll_interval)
                    poll_resp = client.get_json(poll_url, timeout_s=submit_timeout)
                    last_status = str(poll_resp.get("status") or "").lower()
                    final_resp = poll_resp
                    if last_status in ("succeeded", "failed", "expired", "cancelled"):
                        break

                if last_status != "succeeded":
                    err = (final_resp.get("error") or {}) if isinstance(final_resp.get("error"), dict) else {}
                    msg = err.get("message") or final_resp.get("message") or "(no error description)"
                    return f"[Error] seedance task {cgt_id} ended with status={last_status}, msg={msg}"

                video_url = self._extract_video_url(final_resp)
                if not video_url:
                    return f"[Error] seedance succeeded but the response has no video url: {json.dumps(final_resp, ensure_ascii=False)[:400]}"

                # 3. download
                ts = datetime.now().strftime("%Y%m%d-%H%M%S")
                short = secrets.token_hex(3)
                videos_dir = self.working_dir / "videos"
                dest_mp4 = videos_dir / f"{ts}-{short}.mp4"
                client.download(video_url, dest_mp4, timeout_s=300.0)
        except ArkError as e:
            return f"[Error] seedance API: {e}"

        elapsed = time.monotonic() - t0

        # tokens / cost: prefer the usage field in the response (if Doubao returns one), otherwise estimate with the formula
        width, height = _resolve_dimensions(chosen_resolution, chosen_ratio)
        tokens_estimated = _estimate_tokens(width, height, chosen_duration, fps, in_dur_s=0)
        tokens_actual = self._extract_tokens(final_resp) or tokens_estimated
        cost_cny = tokens_actual * price_t2v / 1_000_000.0

        meta = {
            "prompt": prompt,
            "model_id": model_id,
            "resolution": chosen_resolution,
            "ratio": chosen_ratio,
            "width": width,
            "height": height,
            "duration_s": chosen_duration,
            "fps": fps,
            "watermark": chosen_watermark,
            "tokens": tokens_actual,
            "tokens_estimated": tokens_estimated,
            "price_cny_per_mtoken": price_t2v,
            "cost_cny": round(cost_cny, 4),
            "elapsed_s": round(elapsed, 1),
            "cgt_id": cgt_id,
            "ts": datetime.now().isoformat(timespec="seconds"),
        }
        meta_path = dest_mp4.with_suffix(".meta.json")
        meta_path.write_text(json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8")

        try:
            record_video_usage(
                task_id=self.task_id,
                user_id=self.user_id,
                model_profile=f"doubao.{self.variant_key}",
                resolution=chosen_resolution,
                ratio=chosen_ratio,
                duration_s=chosen_duration,
                fps=fps,
                width=width,
                height=height,
                tokens=tokens_actual,
                price_cny_per_mtoken=price_t2v,
                has_video_input=False,  # phase 1 is t2v only; once i2v lands, determine this from the body
                watermark=chosen_watermark,
                extra_units={"cgt_id": cgt_id, "elapsed_s": round(elapsed, 1)},
            )
        except Exception as e:
            print(f"[seedance] record_video_usage failed: {type(e).__name__}: {e}", flush=True)

        disp = self._display(dest_mp4)
        # The banner protocol matches seedream: the first line is `[tool] key=value · key=value ...`
        # The frontend extractMediaBanner already whitelists seedance; a regex picks up key=value pairs to attach badges
        return (
            f"[seedance] model={model_id} · resolution={chosen_resolution} · ratio={chosen_ratio} · "
            f"duration={chosen_duration}s · cost=¥{cost_cny:.2f} · elapsed={elapsed:.1f}s\n"
            f"saved: {disp}\n"
            f"prompt={prompt!r}\n"
            f"watermark={chosen_watermark} cgt_id={cgt_id}"
        )

    @staticmethod
    def _rough_cost(resolution: str, ratio: str, duration_s: int, fps: int, price_per_mtoken: float) -> float:
        w, h = _resolve_dimensions(resolution, ratio)
        tokens = _estimate_tokens(w, h, duration_s, fps)
        return tokens * price_per_mtoken / 1_000_000.0

    @staticmethod
    def _extract_task_id(resp: dict) -> str:
        """Extract the cgt-xxx task id from the submit response. Tolerates a few known shapes."""
        for k in ("id", "task_id", "request_id"):
            v = resp.get(k)
            if isinstance(v, str) and v:
                return v
        data = resp.get("data")
        if isinstance(data, dict):
            for k in ("id", "task_id"):
                v = data.get(k)
                if isinstance(v, str) and v:
                    return v
        return ""

    @staticmethod
    def _extract_video_url(resp: dict) -> str:
        """Extract the video url from a succeeded response. Known paths:
        - {"content": {"video_url": "..."}} (Volcengine Ark standard)
        - {"data": {"video_url": "..."}} (some proxies)
        - fallback: the first `video_url`, or a `url` ending in .mp4/.webm/.mov, found anywhere in the response
        """
        for key in ("content", "data", "output"):
            sub = resp.get(key)
            if isinstance(sub, dict):
                v = sub.get("video_url")
                if isinstance(v, str) and v.startswith("http"):
                    return v
        # output[0].content[0].text → json parse (proxy path, rarely needed)
        # recursive fallback
        def _find_url(o: Any) -> Optional[str]:
            if isinstance(o, dict):
                for k, v in o.items():
                    if k in ("video_url", "url") and isinstance(v, str) and v.startswith("http"):
                        # filter out obvious non-video values (image_url etc.); video_url has the highest priority
                        if k == "video_url":
                            return v
                        if v.lower().endswith((".mp4", ".webm", ".mov")):
                            return v
                    r = _find_url(v)
                    if r:
                        return r
            elif isinstance(o, list):
                for x in o:
                    r = _find_url(x)
                    if r:
                        return r
            return None
        return _find_url(resp) or ""

    @staticmethod
    def _extract_tokens(resp: dict) -> Optional[int]:
        """Use the official usage field when the response carries one, preferring total_tokens over completion_tokens."""
        usage = resp.get("usage")
        if isinstance(usage, dict):
            for k in ("total_tokens", "completion_tokens", "tokens"):
                v = usage.get(k)
                if isinstance(v, (int, float)) and v > 0:
                    return int(v)
        return None

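The submit → poll → terminal-status flow in `execute` can be isolated into a small loop. What follows is a minimal sketch rather than the tool's actual code: `poll_until_terminal`, `fetch_status`, and the injectable `sleep` / `clock` parameters are hypothetical, added only so the loop can be exercised without a network call or real waiting.

```python
import time
from typing import Callable, Optional

# Terminal statuses as listed in the polling loop of the diff.
TERMINAL = ("succeeded", "failed", "expired", "cancelled")


def poll_until_terminal(
    fetch_status: Callable[[], str],
    *,
    poll_interval_s: float = 5.0,
    poll_timeout_s: float = 600.0,
    cancel_check: Optional[Callable[[], bool]] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return a terminal status, "cancelled_by_user", or "timeout"."""
    deadline = clock() + poll_timeout_s
    while True:
        # Check cancellation first so a stop request is seen before the next wait.
        if cancel_check is not None and cancel_check():
            return "cancelled_by_user"
        if clock() > deadline:
            return "timeout"
        sleep(poll_interval_s)
        status = (fetch_status() or "").lower()
        if status in TERMINAL:
            return status


if __name__ == "__main__":
    statuses = iter(["queued", "running", "succeeded"])
    print(poll_until_terminal(lambda: next(statuses), sleep=lambda s: None))
```

With this ordering, a stop request is noticed at most one poll interval after it is raised, since the check runs once per iteration before the sleep.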
104
web/app.py
@@ -255,28 +255,33 @@ def _validate_transfer(
 # ─────────────────── BG run + SSE frame format ───────────────────
 
 def _run_agent_bg(
-    task_id: UUID, user_id: UUID, user_message: str, image_variant: str = "",
+    task_id: UUID, user_id: UUID, user_message: str,
+    image_variant: str = "", video_variant: str = "",
 ) -> None:
     """Worker thread: `build_agent(resume=True)` → attach WebEventSink + cancel_check → `agent.run` → write tasks.run_status.
 
     The sink bridges events back to the asyncio loop through broker.emit; agent.run is sync, so it runs in to_thread.
     user_id must be passed through from the JWT side: it decides which per-user subtree memory_block reads.
     cancel_check bridges broker.is_cancelled; the loop polls it between stream chunks and between tool calls;
-    cancel latency ~ one chunk interval (about 100 ms).
+    cancel latency ~ one chunk interval (about 100 ms); seedance also reads this cancel_check between polls so the
+    user's stop button works (it must be passed in at build_agent time because the SeedanceTool ctor holds it;
+    it can no longer be assigned to agent.cancel_check after build_agent returns, as before).
     `ok / cancelled` both finish by going straight back to `idle` (no persistent marker); only error is a persistent terminal state.
 
-    image_variant: which image variant this run uses to set up seedream (empty → first in the yaml).
+    image_variant / video_variant: which image/video variant this run uses to set up the tool (empty → first in the yaml).
     Passed in with the message POST and not stored in the DB: the UI dropdown choice applies to this one message.
     """
     from core.agent_builder import build_agent, sync_task_tokens
+    cancel_check = lambda tid=task_id: broker.is_cancelled(tid)
     try:
         broker.emit(task_id, {"type": "run_start"})
         agent, session, sid, task_state, task_dir = build_agent(
             session_id=str(task_id), resume=True, user_id=user_id,
             image_variant=image_variant,
+            video_variant=video_variant,
+            cancel_check=cancel_check,
         )
         agent.sink = WebEventSink(broker, task_id)
-        agent.cancel_check = lambda tid=task_id: broker.is_cancelled(tid)
         agent.run(user_message)
         sync_task_tokens(task_state)
         # cancel hit or normal completion → run_status returns to idle (only error persists)
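The docstring's point about passing `cancel_check` at build time can be shown with a toy model. Everything in this sketch is hypothetical (`PollingTool`, `Agent`, and this `build_agent` are stand-ins, not the project's classes); it only demonstrates that a tool which stores the callable in its constructor never sees a value assigned to the agent afterwards.

```python
from typing import Callable, Optional


class PollingTool:
    """Hypothetical stand-in for a tool that keeps cancel_check from its constructor."""

    def __init__(self, cancel_check: Optional[Callable[[], bool]] = None) -> None:
        self.cancel_check = cancel_check

    def should_stop(self) -> bool:
        return self.cancel_check is not None and self.cancel_check()


class Agent:
    def __init__(self, tools: list) -> None:
        self.tools = tools
        self.cancel_check: Optional[Callable[[], bool]] = None


def build_agent(cancel_check: Optional[Callable[[], bool]] = None) -> Agent:
    # Assumption for this sketch: the builder hands the same callable to the tool and the agent.
    agent = Agent([PollingTool(cancel_check)])
    agent.cancel_check = cancel_check
    return agent


cancelled = {"flag": False}
check = lambda: cancelled["flag"]

late = build_agent()           # old pattern: build first, assign afterwards
late.cancel_check = check      # the tool inside never receives this callable
early = build_agent(check)     # new pattern: pass the callable at build time

cancelled["flag"] = True
print(late.tools[0].should_stop(), early.tools[0].should_stop())  # False True
```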
@@ -365,6 +370,36 @@ def _resolve_image_model(variant: str) -> str:
     return name
 
 
+def _list_video_variants() -> list[tuple[str, dict]]:
+    """Scan the video section of config/media/doubao.yaml → [(variant_key, variant_cfg), ...].
+
+    Same pattern as _list_image_variants; an empty video section (not launched / commented out) → returns [], and the UI hides the dropdown.
+    """
+    from core.paths import ROOT
+    import yaml as _yaml
+
+    p = ROOT / "config" / "media" / "doubao.yaml"
+    if not p.exists():
+        return []
+    try:
+        data = _yaml.safe_load(p.read_text(encoding="utf-8")) or {}
+    except Exception:
+        return []
+    video_cfg = data.get("video") or {}
+    return [(k, v) for k, v in video_cfg.items() if isinstance(v, dict)]
+
+
+def _resolve_video_model(variant: str) -> str:
+    """Validate the video_model variant key (same pattern as _resolve_image_model)."""
+    name = (variant or "").strip()
+    if not name:
+        return ""
+    variants = {k for k, _ in _list_video_variants()}
+    if name not in variants:
+        raise HTTPException(400, f"invalid video_model {name!r}; available: {sorted(variants)}")
+    return name
+
+
 # ────────────────────── Pydantic request bodies ──────────────────────
 
 class TaskCreateRequest(BaseModel):
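The validation rule in `_resolve_video_model` (empty string passes through, unknown keys are rejected with the list of valid ones) can be tried without FastAPI or a YAML file. In this sketch the config is an in-memory dict, `ValueError` stands in for `HTTPException(400, ...)`, and the function names and the `example-model-id` value are placeholders.

```python
from typing import Any


def list_video_variants(data: dict[str, Any]) -> list[tuple[str, dict]]:
    """Keep only dict-valued entries of the `video` section, in file order."""
    video_cfg = data.get("video") or {}
    return [(k, v) for k, v in video_cfg.items() if isinstance(v, dict)]


def resolve_video_model(variant: str, data: dict[str, Any]) -> str:
    """Empty → "" (caller falls back to the first variant); unknown → error."""
    name = (variant or "").strip()
    if not name:
        return ""
    variants = {k for k, _ in list_video_variants(data)}
    if name not in variants:
        # The real handler raises HTTPException(400, ...); ValueError keeps the sketch dependency-free.
        raise ValueError(f"invalid video_model {name!r}; available: {sorted(variants)}")
    return name


# Placeholder config: one real-looking variant plus a non-dict entry that must be ignored.
config = {"video": {"seedance_2_fast": {"model_id": "example-model-id"}, "note": "not a variant"}}

if __name__ == "__main__":
    print(resolve_video_model(" seedance_2_fast ", config))
```

Validating in the request handler rather than in the worker thread means a bad key fails the POST immediately, which is the reason the diff gives for calling the resolver before the background run starts.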
@@ -385,18 +420,19 @@ class TaskPatchRequest(BaseModel):
 
 class MessageRequest(BaseModel):
     content: str
-    # Image generation model variant key triggered by this message (image section of config/media/doubao.yaml).
-    # Empty → the seedream tool uses the first variant in the yaml (currently seedream_5); non-empty → this run
-    # sets up SeedreamTool with this variant. Applies to this run only and is not stored in the DB; the UI dropdown
-    # choice travels in the message POST body.
+    # Image / video generation model variant key triggered by this message (image/video sections of config/media/doubao.yaml).
+    # Empty → the corresponding tool uses the first variant in the yaml; non-empty → this run sets up the given variant.
+    # Applies to this run only and is not stored in the DB; the UI dropdown choice travels in the message POST body.
     image_model: str = ""
+    video_model: str = ""
 
 
 class OptimizePromptRequest(BaseModel):
     text: str
-    # Optionally pass the image variant key currently selected in the UI; the polishing meta-prompt includes that model's traits
-    # (so the LLM knows the downstream tool's preferences and polishes a prompt better suited to seedream/seedance etc.).
+    # Optionally pass the variant key currently selected in the UI; the polishing meta-prompt includes that model's traits
+    # (so the LLM knows the downstream tool's preferences and polishes a prompt better suited to seedream / seedance etc.).
     image_model: str = ""
+    video_model: str = ""
 
 
 class FileDeleteRequest(BaseModel):
@@ -542,7 +578,7 @@ def create_app() -> FastAPI:
 
         The second dropdown in the frontend top bar fetches this; an empty list → no image variant configured, and the UI hides the dropdown.
         `is_default` marks the first variant (= the agent_builder fallback target). Nothing is cached during development,
-        so adding a new variant (e.g. seedance) to the YAML takes effect immediately.
+        so adding a new variant to the YAML takes effect immediately.
         """
         variants = _list_image_variants()
         out: list[dict] = []
@@ -556,6 +592,29 @@ def create_app() -> FastAPI:
             })
         return {"models": out}
 
+    @app.get("/v1/video_models", tags=["misc"])
+    def list_video_models(user_id: UUID = Depends(require_user)):
+        """List of video generation models (scans the video section of config/media/doubao.yaml).
+
+        Same pattern as /v1/image_models; an empty list → the UI hides the third dropdown. The listing includes the default resolution
+        and the token unit price (¥/Mtok, text-to-video path), so users can see the cost scale right in the dropdown options.
+        """
+        variants = _list_video_variants()
+        out: list[dict] = []
+        for i, (key, cfg) in enumerate(variants):
+            out.append({
+                "variant": key,
+                "display_name": cfg.get("display_name") or key,
+                "model_id": cfg.get("model_id") or "",
+                "default_resolution": cfg.get("default_resolution"),
+                "default_duration": cfg.get("default_duration"),
+                "default_ratio": cfg.get("default_ratio"),
+                "price_cny_per_mtoken_text2video": cfg.get("price_cny_per_mtoken_text2video"),
+                "price_cny_per_mtoken_video2video": cfg.get("price_cny_per_mtoken_video2video"),
+                "is_default": i == 0,
+            })
+        return {"models": out}
+
     # ───────────── Auth ─────────────
 
     @app.post("/v1/auth/login", tags=["auth"])
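The listing endpoint marks the first YAML entry as the default, which is what the frontend later uses to preselect the dropdown. A reduced sketch of that row-building step follows; `to_model_rows` is a hypothetical helper, the field set is trimmed, and the sample variants are made up for illustration.

```python
from typing import Any


def to_model_rows(variants: list[tuple[str, dict[str, Any]]]) -> list[dict[str, Any]]:
    """Turn (variant_key, cfg) pairs into response rows; the first pair is the default."""
    rows = []
    for i, (key, cfg) in enumerate(variants):
        rows.append({
            "variant": key,
            "display_name": cfg.get("display_name") or key,  # fall back to the key itself
            "default_resolution": cfg.get("default_resolution"),
            "is_default": i == 0,
        })
    return rows


# Made-up sample data: the second entry has no display_name on purpose.
rows = to_model_rows([
    ("seedance_2_fast", {"display_name": "Seedance 2.0 Fast", "default_resolution": "720p"}),
    ("other_variant", {}),
])

if __name__ == "__main__":
    for row in rows:
        print(row)
```

Since the default is positional, reordering entries in the YAML changes which model is preselected; no separate flag needs to be maintained.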
@@ -1003,12 +1062,13 @@ def create_app() -> FastAPI:
                     run_status="running", run_error=None,
                 )
             )
-        # image_model is validated at POST time, so an exception is not thrown in the BG thread outside the sink where it is hard to trace; an empty string passes through without reading the yaml.
+        # image_model / video_model are validated at POST time, so an exception is not thrown in the BG thread outside the sink where it is hard to trace; an empty string passes through without reading the yaml.
         image_variant = _resolve_image_model(body.image_model)
+        video_variant = _resolve_video_model(body.video_model)
         broker.start(tid)  # clear the previous round's done marker so new subscribers can see the stream
         # the lock is released after commit; the BG thread takes over (the sink bridges events back to the asyncio loop through the broker)
         asyncio.create_task(asyncio.to_thread(
-            _run_agent_bg, tid, user_id, content, image_variant,
+            _run_agent_bg, tid, user_id, content, image_variant, video_variant,
         ))
         return {"events_url": f"/v1/tasks/{tid}/events"}
 
@@ -1149,7 +1209,7 @@ def create_app() -> FastAPI:
         except (FileNotFoundError, ValueError) as e:
             raise HTTPException(500, f"invalid task model_profile {chosen_profile!r}: {e}")
 
-        # Collect downstream tool context: chat model display_name + metadata of the currently selected image variant
+        # Collect downstream tool context: chat model display_name + metadata of the currently selected image/video variant
         chat_model_display = caps.display_name or chosen_profile
         image_variant_hint = ""
         img_variant = (body.image_model or "").strip()
@@ -1163,10 +1223,23 @@ def create_app() -> FastAPI:
                         f"good at realistic / illustration / composition descriptions). If the user's intent involves imagery / covers / illustrations, "
                         f"the polished text should give visual details suited to this model (subject / style / lighting / composition)."
                     )
+        video_variant_hint = ""
+        vid_variant = (body.video_model or "").strip()
+        if vid_variant:
+            for k, v in _list_video_variants():
+                if k == vid_variant:
+                    name = v.get("display_name") or k
+                    res = v.get("default_resolution") or "720p"
+                    dur = v.get("default_duration") or 5
+                    video_variant_hint = (
+                        f"\nDownstream video generation tool: {name} (default {res} / {dur}s / 16:9). "
+                        f"If the user's intent involves video / animation / making things move, the polished text should spell out "
+                        f"what the subject is doing (motion) + how the camera moves + scene + style, rather than describing a static picture."
+                    )
 
         meta_prompt = (
             f"Your task is to polish the user's draft into a clear, complete, actionable prompt.\n"
-            f"Current chat model: {chat_model_display}.{image_variant_hint}\n\n"
+            f"Current chat model: {chat_model_display}.{image_variant_hint}{video_variant_hint}\n\n"
             f"Rules:\n"
             f"1. Output only the polished text itself, with no explanation, prefix or suffix, quotes, or markdown code fence\n"
             f"2. Keep the user's original language (Chinese / English)\n"
@@ -1215,6 +1288,7 @@ def create_app() -> FastAPI:
                 "tokens_out": completion_tokens,
                 "usd_to_cny": float(USD_TO_CNY),
                 "image_model_hint": img_variant or "",
+                "video_model_hint": vid_variant or "",
             },
             cost_cny=cost_cny,
         ))
@@ -785,6 +785,9 @@ const state = {
   // Currently selected image generation variant key (per-session, not stored in the DB); default = imageModels[0].variant
   // (= first in the yaml = agent_builder fallback). Sent to the backend in the POST body with the next message.
   imageModel: "",
+  // Video generation model list (GET /v1/video_models) + currently selected variant. Same pattern as imageModels.
+  videoModels: [],
+  videoModel: "",
   // Polish-button in-flight flag: prevents double clicks and keeps syncOptimizeBtn from overriding
   // the disabled state while in flight (otherwise typing in the input would flip the button from "polishing" back to enabled)
   optimizing: false,
@@ -1138,6 +1141,17 @@ async function loadModels() {
     state.imageModels = [];
     state.imageModel = "";
   }
+  try {
+    const data = await api("GET", "/v1/video_models");
+    state.videoModels = data.models || [];
+    if (!state.videoModel) {
+      const def = state.videoModels.find(m => m.is_default) || state.videoModels[0];
+      state.videoModel = def ? def.variant : "";
+    }
+  } catch (e) {
+    state.videoModels = [];
+    state.videoModel = "";
+  }
 }
 
 // loadTaskList: resets by default (after filters / refresh / write operations); append=true is triggered by the sentinel observer
@@ -1380,11 +1394,14 @@ function renderChatMeta() {
     <span class="spacer"></span>
     ${renderModelDropdown(t)}
     ${renderImageModelDropdown()}
+    ${renderVideoModelDropdown()}
   `;
   const sel = $("chat-model-sel");
   if (sel) sel.onchange = onChangeModel;
   const imgSel = $("chat-image-model-sel");
   if (imgSel) imgSel.onchange = onChangeImageModel;
+  const vidSel = $("chat-video-model-sel");
+  if (vidSel) vidSel.onchange = onChangeVideoModel;
   const active = t.status === "active";
   $("chat-form").style.display = active ? "flex" : "none";
   syncOptimizeBtn();
@@ -1425,6 +1442,22 @@ function onChangeImageModel(ev) {
   $("chat-hint").textContent = `Image model → ${ev.target.options[ev.target.selectedIndex].text}`;
 }
 
+function renderVideoModelDropdown() {
+  // Same as renderImageModelDropdown: empty videoModels → render nothing. When the yaml has no video section or the backend
+  // /v1/video_models returns an empty list, the dropdown does not appear and the seedance tool is not in the schema either.
+  if (!state.videoModels || state.videoModels.length === 0) return "";
+  const cur = state.videoModel || "";
+  const opts = state.videoModels.map(m =>
+    `<option value="${escapeHtml(m.variant)}" ${m.variant === cur ? "selected" : ""}>${escapeHtml(m.display_name)}</option>`
+  ).join("");
+  return `<span class="muted small" style="display:inline-flex;align-items:center;gap:4px;">Video <select id="chat-video-model-sel" class="small" style="width:auto;padding:1px 4px;font-size:12px;" title="Model used when the next message triggers video generation (local choice, not stored)">${opts}</select></span>`;
+}
+
+function onChangeVideoModel(ev) {
+  state.videoModel = ev.target.value || "";
+  $("chat-hint").textContent = `Video model → ${ev.target.options[ev.target.selectedIndex].text}`;
+}
+
 async function onChangeModel(ev) {
   const sel = ev.target;
   const newProfile = sel.value;
@@ -1607,6 +1640,7 @@ async function optimizePrompt() {
     const r = await api("POST", `/v1/tasks/${state.taskId}/optimize_prompt`, {
       text: ta.value,  // not trimmed; the backend strips it, and keeping the trailing newline leaves the user's experience unchanged
       image_model: state.imageModel || "",
+      video_model: state.videoModel || "",
     });
     const optimized = (r.optimized || "").trim();
     if (!optimized) throw new Error("empty result");
@@ -1677,6 +1711,7 @@ async function sendMessage() {
     const r = await api("POST", `/v1/tasks/${state.taskId}/messages`, {
       content,
       image_model: state.imageModel || "",
+      video_model: state.videoModel || "",
     });
     $("chat-input").value = "";
     syncOptimizeBtn();