ui+api: dev SPA SSE 客户端 3 次退避重连 + stream_events 非活跃 task 立即吐 done
--reload 重启 / 网络抖时 fetchSse 拆出 consumeSseStream + 包重连壳 (1s/2s/4s,EOF 未见 done/error 触发重连);后端 stream_events 入口检 tasks.run_status,非 running/cancelling 直接关流,避免重连卡在空 broker 无限挂 ping。3 次仍失败 → 卡片末尾红色"请重发"。 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
c5dcbb7e24
commit
52c25b9404
|
|
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
> 配合 `DESIGN.md`。本文件只记 phase 状态、决策偏差、文件量、下一步。每条 1-2 句:做了啥 + 关键判断;细节查 `git log` / `git diff` / `DESIGN §7.9`。
|
> 配合 `DESIGN.md`。本文件只记 phase 状态、决策偏差、文件量、下一步。每条 1-2 句:做了啥 + 关键判断;细节查 `git log` / `git diff` / `DESIGN §7.9`。
|
||||||
|
|
||||||
最后更新:2026-05-21(research skill 三次迭代:fetch_pdf 改走静态直链跟 fetch_xml 对齐 → 5/5 fetch 100%)
|
最后更新:2026-05-21(dev SPA SSE 客户端重连 + 后端 stream_events 非活跃态立即吐 done)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -23,6 +23,7 @@
|
||||||
|
|
||||||
### 2026-05-21
|
### 2026-05-21
|
||||||
|
|
||||||
|
- **dev SPA SSE 客户端重连(覆盖 --reload 抖动)**:`fetchSse` 拆出 `consumeSseStream` + 包重连壳(1s/2s/4s 退避,最多 3 次);reader EOF 未见 done/error 算异常关流触发重连;后端 `stream_events` 入口检 `tasks.run_status`,非 running/cancelling 立即吐 done 关流(否则进程重启后新 broker 内存空,客户端会无限挂 ping)。3 次仍失败 → 卡片末尾红色"连接已断开,请重发"。断开期间 LLM delta 丢失,接受。
|
||||||
- **research skill 三次迭代 fetch_pdf 改走静态直链**:`fetch_pdf` 跟 `fetch_xml` 同范式,从 `paper["pdf_url"]` 流式下载,绕开 paper_pdf_view 路径 bug(disk 路径计算错);smoke 5/5 PASS。
|
- **research skill 三次迭代 fetch_pdf 改走静态直链**:`fetch_pdf` 跟 `fetch_xml` 同范式,从 `paper["pdf_url"]` 流式下载,绕开 paper_pdf_view 路径 bug(disk 路径计算错);smoke 5/5 PASS。
|
||||||
- **research skill 二次迭代 list 端点加 pdf_url / xml_url 直链 + 新增 fetch_xml + pg_trgm GIN 索引**:serializer 后端拼直链(避免 LLM 拿 stale URL),`0006_pg_trgm` 给 title/first_author/institution 加 GIN 把 `?search=xxx` 从 30s timeout 降到几十 ms;SKILL.md 加"XML 优先 PDF"原则(XML 已结构化免 OCR)。
|
- **research skill 二次迭代 list 端点加 pdf_url / xml_url 直链 + 新增 fetch_xml + pg_trgm GIN 索引**:serializer 后端拼直链(避免 LLM 拿 stale URL),`0006_pg_trgm` 给 title/first_author/institution 加 GIN 把 `?search=xxx` 从 30s timeout 降到几十 ms;SKILL.md 加"XML 优先 PDF"原则(XML 已结构化免 OCR)。
|
||||||
- **顶栏 token 累计修(sync_task_tokens 改走 messages SUM)**:5/20 切 streaming 后 `LLM.TokenCounter` 内存计数器永不更新;删 TokenCounter 整个类,`sync_task_tokens` 改 `SELECT SUM(tokens_in/out) FROM messages WHERE task_id=?` 现算;backfill 4 个 task。
|
- **顶栏 token 累计修(sync_task_tokens 改走 messages SUM)**:5/20 切 streaming 后 `LLM.TokenCounter` 内存计数器永不更新;删 TokenCounter 整个类,`sync_task_tokens` 改 `SELECT SUM(tokens_in/out) FROM messages WHERE task_id=?` 现算;backfill 4 个 task。
|
||||||
|
|
|
||||||
5
RUN.md
5
RUN.md
|
|
@ -2,7 +2,7 @@
|
||||||
|
|
||||||
> 怎么把 zcbot 跑起来。env / 常用命令 / 故障兜底。设计看 `DESIGN.md`,进度看 `PROGRESS.md`。
|
> 怎么把 zcbot 跑起来。env / 常用命令 / 故障兜底。设计看 `DESIGN.md`,进度看 `PROGRESS.md`。
|
||||||
|
|
||||||
最后更新:2026-05-20(加 `POST /v1/tasks/{id}/optimize_prompt` 辅助 LLM 草稿润色;`usage_events.kind` 新增 `prompt_optimize`)
|
最后更新:2026-05-21(dev SPA SSE 客户端重连 3 次退避;`/v1/tasks/{id}/events` 非活跃 task 立即吐 done)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -206,7 +206,7 @@ sudo systemctl stop zcbot
|
||||||
zcbot 现在 5 人级 + SSE 长连接,**严格"零中断"**(蓝绿 + nginx + SSE 客户端 reconnect 设计)代价高,不值得。有性价比的两挡:
|
zcbot 现在 5 人级 + SSE 长连接,**严格"零中断"**(蓝绿 + nginx + SSE 客户端 reconnect 设计)代价高,不值得。有性价比的两挡:
|
||||||
|
|
||||||
**A. 简易档:`--reload`**(推荐当前规模)
|
**A. 简易档:`--reload`**(推荐当前规模)
|
||||||
ExecStart 加 `--reload`,`git pull` 后 uvicorn 监听到文件变动自动重起子进程,REST 抖动 < 1s。**代价**:SSE 连接被切断(浏览器看到 "load failed",dev.html 自动跳登录页或同事重发一次消息;DB 里被切的 task 走启动 reaper 标 `run_status=error`)。
|
ExecStart 加 `--reload`,`git pull` 后 uvicorn 监听到文件变动自动重起子进程,REST 抖动 < 1s。**代价**:SSE 连接被切断,**dev.html 自动重连 3 次(1s/2s/4s 退避)**;若新进程已被启动 reaper 标 `run_status=error`,重连立即收 done,卡片末尾追加红色"连接已断开,请重发"。期间 LLM 吐的 delta 丢失(broker 不持久化 event,接受)。3 次仍失败 → 同上提示,用户重发即可。
|
||||||
|
|
||||||
```ini
|
```ini
|
||||||
ExecStart=/opt/zcbot/.venv/bin/python main.py web --host 0.0.0.0 --port 8765 --reload
|
ExecStart=/opt/zcbot/.venv/bin/python main.py web --host 0.0.0.0 --port 8765 --reload
|
||||||
|
|
@ -275,6 +275,7 @@ sudo journalctl -u zcbot -n 50 # 看新进程起没起干
|
||||||
| `/v1/*` 返 401 `token expired` | JWT 默 7d TTL 到期,重 login。要更长改 `ZCBOT_JWT_TTL_SECONDS` env |
|
| `/v1/*` 返 401 `token expired` | JWT 默 7d TTL 到期,重 login。要更长改 `ZCBOT_JWT_TTL_SECONDS` env |
|
||||||
| dev.html SSE 收不到流(消息发出去 UI 没动) | EventSource 不支持 header,dev.html 走 `fetch + ReadableStream`。devtools Network 看 POST /messages 是否 202 + events_url GET 是否 200 + Content-Type 是 text/event-stream;401 → token 过期,logout 重 login |
|
| dev.html SSE 收不到流(消息发出去 UI 没动) | EventSource 不支持 header,dev.html 走 `fetch + ReadableStream`。devtools Network 看 POST /messages 是否 202 + events_url GET 是否 200 + Content-Type 是 text/event-stream;401 → token 过期,logout 重 login |
|
||||||
| dev.html 显示 "load failed" 立刻回登录页 | token 过期或 JWT_SECRET 服务端变了。已自动跳登录页,按上次 tab 重登 |
|
| dev.html 显示 "load failed" 立刻回登录页 | token 过期或 JWT_SECRET 服务端变了。已自动跳登录页,按上次 tab 重登 |
|
||||||
|
| dev.html 顶栏出现"连接断开,重连中…(N/3)" | SSE 流被切(`--reload` 重启 / nginx 切换 / 网络抖)。客户端自动重连,1s/2s/4s 退避;新进程已 reaper 标 error 则立即收 done + 卡片末尾"请重发"提示;若服务端还活着会继续看后续 delta(断开期间的丢失,broker 不持久化) |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
|
||||||
13
web/app.py
13
web/app.py
|
|
@ -1208,12 +1208,21 @@ def create_app() -> FastAPI:
|
||||||
raise HTTPException(404, f"invalid task id: {task_id!r}")
|
raise HTTPException(404, f"invalid task id: {task_id!r}")
|
||||||
with session_scope() as s:
|
with session_scope() as s:
|
||||||
_assert_owns_task(s, tid, user_id)
|
_assert_owns_task(s, tid, user_id)
|
||||||
|
run_status = s.execute(
|
||||||
|
select(Task.run_status).where(Task.task_id == tid)
|
||||||
|
).scalar_one()
|
||||||
|
|
||||||
|
# 重连保护:若 task 不在活跃态(进程重启 / reaper 已收尾 / 自然结束),
|
||||||
|
# 直接吐 done 关流。否则 broker 进程内队列空,客户端会无限挂在 ping 上。
|
||||||
|
is_active = run_status in ("running", "cancelling")
|
||||||
|
|
||||||
async def gen():
|
async def gen():
|
||||||
|
yield b": connected\nretry: 3000\n\n"
|
||||||
|
if not is_active:
|
||||||
|
yield _sse_event("done", {})
|
||||||
|
return
|
||||||
q = broker.subscribe(tid)
|
q = broker.subscribe(tid)
|
||||||
try:
|
try:
|
||||||
# 第一帧 retry 注释 + 心跳:让 EventSource 立即建立(不被 buffer 卡)
|
|
||||||
yield b": connected\nretry: 3000\n\n"
|
|
||||||
while True:
|
while True:
|
||||||
try:
|
try:
|
||||||
ev = await asyncio.wait_for(q.get(), timeout=30.0)
|
ev = await asyncio.wait_for(q.get(), timeout=30.0)
|
||||||
|
|
|
||||||
|
|
@ -1668,37 +1668,45 @@ function streamSse(url, asstCard) {
|
||||||
|
|
||||||
async function fetchSse(url, asstCard) {
|
async function fetchSse(url, asstCard) {
|
||||||
const body = asstCard.querySelector(".body");
|
const body = asstCard.querySelector(".body");
|
||||||
const ctx = { acc: "", body, pending: false, seenRels: new Set() };
|
const ctx = { acc: "", body, pending: false, seenRels: new Set(), terminal: false };
|
||||||
|
const hint = $("chat-hint");
|
||||||
|
// 重连:reader 异常 / 自然 EOF 但未收到 done/error 时,GET events 重订阅。
|
||||||
|
// 后端 stream_events 重连入口会校验 run_status,task 已不活跃直接吐 done 关流;
|
||||||
|
// 这里 3 次失败再放弃,覆盖 systemctl restart 的 1~2s 抖动 + reaper 跑完的窗口。
|
||||||
|
const backoffs = [1000, 2000, 4000];
|
||||||
|
let attempt = 0;
|
||||||
try {
|
try {
|
||||||
const r = await fetch(url, {
|
|
||||||
headers: { "Authorization": "Bearer " + state.token, "Accept": "text/event-stream" },
|
|
||||||
});
|
|
||||||
if (!r.ok) throw new Error(r.status + " " + r.statusText);
|
|
||||||
const reader = r.body.getReader();
|
|
||||||
const dec = new TextDecoder();
|
|
||||||
let buf = "";
|
|
||||||
$("chat-hint").textContent = "接收中…";
|
|
||||||
while (true) {
|
while (true) {
|
||||||
const { value, done } = await reader.read();
|
try {
|
||||||
if (done) break;
|
await consumeSseStream(url, asstCard, ctx);
|
||||||
buf += dec.decode(value, { stream: true });
|
} catch (e) {
|
||||||
while (true) {
|
if (ctx.terminal) break; // 已收到 done/error,不重连
|
||||||
const idx = buf.indexOf("\n\n");
|
if (attempt >= backoffs.length) {
|
||||||
if (idx < 0) break;
|
appendErrorCard("连接已断开,刚才的回复可能未完成,请重发消息。");
|
||||||
const frame = buf.slice(0, idx);
|
break;
|
||||||
buf = buf.slice(idx + 2);
|
}
|
||||||
const ev = parseSseFrame(frame);
|
hint.textContent = `连接断开,重连中…(${attempt + 1}/${backoffs.length})`;
|
||||||
if (!ev) continue;
|
await new Promise(r => setTimeout(r, backoffs[attempt]));
|
||||||
handleSseEvent(ev, asstCard, ctx);
|
attempt++;
|
||||||
if (ev.event === "done" || ev.event === "error") break;
|
continue;
|
||||||
}
|
}
|
||||||
|
// consumeSseStream 正常返回:reader 收到 EOF
|
||||||
|
if (ctx.terminal) break; // 正常收尾(看到 done/error)
|
||||||
|
// 未见 done/error 就 EOF → 服务端中途关流(进程被杀 / nginx 切),重连
|
||||||
|
if (attempt >= backoffs.length) {
|
||||||
|
appendErrorCard("连接已断开,刚才的回复可能未完成,请重发消息。");
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
hint.textContent = `连接断开,重连中…(${attempt + 1}/${backoffs.length})`;
|
||||||
|
await new Promise(r => setTimeout(r, backoffs[attempt]));
|
||||||
|
attempt++;
|
||||||
}
|
}
|
||||||
// 最终定稿 + 代码高亮(流式中不 highlight,省 CPU)
|
// 最终定稿 + 代码高亮(流式中不 highlight,省 CPU)
|
||||||
body.innerHTML = renderMd(ctx.acc);
|
body.innerHTML = renderMd(ctx.acc);
|
||||||
highlightIn(asstCard);
|
highlightIn(asstCard);
|
||||||
} finally {
|
} finally {
|
||||||
body.classList.remove("streaming");
|
body.classList.remove("streaming");
|
||||||
$("chat-hint").textContent = "就绪";
|
hint.textContent = "就绪";
|
||||||
state.streaming = false;
|
state.streaming = false;
|
||||||
setActionMode("idle");
|
setActionMode("idle");
|
||||||
}
|
}
|
||||||
|
|
@ -1709,6 +1717,35 @@ async function fetchSse(url, asstCard) {
|
||||||
refreshConcurrentWarnings(); // 自己 task 收尾,顺便清/更新 banner(同 wd 邻居可能也变了)
|
refreshConcurrentWarnings(); // 自己 task 收尾,顺便清/更新 banner(同 wd 邻居可能也变了)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
async function consumeSseStream(url, asstCard, ctx) {
|
||||||
|
const r = await fetch(url, {
|
||||||
|
headers: { "Authorization": "Bearer " + state.token, "Accept": "text/event-stream" },
|
||||||
|
});
|
||||||
|
if (!r.ok) throw new Error(r.status + " " + r.statusText);
|
||||||
|
const reader = r.body.getReader();
|
||||||
|
const dec = new TextDecoder();
|
||||||
|
let buf = "";
|
||||||
|
$("chat-hint").textContent = "接收中…";
|
||||||
|
while (true) {
|
||||||
|
const { value, done } = await reader.read();
|
||||||
|
if (done) return;
|
||||||
|
buf += dec.decode(value, { stream: true });
|
||||||
|
while (true) {
|
||||||
|
const idx = buf.indexOf("\n\n");
|
||||||
|
if (idx < 0) break;
|
||||||
|
const frame = buf.slice(0, idx);
|
||||||
|
buf = buf.slice(idx + 2);
|
||||||
|
const ev = parseSseFrame(frame);
|
||||||
|
if (!ev) continue;
|
||||||
|
handleSseEvent(ev, asstCard, ctx);
|
||||||
|
if (ev.event === "done" || ev.event === "error") {
|
||||||
|
ctx.terminal = true;
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
function parseSseFrame(frame) {
|
function parseSseFrame(frame) {
|
||||||
const lines = frame.split("\n");
|
const lines = frame.split("\n");
|
||||||
let event = "msg"; let dataLines = [];
|
let event = "msg"; let dataLines = [];
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue