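feat(preview): online pptx preview via LibreOffice→PDF, reusing the PDF iframe (DESIGN §8.3 Stage 1)

Clicking a .pptx in the file panel no longer offers download only. The backend converts it to PDF and the frontend reuses the existing PDF iframe.

- web/pptx_render.py: pptx_to_pdf() calls soffice with a separate temporary profile per run to sidestep the single-profile lock, and kills it after a 60s timeout; results are cached as .preview/<stem>.<hash>.pdf (hash = mtime + size, so any change to the source invalidates the entry, and old hashes are pruned); raises SofficeNotFoundError when soffice is missing
- web/app.py: GET /v1/files/preview_pdf: _safe_join blocks path traversal + .ppt(x) only + a per-path asyncio.Lock prevents concurrent duplicate conversions + run_in_executor keeps the event loop unblocked; 501 when soffice is missing, 500 on failure
- preview.js: new ppt group; main/mini share _showPptAsPdf (spinner while loading + fallback to download on failure)
- dev.html: .preview-spinner (reuses @keyframes spin)
- Conversion runs in the web host process, not in the sandbox; deployment installs libreoffice-impress + fonts-noto-cjk on the host (sandbox Dockerfile untouched)
- tests/test_pptx_render.py: 10 cases (cache hit skips soffice / source change invalidates + prunes / degradation when soffice is missing / out-of-bounds path rejection)
- Docs: RUN.md (host install + 2 troubleshooting rows), PROGRESS.md

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>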
parent
087980d027
commit
b70b993257
@@ -2,7 +2,7 @@
 
 > Companion to `DESIGN.md`. This file records only phase status, deviations from decisions, file volume, and next steps. Each entry is 1-2 sentences: what was done + the key judgment call; for details see `git log` / `git diff` / `DESIGN §7.9`.
 
-Last updated: 2026-06-08 (progress restore fix: stop suppressing the task_progress parameter + move the progress area to the top of the conversation pane + collapse it on completion)
+Last updated: 2026-06-09 (PPTX online preview in the frontend: LibreOffice→PDF + reuse of the PDF iframe, DESIGN §8.3 Stage 1)
 
 ---
 
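@@ -23,6 +23,7 @@
 
 ### 2026-06-09
 
+- **PPTX online preview in the frontend (LibreOffice→PDF, DESIGN §8.3 Stage 1)**: Until now, clicking a `.pptx` in the file panel could only download it (`preview.js._categorize` sent it to fallback). Key insight: the frontend already has a PDF iframe path (`_showPdf`), so once the backend converts pptx to PDF the **frontend barely changes**. What landed: ① new `web/pptx_render.py`: `pptx_to_pdf()` is synchronous and cacheable, calls `soffice --headless --convert-to pdf`, uses **a separate temporary `-env:UserInstallation` profile per run** to sidestep the single-profile lock, and kills the process after a 60s timeout; soffice path discovery follows the render_bg approach; the cache lives next to the source in `.preview/<stem>.<hash>.pdf` (hash = mtime + size, so any change to the source invalidates it; a dotdir keeps it out of the file listing), and `_prune_stale` removes old hashes. ② new endpoint `GET /v1/files/preview_pdf`: reuses `_safe_join` for authorization and traversal protection + `.ppt(x)` only + a per-path `asyncio.Lock` against concurrent duplicate conversions + `run_in_executor` so the event loop is not blocked; 501 when soffice is missing / 500 when conversion fails. ③ `preview.js` gains a `ppt` group, and main/mini share `_showPptAsPdf` (fetch PDF→iframe, with a spinner while loading + fallback to download on failure); `dev.html` gains `.preview-spinner` (reuses `@keyframes spin`). **Conversion runs in the web host process, not in the sandbox** (the sandbox should not carry LibreOffice; preview targets any pptx under user_root and is decoupled from deck generation). Deployment: `apt install libreoffice-impress fonts-noto-cjk` on the host (already added to RUN.md as a one-time step + troubleshooting entries); the sandbox Dockerfile is untouched. **Not done** (Stage 2): a resident soffice listener to eliminate cold starts, eager pre-conversion after deck generation, thumbnail navigation.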
 - **Remedy 3 re-check: the `/home/ubuntu/zcbot` phantom path was fixed on 06-03, new tasks do not reproduce it + a regression test pins it down**: wraps up the diagnosis of token burn at high turn counts. Evidence chain: in failed tasks `ab063233` (06-02 03:54) / `ff1686b7` (06-03 12:02), the first assistant message (idx1) already carries `glob(path=/home/ubuntu/zcbot/workspace/users/<real uid>/数据资源展示)` in its tool_calls, and **a real uid can only have come from the system prompt at the time** (the messages table has no system-role rows; the system prompt is assembled at runtime and not persisted); both tasks were created around the time of the 06-03 fix for "host path hard-wired into the system prompt". fs tools run inside the docker container (which has no such host path) → `[Error] base path not found` (`glob.execute` returns this error for a nonexistent base, and `_display` echoes absolute paths for anything outside user_root) → retry storm (51 attempts measured). **Re-check of the current code**: in docker mode, even when `tool_base=/home/ubuntu/zcbot` + a real uid are passed in, the assembled prompt contains only `/workspace/<wd>` and no host path/uid/tmp (`agent_builder.py:223-250`: the docker branch injects container paths + drops the cwd line); prod uses the docker backend (RUN.md). Added `tests/test_system_prompt_paths.py` (2 cases: no host leakage under docker + host mode keeps local absolute paths; passing) to lock the fix against regression. **All three remedies are now closed out**; the Remedy 1 duplicate guard additionally backstops any similar storm from here on (the same 51 attempts would be capped at ~5). No functional code changes, tests only.
 - **ppt skill gains "information design discipline" + hybrid backgrounds + a pptx previewer (addressing "the result still isn't good enough"; a second correction after a close read of pptmaster)**: The user reported that card-style v2 was still not good enough. Dissecting the actual output (`大模型与智能体介绍.pptx`) pinpointed the problems: 4 of 9 pages were near-identical card grids (all cards = AI flavor), the development history was laid out as a grid (should be a timeline), the agent content was laid out flat (should be a closed loop), icons at 0.6 inch were too small, and drop shadows were added everywhere. **A close read of pptmaster's executor-base/executor-consultant(-top)/shared-standards made it click**: the real reason it looks like McKinsey is **information design discipline (~70%)** rather than SVG rendering (~30%), and **all of it is achievable with editable python-pptx**. The earlier "editable vs SVG converter" debate was on the wrong axis (anything editable lands on the same DrawingML ceiling, so a converter adds zero visual gain). Three layers landed: ① **Information fundamentals**: `add_takeaway` (a one-sentence conclusion box under the assertion title), `add_kpi` gains `baseline+delta` (contextualized data: numbers carry a comparison baseline + up/down colors `GOOD/BAD`), `add_source` (source), `add_toc` (a full-width table of contents); the strategy stage of SKILL adds an assertion-style title comparison table + page_rhythm (anchor/dense/**breathing, which forcibly breaks up card grids**) + a content→layout mapping written into the per-page outline. ② **Fixing the shadows I had backwards**: per pptmaster, "shadows are restraint": `add_card` defaults to `shadow=False` (flat peer cards get a hairline border and no shadow), ≤2-3 shadows per page, one technique per container with no stacking; quality_check adds an exemption from the three-color rule for green as a semantic status color. ③ **Composite components + tools**: `add_card_grid` (balanced grid; with 2 rows the icon moves to the left to fix "top-placed icon squeezes content into overflow") / `add_timeline` / `add_cycle`; `render_bg.py` (headless Chrome renders magazine-grade mesh gradient background images; a **hybrid approach**: background image + native editable white text, for cover/section pages); **`pptx_preview.py` (renders a .pptx to PNG for visual inspection): quality_check only checks structure, and the preview covers "does it look good"; it immediately caught a real bug where `set_text` colored only the first paragraph of multi-line text (the second line of the cover subtitle went dark and unreadable), which was then fixed**. Verification: re-laid out "大模型与智能体" as 10 pages (rhythm: cover/TOC/section anchor · grid/timeline dense · large-type breathing · section/closed loop/grid · acknowledgements), rendered every page to PNG and judged each professional on visual review, quality_check all passing. Changed `skills/ppt/{SKILL.md,references/{design_principles,layouts}.md,scripts/{pptx_helpers,quality_check}.py}` + added `scripts/{render_bg,pptx_preview}.py` + `SKILL_LIST.md`. **Untouched**: the SVG→native converter (argued to be zero gain, not doing it), live preview server, animations; the PNG backends for fetch_icon (cairosvg/svglib) are not installed locally, so seed-library PNGs are used for now.
 
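The cache-naming rule described in the PPTX preview entry (hash = mtime + size, so a changed source gets a new file name) can be illustrated with a minimal sketch. It assumes a truncated SHA-1 over `st_mtime_ns` and `st_size`, in the spirit of `_cache_pdf_path` in `web/pptx_render.py`; `cache_name` and `demo` are hypothetical names used only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_name(src: Path) -> str:
    """Hypothetical helper: build '<stem>.<hash>.pdf' from the source's (mtime_ns, size)."""
    st = src.stat()
    sig = f"{st.st_mtime_ns}-{st.st_size}".encode("utf-8")
    return f"{src.stem}.{hashlib.sha1(sig).hexdigest()[:12]}.pdf"


def demo() -> tuple[str, str, str]:
    """Return the cache name at first sight, on an unchanged re-check, and after an edit."""
    with tempfile.TemporaryDirectory() as tmp:
        deck = Path(tmp) / "deck.pptx"
        deck.write_bytes(b"v1")
        first = cache_name(deck)
        again = cache_name(deck)        # source untouched: same name, so the cache would hit
        deck.write_bytes(b"v2 longer")  # size changes: new name, so the old entry is stale
        changed = cache_name(deck)
        return first, again, changed


if __name__ == "__main__":
    print(demo())
```

Because staleness is encoded in the file name, no freshness comparison is needed at read time; the trade-off is that old files accumulate unless something prunes them, which is the job the entry assigns to `_prune_stale`.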
6
RUN.md
@@ -184,6 +184,10 @@ sudo chown -R zcbot:zcbot /opt/zcbot
 # Tighten .env permissions (contains JWT_SECRET / PLATFORM_KEY)
 sudo chmod 600 /opt/zcbot/.env
 sudo chown zcbot:zcbot /opt/zcbot/.env
+
+# PPTX online preview (DESIGN §8.3): the web process (this host, not the sandbox) calls soffice to convert .pptx to PDF.
+# Install LibreOffice Impress + CJK fonts (if missing, the frontend falls back to "download to view" for .pptx without raising an error).
+sudo apt-get install -y --no-install-recommends libreoffice-impress fonts-noto-cjk
 ```
 
 ### unit file `/etc/systemd/system/zcbot.service`
@@ -680,6 +684,8 @@ sudo xfs_quota -x -c "limit -p bhard=10g zcbot_<user_uuid>" /opt
 | `[startup] reaped N stale active run(s)` | The previous web process did not finish cleanly and left N orphaned runs; the startup lifespan marks them as error automatically. Info level, no action needed |
 | `seedream` tool does not show up in conversations | `.env` does not set `ARK_API_KEY`, so build_agent skips registration. Set it and restart web; no migration and no DB changes needed |
 | `document_*` tool does not show up in conversations | `.env` does not set `DOCUMENT_SEARCH_API_KEY`, so build_agent skips registration. Set it and restart web; the key never enters the sandbox. |
+| Clicking a `.pptx` in the file panel shows "服务器未装 LibreOffice" (LibreOffice is not installed on the server) / falls straight back to download | The web host (not the sandbox) does not have soffice installed. Run `sudo apt-get install -y --no-install-recommends libreoffice-impress fonts-noto-cjk`, then **restart web**. On dev (Windows): `winget install TheDocumentFoundation.LibreOffice`. Verify: `soffice --version` or `python -c "from web.pptx_render import find_soffice; print(find_soffice())"` |
+| First `.pptx` preview takes a few seconds | Expected: soffice cold start + conversion takes ~2-4s; the result is cached next to the source as `.preview/<stem>.<hash>.pdf`, so subsequent opens are instant. Any change to the source file (mtime/size) changes the hash and triggers an automatic reconversion |
 | `mp_*` tool does not show up in conversations | `.env` does not set `MP_API_KEY`, so build_agent skips registration. Set it and restart web; online Materials Project queries go through a host-side tool, and offline pymatgen is unaffected. |
 | Doubao changed its pricing | Edit the `price_cny_per_image` line in `config/media/doubao.yaml` → restart web. **Historical usage_events are unaffected** (the units jsonb holds a snapshot of the unit price at the time, so aggregate queries still use the old price); new writes use the new price. Between the moment the price goes up and the YAML edit, usage is under-recorded; accepted during development |
 | After `kill -HUP <pid>`, `/openapi.json` shows no new endpoints | uvicorn **does not respond to SIGHUP** (no handler installed, so Python's default applies and the process terminates; on Windows the signal has no effect at all). On Ubuntu use `systemctl restart zcbot`, or add `--reload` to the unit so uvicorn watches files and restarts automatically (see the "Deployment" section). Verify: `curl -s http://127.0.0.1:8765/openapi.json \| python3 -c 'import sys,json;print([p for p in json.load(sys.stdin)["paths"] if "auth" in p])'` |
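For the "soffice not installed" troubleshooting row, a quick standalone check can help when importing the project module is inconvenient. The following is a minimal sketch that only covers the PATH lookup (the project's `find_soffice` also probes fixed install paths first); `soffice_on_path` is a hypothetical name, and the injectable `which` parameter exists only to make the function easy to exercise.

```python
import shutil
from typing import Callable, Optional


def soffice_on_path(which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first soffice-like executable found on PATH, or None if there is none."""
    for name in ("soffice", "soffice.exe", "libreoffice"):
        found = which(name)
        if found:
            return found
    return None


if __name__ == "__main__":
    path = soffice_on_path()
    print(path or "soffice not found on PATH; .pptx preview would fall back to download")
```

If this prints a path but the preview still falls back, the web process may be running with a different PATH than the interactive shell, which is worth checking before reinstalling.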
123
tests/test_pptx_render.py
@@ -0,0 +1,123 @@
+"""Focused tests for web/pptx_render.py (DESIGN §8.3 acceptance).
+
+soffice may not be installed locally, so `_run_soffice` / `find_soffice` are always mocked; only the pure logic
+(cache / invalidation / degradation / prune) is verified. Real soffice conversion is covered by the deployment environment and does not run in unit tests.
+"""
+import os
+import time
+import unittest
+from pathlib import Path
+from tempfile import TemporaryDirectory
+from unittest import mock
+
+import web.pptx_render as R
+from web.app import _safe_join  # rejects out-of-bounds paths
+
+
+def _fake_soffice_run(soffice, pptx_path, outdir, timeout):
+    """Pretend to be soffice: drop a <stem>.pdf into outdir and return its path."""
+    out = Path(outdir) / f"{Path(pptx_path).stem}.pdf"
+    out.write_bytes(b"%PDF-1.4 fake\n")
+    return out
+
+
+class PptxRenderTests(unittest.TestCase):
+    def _make_pptx(self, d: Path, name: str = "deck.pptx") -> Path:
+        p = d / name
+        p.write_bytes(b"PK\x03\x04 fake pptx bytes")
+        return p
+
+    def test_convert_success_lands_pdf_in_preview_dir(self):
+        with TemporaryDirectory() as tmp:
+            pptx = self._make_pptx(Path(tmp))
+            with mock.patch.object(R, "find_soffice", return_value="soffice"), \
+                 mock.patch.object(R, "_run_soffice", side_effect=_fake_soffice_run):
+                pdf = R.pptx_to_pdf(pptx)
+            self.assertTrue(pdf.exists())
+            self.assertEqual(pdf.parent.name, ".preview")
+            self.assertEqual(pdf.suffix, ".pdf")
+            self.assertTrue(pdf.name.startswith("deck."))
+
+    def test_cache_hit_skips_soffice(self):
+        with TemporaryDirectory() as tmp:
+            pptx = self._make_pptx(Path(tmp))
+            with mock.patch.object(R, "find_soffice", return_value="soffice"), \
+                 mock.patch.object(R, "_run_soffice", side_effect=_fake_soffice_run):
+                first = R.pptx_to_pdf(pptx)
+            # Second call: soffice must never be invoked (cache hit); find_soffice need not run either
+            with mock.patch.object(R, "_run_soffice") as run, \
+                 mock.patch.object(R, "find_soffice") as find:
+                second = R.pptx_to_pdf(pptx)
+            self.assertEqual(first, second)
+            run.assert_not_called()
+            find.assert_not_called()
+
+    def test_source_change_invalidates_and_prunes_old(self):
+        with TemporaryDirectory() as tmp:
+            pptx = self._make_pptx(Path(tmp))
+            with mock.patch.object(R, "find_soffice", return_value="soffice"), \
+                 mock.patch.object(R, "_run_soffice", side_effect=_fake_soffice_run):
+                old_pdf = R.pptx_to_pdf(pptx)
+                # Change the source content + push mtime forward → hash changes → reconvert + delete the old hash
+                time.sleep(0.01)
+                pptx.write_bytes(b"PK\x03\x04 fake pptx bytes CHANGED MORE")
+                new_mtime = time.time() + 5
+                os.utime(pptx, (new_mtime, new_mtime))
+                new_pdf = R.pptx_to_pdf(pptx)
+            self.assertNotEqual(old_pdf.name, new_pdf.name)
+            self.assertTrue(new_pdf.exists())
+            self.assertFalse(old_pdf.exists(), "旧 hash 缓存应被 prune")
+            # Only this one file remains under .preview
+            survivors = list(new_pdf.parent.glob("deck.*.pdf"))
+            self.assertEqual(survivors, [new_pdf])
+
+    def test_missing_soffice_raises_not_found(self):
+        with TemporaryDirectory() as tmp:
+            pptx = self._make_pptx(Path(tmp))
+            with mock.patch.object(R, "find_soffice",
+                                   side_effect=R.SofficeNotFoundError("no soffice")):
+                with self.assertRaises(R.SofficeNotFoundError):
+                    R.pptx_to_pdf(pptx)
+
+    def test_missing_pptx_raises_convert_error(self):
+        with TemporaryDirectory() as tmp:
+            with self.assertRaises(R.PptxConvertError):
+                R.pptx_to_pdf(Path(tmp) / "nope.pptx")
+
+    def test_find_soffice_uses_path_fallback(self):
+        with mock.patch("pathlib.Path.exists", return_value=False), \
+             mock.patch("shutil.which", return_value="/usr/bin/soffice") as which:
+            self.assertEqual(R.find_soffice(), "/usr/bin/soffice")
+            self.assertTrue(which.called)
+
+    def test_find_soffice_missing_raises(self):
+        with mock.patch("pathlib.Path.exists", return_value=False), \
+             mock.patch("shutil.which", return_value=None):
+            with self.assertRaises(R.SofficeNotFoundError):
+                R.find_soffice()
+
+
+class SafePathTraversalTests(unittest.TestCase):
+    """The preview_pdf endpoint reuses _safe_join as its boundary: out-of-bounds paths must be blocked."""
+
+    def test_rejects_parent_traversal(self):
+        with TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            with self.assertRaises(Exception):
+                _safe_join(root, "../../etc/passwd")
+
+    def test_rejects_absolute(self):
+        with TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            with self.assertRaises(Exception):
+                _safe_join(root, "/etc/passwd")
+
+    def test_allows_inside(self):
+        with TemporaryDirectory() as tmp:
+            root = Path(tmp)
+            got = _safe_join(root, "sub/deck.pptx")
+            self.assertEqual(got, (root / "sub" / "deck.pptx").resolve())
+
+
+if __name__ == "__main__":
+    unittest.main()
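The traversal tests above exercise `_safe_join`, whose body is not part of this diff. To make the expected contract concrete, here is a minimal sketch of a containment check that would satisfy the same three cases. It is an assumption about the behavior, not the project's implementation: `safe_join_sketch` and `is_rejected` are hypothetical names, and the real helper may raise a different exception type (the tests only require some `Exception`).

```python
import tempfile
from pathlib import Path


def safe_join_sketch(root: Path, rel: str) -> Path:
    """Hypothetical stand-in for `_safe_join`: resolve rel under root, or raise ValueError."""
    base = root.resolve()
    # Joining an absolute rel discards base entirely, so it is caught by the check below.
    candidate = (base / rel).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes root: {rel!r}")
    return candidate


def is_rejected(root: Path, rel: str) -> bool:
    """Return True when the sketch refuses the path."""
    try:
        safe_join_sketch(root, rel)
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in ("../../etc/passwd", "/etc/passwd", "sub/deck.pptx"):
            print(rel, "rejected" if is_rejected(root, rel) else "allowed")
```

Resolving both sides before comparing matters: it normalizes `..` segments and symlinked temp directories, so the comparison is made on canonical paths rather than on the raw string.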
50
web/app.py
@@ -58,6 +58,16 @@ STATUS_WRITABLE = ("completed", "abandoned") # web 不让从 web 端切回 acti
 ORDER_FIELDS = ("created_at", "updated_at", "name", "status")
 ORDER_DEFAULT = "-created_at"
 
+# pptx→PDF preview: lock on the resolved absolute pptx path to prevent concurrent duplicate conversions of the same file (DESIGN §8.3).
+_pptx_preview_locks: dict[str, asyncio.Lock] = {}
+
+
+def _pptx_lock_for(abs_path: str) -> asyncio.Lock:
+    lock = _pptx_preview_locks.get(abs_path)
+    if lock is None:
+        lock = _pptx_preview_locks[abs_path] = asyncio.Lock()
+    return lock
+
 
 # ─────────────────────────── helpers ───────────────────────────
 
@@ -1635,6 +1645,46 @@ def create_app() -> FastAPI:
             headers={"Cache-Control": "no-cache"},
         )
 
+    @app.get("/v1/files/preview_pdf", tags=["files"])
+    async def preview_pdf(
+        path: str,
+        user_id: UUID = Depends(require_user),
+    ):
+        """Convert a .pptx under user_root to PDF and return it, so the frontend can reuse the PDF iframe for online preview.
+
+        Conversion runs on the backend host (not in the sandbox), triggered on demand + cached in `.preview/` (DESIGN §8.3).
+        soffice missing → 501; conversion failure/timeout → 500; the frontend falls back to download in either case.
+        """
+        from .pptx_render import (
+            PptxConvertError,
+            SofficeNotFoundError,
+            pptx_to_pdf,
+        )
+
+        root = _load_user_root(user_id)
+        target = _safe_join(root, path)
+        if not target.exists():
+            raise HTTPException(404, f"file not found: {path}")
+        if not target.is_file():
+            raise HTTPException(400, f"not a file: {path}")
+        if target.suffix.lower() not in (".pptx", ".ppt"):
+            raise HTTPException(400, f"not a pptx: {path}")
+
+        abs_path = str(target.resolve())
+        loop = asyncio.get_event_loop()
+        async with _pptx_lock_for(abs_path):
+            try:
+                pdf_path = await loop.run_in_executor(None, pptx_to_pdf, target)
+            except SofficeNotFoundError as e:
+                raise HTTPException(501, str(e))
+            except PptxConvertError as e:
+                raise HTTPException(500, str(e))
+        return FileResponse(
+            path=str(pdf_path),
+            media_type="application/pdf",
+            headers={"Cache-Control": "no-cache"},
+        )
+
     @app.post("/v1/files/upload", tags=["files"])
     async def upload_files(
         path: str = Form(""),
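The endpoint combines two ideas: a per-path `asyncio.Lock` so that concurrent requests for the same file convert it only once, and `run_in_executor` so the blocking conversion does not stall the event loop. The following is a minimal, self-contained sketch of that interaction. The names (`lock_for`, `slow_convert`, `preview`, `run_demo`) and the in-memory cache are hypothetical stand-ins for the real helpers and the on-disk `.preview/` cache, and the sketch uses `asyncio.get_running_loop()` where the endpoint uses `asyncio.get_event_loop()`.

```python
import asyncio
import time

_locks: dict[str, asyncio.Lock] = {}
_cache: dict[str, str] = {}   # stand-in for the on-disk .preview/ cache
calls: list[str] = []         # records every real (non-cached) conversion


def lock_for(key: str) -> asyncio.Lock:
    """Return the lock for this key, creating it on first use."""
    lock = _locks.get(key)
    if lock is None:
        lock = _locks[key] = asyncio.Lock()
    return lock


def slow_convert(key: str) -> str:
    """Blocking stand-in for the conversion: return the cached result or 'convert' slowly."""
    if key in _cache:
        return _cache[key]
    calls.append(key)
    time.sleep(0.05)          # simulates the subprocess blocking a worker thread
    _cache[key] = f"{key}.pdf"
    return _cache[key]


async def preview(key: str) -> str:
    """Serialize work per key; run the blocking part off the event loop."""
    loop = asyncio.get_running_loop()
    async with lock_for(key):
        return await loop.run_in_executor(None, slow_convert, key)


async def _gather(n: int) -> list[str]:
    return list(await asyncio.gather(*(preview("deck.pptx") for _ in range(n))))


def run_demo(n: int = 3) -> tuple[list[str], list[str]]:
    """Fire n concurrent requests for one file; return their results and the real conversions."""
    _locks.clear()
    _cache.clear()
    calls.clear()
    results = asyncio.run(_gather(n))
    return results, list(calls)


if __name__ == "__main__":
    print(run_demo())
```

Without the lock, all requests would reach the executor at once, each would miss the cache, and each would convert. With it, the first request converts while the rest wait and then hit the cache. One design consequence worth noting: the lock dictionary grows by one entry per distinct path and is never trimmed in this sketch.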
149
web/pptx_render.py
@@ -0,0 +1,149 @@
+"""pptx_render.py: high-fidelity conversion of .pptx to PDF so the frontend's online preview can reuse the existing PDF iframe.
+
+See DESIGN §8.3 for the design. Key points:
+- Runs on the backend host (the `web/app.py` process), **not inside the execution sandbox**; targets any pptx under user_root.
+- Calls `soffice --headless --convert-to pdf` with **a separate `-env:UserInstallation`** temporary profile per run,
+  which sidesteps LibreOffice's single-profile lock (otherwise concurrent conversions block each other); the profile is deleted after use.
+- Results are cached in a hidden `.preview/<stem>.<hash>.pdf` next to the source pptx; the hash is derived from mtime+size,
+  so any change to the source changes the hash → natural invalidation. `.preview/` is a dotdir, which `_enumerate_files` already skips.
+- soffice missing → raises `SofficeNotFoundError` (the endpoint returns 501); timeout/failure → raises `PptxConvertError`.
+
+This module is **synchronous** (the subprocess blocks); the endpoint uses `run_in_executor` to keep the event loop unblocked and a
+per-path `asyncio.Lock` to prevent concurrent duplicate conversions of the same file.
+"""
+from __future__ import annotations
+
+import hashlib
+import shutil
+import subprocess
+import tempfile
+from pathlib import Path
+
+# Common Windows install paths + Linux deployment paths; if none exist, fall back to PATH.
+_SOFFICE_CANDIDATES = [
+    r"C:\Program Files\LibreOffice\program\soffice.exe",
+    r"C:\Program Files (x86)\LibreOffice\program\soffice.exe",
+    "/usr/bin/soffice",
+    "/usr/bin/libreoffice",
+    "/opt/libreoffice/program/soffice",
+]
+
+_PREVIEW_DIRNAME = ".preview"
+_DEFAULT_TIMEOUT = 60  # soffice cold start + conversion; the backstop for large decks, killed on timeout
+
+
+class SofficeNotFoundError(RuntimeError):
+    """LibreOffice is not installed on this machine/image. The endpoint maps this to 501 + a download hint."""
+
+
+class PptxConvertError(RuntimeError):
+    """soffice conversion failed / timed out / produced no PDF. The endpoint maps this to 500 and the frontend falls back to download."""
+
+
+def find_soffice() -> str:
+    """Locate the soffice executable; follows render_bg.py's "candidate paths + PATH fallback" approach."""
+    for c in _SOFFICE_CANDIDATES:
+        if Path(c).exists():
+            return c
+    for name in ("soffice", "soffice.exe", "libreoffice"):
+        p = shutil.which(name)
+        if p:
+            return p
+    raise SofficeNotFoundError(
+        "服务器未装 LibreOffice(soffice),无法在线预览 pptx。请安装 "
+        "libreoffice-impress 或下载原文件查看。"
+    )
+
+
+def _cache_pdf_path(pptx_path: Path) -> Path:
+    """Cache location: `<pptx directory>/.preview/<stem>.<hash>.pdf`.
+
+    The hash is derived from the source's (mtime_ns, size): a change to the source changes the hash, so the old
+    cache entry expires naturally (new name) with no explicit freshness comparison.
+    """
+    st = pptx_path.stat()
+    sig = f"{st.st_mtime_ns}-{st.st_size}".encode("utf-8")
+    digest = hashlib.sha1(sig).hexdigest()[:12]
+    return pptx_path.parent / _PREVIEW_DIRNAME / f"{pptx_path.stem}.{digest}.pdf"
+
+
+def _prune_stale(cache_dir: Path, stem: str, keep: Path) -> None:
+    """Delete old-hash cache files for the same stem (the source has changed), keeping only `keep`. Disk-usage backstop, best-effort."""
+    try:
+        for old in cache_dir.glob(f"{stem}.*.pdf"):
+            if old != keep:
+                old.unlink(missing_ok=True)
+    except OSError:
+        pass
+
+
+def _run_soffice(soffice: str, pptx_path: Path, outdir: Path, timeout: int) -> Path:
+    """Run one soffice conversion, write the PDF into outdir, and return the PDF path.
+
+    `-env:UserInstallation` points at a **separate temporary profile**, so this run does not contend for the single-profile
+    lock with the host's LibreOffice or other concurrent conversions. `--convert-to` mode does not run document macros by default (security boundary, see DESIGN §8.3).
+    """
+    with tempfile.TemporaryDirectory(prefix="lo-profile-") as profile:
+        profile_uri = Path(profile).resolve().as_uri()
+        cmd = [
+            soffice,
+            "--headless",
+            "--norestore",
+            "--nolockcheck",
+            "--nodefault",
+            f"-env:UserInstallation={profile_uri}",
+            "--convert-to",
+            "pdf",
+            "--outdir",
+            str(outdir),
+            str(pptx_path),
+        ]
+        try:
+            proc = subprocess.run(
+                cmd,
+                capture_output=True,
+                timeout=timeout,
+                check=False,
+            )
+        except subprocess.TimeoutExpired as e:
+            raise PptxConvertError(
+                f"pptx 转换超时(>{timeout}s):{pptx_path.name}"
+            ) from e
+
+        out_pdf = outdir / f"{pptx_path.stem}.pdf"
+        if proc.returncode != 0 or not out_pdf.exists():
+            tail = (proc.stderr or proc.stdout or b"").decode("utf-8", "replace")[-500:]
+            raise PptxConvertError(
+                f"soffice 转换失败(rc={proc.returncode}):{pptx_path.name}\n{tail}"
+            )
+        return out_pdf
+
+
+def pptx_to_pdf(pptx_path: Path | str, *, timeout: int = _DEFAULT_TIMEOUT) -> Path:
+    """Convert a pptx to PDF and return the PDF path (synchronous, cached).
+
+    A fresh cache hit is returned directly; otherwise soffice is invoked and the output is moved atomically into `.preview/`.
+    soffice missing → `SofficeNotFoundError`; conversion failure/timeout → `PptxConvertError`.
+    """
+    pptx_path = Path(pptx_path)
+    if not pptx_path.is_file():
+        raise PptxConvertError(f"pptx 不存在:{pptx_path}")
+
+    cache_pdf = _cache_pdf_path(pptx_path)
+    if cache_pdf.exists():
+        return cache_pdf
+
+    soffice = find_soffice()  # resolve soffice first so a missing install fails early (no wasted temp directory)
+
+    cache_dir = cache_pdf.parent
+    cache_dir.mkdir(parents=True, exist_ok=True)
+    with tempfile.TemporaryDirectory(prefix="pptx-pdf-") as tmp:
+        out_pdf = _run_soffice(soffice, pptx_path, Path(tmp), timeout)
+        # Atomic placement: replace() on the same partition; across partitions (temp dir elsewhere) fall back to copy.
+        try:
+            out_pdf.replace(cache_pdf)
+        except OSError:
+            shutil.copyfile(out_pdf, cache_pdf)
+
+    _prune_stale(cache_dir, pptx_path.stem, keep=cache_pdf)
+    return cache_pdf
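The reason `_run_soffice` passes a fresh `-env:UserInstallation` on every call is easier to see in isolation: each invocation gets its own profile directory, so two conversions never share a profile lock. Below is a minimal sketch that only builds the argument list (it does not launch soffice). `build_convert_cmd` and `demo` are hypothetical names; the flags are the ones used in the module above.

```python
import tempfile
from pathlib import Path


def build_convert_cmd(soffice: str, src: Path, outdir: Path, profile_dir: Path) -> list[str]:
    """Hypothetical helper: argument list for one headless pptx→PDF conversion."""
    return [
        soffice,
        "--headless",
        "--norestore",
        "--nolockcheck",
        "--nodefault",
        # as_uri() needs an absolute path, hence resolve() first.
        f"-env:UserInstallation={profile_dir.resolve().as_uri()}",
        "--convert-to",
        "pdf",
        "--outdir",
        str(outdir),
        str(src),
    ]


def demo() -> tuple[list[str], list[str]]:
    """Build two commands for the same deck, each with its own throwaway profile directory."""
    with tempfile.TemporaryDirectory(prefix="lo-profile-") as p1, \
         tempfile.TemporaryDirectory(prefix="lo-profile-") as p2:
        a = build_convert_cmd("soffice", Path("deck.pptx"), Path("out"), Path(p1))
        b = build_convert_cmd("soffice", Path("deck.pptx"), Path("out"), Path(p2))
        return a, b


if __name__ == "__main__":
    for cmd in demo():
        print(" ".join(cmd))
```

The two commands differ only in the profile URI. The cost of this isolation is that every run starts from an empty profile and pays the cold-start time, which is the overhead the Stage 2 "resident soffice listener" item is meant to remove.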
@@ -682,6 +682,11 @@
 #file-preview-modal .body { flex: 1; overflow: auto; padding: 12px; position: relative; }
 #file-preview-modal .body.center { display: flex; align-items: center; justify-content: center; }
 #file-preview-modal .body .ph { color: var(--muted); font-size: 13px; text-align: center; }
+.preview-spinner {
+  width: 22px; height: 22px; border-radius: 50%; margin: 0 auto 10px;
+  border: 2px solid var(--border); border-top-color: var(--accent);
+  animation: spin .8s linear infinite;
+}
 #file-preview-modal .body img.preview-img {
   max-width: 100%; max-height: 100%; object-fit: contain;
   display: block; margin: 0 auto;
@@ -51,6 +51,7 @@ const _EXT_GROUPS = {
   ]),
   docx: new Set(["docx"]),
   xlsx: new Set(["xlsx","xls"]),
+  ppt: new Set(["pptx","ppt"]),
 };
 export function _categorize(rel) {
   const m = /\.([a-z0-9]+)$/i.exec(rel);
@@ -76,6 +77,8 @@ export async function openFilePreview(rel) {
   $("file-preview-modal").classList.add("show");
 
   const cat = _categorize(rel);
+  // pptx/ppt: the backend converts to PDF and the existing PDF iframe is reused (the original file is not downloaded); brief wait on first open + fallback to download on failure.
+  if (cat === "ppt") { await _showPptAsPdf(rel, $("fp-body"), $("fp-meta"), _showFallback); return; }
   try {
     const r = await fetch("/v1/files/download?path=" + encodeURIComponent(rel), {
       headers: { "Authorization": "Bearer " + state.token },
@@ -141,6 +144,34 @@ function _showPdf(blob) {
   body.innerHTML = `<iframe class="preview-frame" src="${url}"></iframe>`;
 }
 
+// pptx/ppt → backend converts to PDF → iframe. Shared by main / mini: each passes its own body / meta / fallback / blob-tracking fn.
+async function _showPptAsPdf(rel, body, metaEl, fallbackFn, trackFn = _trackBlobUrl) {
+  body.className = "body center";
+  body.innerHTML = `<div class="ph"><div class="preview-spinner"></div>由 PPT 转换为 PDF · 首次稍候…</div>`;
+  if (metaEl) metaEl.textContent = "";
+  let r;
+  try {
+    r = await fetch("/v1/files/preview_pdf?path=" + encodeURIComponent(rel), {
+      headers: { "Authorization": "Bearer " + state.token },
+    });
+  } catch (e) {
+    fallbackFn("加载失败:" + e.message);
+    return;
+  }
+  if (r.status === 401) { logout(); return; }
+  if (!r.ok) {
+    let msg = "PPT 在线预览不可用,请下载查看";
+    if (r.status === 501) msg = "服务器未装 LibreOffice,无法在线预览 PPT,请下载查看";
+    else if (r.status === 500) msg = "PPT 转换失败,请下载原文件查看";
+    fallbackFn(msg);
+    return;
+  }
+  const blob = await r.blob();
+  if (metaEl) metaEl.textContent = humanSize(blob.size) + " · PDF 预览";
+  body.className = "body";
+  body.innerHTML = `<iframe class="preview-frame" src="${trackFn(blob, "application/pdf")}"></iframe>`;
+}
+
 function _showText(text) {
   const body = $("fp-body");
   body.className = "body";
@@ -283,6 +314,10 @@ async function openMiniFilePreview(rel) {
   $("mini-preview-modal").classList.add("show");
 
   const cat = _categorize(rel);
+  if (cat === "ppt") {
+    await _showPptAsPdf(rel, $("mp-body"), $("mp-meta"), _showMiniFallback, _trackMiniBlobUrl);
+    return;
+  }
   try {
     const r = await fetch("/v1/files/download?path=" + encodeURIComponent(rel), {
       headers: { "Authorization": "Bearer " + state.token },