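feat(skill): paper, academic paper writing (Chinese/English × three types + citation triangulation check) + bump 0.16.1

Add a self-contained skills/paper/ that turns experimental data and earlier reports into a submission-ready manuscript .docx. The workflow skeleton takes the "figures first" discipline from paper-writer-skill and the triangulated citation verification / anti-sycophancy review from ARS; the backend is entirely zcbot's own (documents/research for literature search and verification, plot_pub for figures, the proposal skill's rendering approach reused).

- Chinese/English × three types (original/review/letter), routed through sub-md files; each paper loads exactly one set of cite_gbt7714/cite_elsevier + redlines_zh/redlines_en
- Six stages, preceded by an ingest step: ingest → eight-item alignment spec → literature matrix → figures first → section-by-section drafting with a checkpoint per paragraph (Methods→Results→Intro→Discussion→Abstract→Title) → citation triangulation check (existence / triangulation / claim support, ledger in CITATIONS.md) → acceptance, rendering + submission documents
- Bundled scripts: render_docx (--lang switches the caption language, --toc off by default) / quality_check (citation cross-check for orphan/uncited/consecutive numbering + structure/placeholders/overclaiming) / word_count / render_diagrams
- Smoke test: happy path all OK; the orphan/uncited/missing-number negative cases trigger correctly; render produces a docx
- SKILL_LIST 15→16; one entry added to PROGRESS

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>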
parent 0d69ae86e2
commit 660bec0f2f
10
PROGRESS.md
|
|
@ -2,7 +2,7 @@
|
|||
|
||||
> 配合 `DESIGN.md`。本文件只记 phase 状态、决策偏差、文件量、下一步。每条 1-2 句:做了啥 + 关键判断;细节查 `git log` / `git diff` / `DESIGN §7.9`。
|
||||
|
||||
-最后更新:2026-06-16(图像理解 look_at_image + seedream i2i 改图,DESIGN §8.1 双路落地)
+最后更新:2026-06-17(新增 paper skill:学术论文写作,中英双语 × 三类型 + 引文三角核验)
|
||||
|
||||
---
|
||||
|
||||
|
|
@ -21,6 +21,14 @@
|
|||
|
||||
## 已完成关键能力
|
||||
|
||||
### 2026-06-17 / paper skill:学术期刊论文写作
|
||||
|
||||
- 需求:现有 skill(proposal 写本子 / review 改稿 / research 查文献 / plot_pub 出图)缺"从零起草期刊投稿稿"这一环。联网调研开源论文 skill(ARS 32.1k★ / paper-writer-skill / claude-scientific-writer 1.9k★)——结论:不直接装(ARS 是 CC-BY-NC 非商用、全偏英文/医学/CS、引文默认 APA、依赖外部 API),但流程值得移植。
|
||||
- **方案**:新建自包含 `skills/paper/`,流程骨架取 paper-writer 的"先定图表 + stage-gate"与 ARS 的"三角引文核验 + 反谄媚审稿",**底座全换成 zcbot 自有**(documents/research 查文献与核验、plot_pub 出图、复用 proposal 的渲染心智)。**中英双语 × 三类型(original/review/letter)用子 md 分流**(cite_gbt7714/cite_elsevier + redlines_zh/redlines_en,一篇只挂一套)。
|
||||
- 六阶段:摄取 → 八条对齐 spec → 文献矩阵 → 先定图表 → 逐章一段一卡(Methods→Results→Intro→Discussion→Abstract→Title 顺序) → 引文三角核验(存在性/三角/支撑度,台账 CITATIONS.md) → 验收渲染 + 投稿件。终审复用 review skill。
|
||||
- 脚本(自带,不跨 skill 引):`render_diagrams.py`(照搬)/ `render_docx.py`(去 fund-type,加 `--lang {zh,en}` 图题切换 + `--toc` 默认关)/ `word_count.py`(类型×语言双口径预算)/ `quality_check.py`(论文版核心=**引文交叉核对** orphan/uncited/编号连续 + 结构/占位符/过度宣称/插图)。
|
||||
- 验证:微型 fixture 端到端 smoke——word_count 正确标欠预算、quality_check happy path 全 OK 且 orphan/uncited/缺号负例正确触发、render_docx 出 37KB docx。文件:1 SKILL.md + 6 references + 3 templates + 4 scripts。SKILL_LIST 同步(15→16)。bump 0.16.0 → 0.16.1。
|
||||
|
||||
### 2026-06-16 / look_at_image 图像理解(DESIGN §8.1 C 路落地)
|
||||
|
||||
- 需求:DeepSeek V4 主模型纯文本无视觉,挂 `look_at_image` 工具按需"借眼睛"读图(OCR / 描述 / 读图表),模型自决何时调。
|
||||
|
|
|
|||
|
|
@ -1,8 +1,8 @@
|
|||
# zcbot Skill 清单
|
||||
|
||||
服务对象:中国建筑材料科学研究总院 —— 无机非金属材料 R&D(水泥 / 混凝土 / 玻璃 / 陶瓷 / 耐火 / 新型建材)
|
||||
-最后更新:2026-06-16
-Skill 总数:15
+最后更新:2026-06-17
+Skill 总数:16
|
||||
|
||||
zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` + 配套 templates / scripts / Python helper),模型在识别用户意图后挂载对应 skill,按其内置的阶段化流程产出可交付物。本文档面向**使用方 / 协作方**,按"做什么、什么时候用、什么时候别用、典型产物"组织。
|
||||
|
||||
|
|
@ -14,6 +14,7 @@ zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` +
|
|||
|
||||
| 分类 | Skill | 一句话 |
|
||||
|---|---|---|
|
||||
| 科研写作 | [paper](#paper) | 写期刊投稿论文:中文核心 / 英文 SCI(原创 / 综述 / 快报,IMRaD + 引文三角核验) |
|
||||
| 科研写作 | [proposal](#proposal) | 写本子 / 申报书 / 任务书(6 类基金) |
|
||||
| 科研写作 | [standard](#standard) | 起草标准:国标 / 行标 / 团标(含 T/CSTM)+ 编制说明 |
|
||||
| 科研写作 | [patent](#patent) | 写发明专利技术交底书(供代理师转写) |
|
||||
|
|
@ -34,6 +35,34 @@ zcbot 的"skill"是一份可加载的工作流脚本(`skills/<name>/SKILL.md` +
|
|||
|
||||
## 科研写作
|
||||
|
||||
### paper
|
||||
**撰写学术期刊投稿论文(中文核心 / 英文 SCI)。**
|
||||
|
||||
把实验数据 / 前期报告整理成可投稿的论文 .docx。流程:**先定类型与语言 → 八条对齐 spec → 文献矩阵 → 先定图表 → 逐章一段一卡 → 引文三角核验 → 验收渲染 + 投稿件**,不一口气出全文,关键章(Intro/Methods/Results/Discussion)"一段一卡"。
|
||||
|
||||
**覆盖类型 × 语言**(子 md 分流,一篇只挂一套):
|
||||
- 类型:原创研究(`original`,IMRaD)/ 综述(`review`,主题式)/ 快报(`letter`,凝练版)
|
||||
- 语言:中文核心(GB/T 7714 + 中文硬规则)/ 英文 SCI(Elsevier/IEEE 数字制 + 英文硬规则)
|
||||
|
||||
**何时用**:写论文、投稿稿、manuscript、写 Introduction/Methods/Results/Discussion、写综述、把实验数据整理成可投稿论文。
|
||||
|
||||
**何时不用**:
|
||||
- ⛔ 只改 / 润色已有稿 → 走 review
|
||||
- ⛔ 写本子 / 申报书 → 走 proposal;写交底书 → patent;写标准 → standard
|
||||
- ⛔ 只查文献 → research / documents;只出图 → plot_pub
|
||||
|
||||
**核心能力**:
|
||||
- 三类论文 × 中英双语的 IMRaD / 主题式骨架 + 篇幅预算(`paper_types.md`)
|
||||
- **引文三角核验**(`citation_verify.md`,移植 ARS 思路、后端换成自有 documents/research 库):存在性 → 三角印证 → 支撑度(抓原文比对 ≤25 词锚点,partial 就改论断迁就证据),编造引文零容忍
|
||||
- "先定图表再写正文"纪律(接 plot_pub 出 figure)+ 文献矩阵立证据底座
|
||||
- 写作顺序 Methods→Results→Intro→Discussion→Abstract→Title;关键章一段一卡 + 预告下一段
|
||||
- `quality_check.py`:结构 / 占位符 / 过度宣称 + **引文交叉核对**(orphan / uncited / 编号连续);`render_docx.py` 中英字体切换 + 图题自增;`word_count.py` 按类型 × 语言核篇幅
|
||||
- 终审复用 review skill 的反谄媚审稿协议;可选出 cover letter / AI 声明 / CRediT
|
||||
|
||||
**典型产物**:`<topic>.docx`(投稿稿)+ sections/ 分章草稿 + `lit_matrix.md`(文献矩阵)+ `CITATIONS.md`(引文核验台账)。
|
||||
|
||||
---
|
||||
|
||||
### proposal
|
||||
**撰写中国科研项目申报书 / 课题任务书。**
|
||||
|
||||
|
|
@ -464,6 +493,7 @@ paper_server 是内部 Django 文献库:元数据来自 OpenAlex,PDF / XML 由 S
|
|||
|
||||
实际任务往往跨多个 skill,典型组合:
|
||||
|
||||
- **写论文全流程**:analyze(拆问题) → stats_ml / pymatgen(算数据出结果) → research / documents(建文献矩阵) → plot_pub(先定图表) → paper(逐章起草 + 引文三角核验) → review(投稿前反谄媚终审)
|
||||
- **写本子全流程**:analyze(拆问题) → research / documents(查文献) → stats_ml(算配方-性能模型出预实验数据) → plot_pub(出图) → proposal(写本子) → review(审稿)
|
||||
- **写专利全流程**:patent(挖点 + 检索 + 起草) → research(查现有技术) → plot_pub(出附图) → review(终审)
|
||||
- **写标准全流程**:analyze(定标准化对象) → stats_ml(配方-性能 / 精密度试验数据定指标) → research / documents(查国内外现有标准与现状) → standard(起草标准 + 编制说明) → plot_pub(出图) → review(送审前终审)
|
||||
|
|
|
|||
|
|
@ -1,3 +1,3 @@
|
|||
# zcbot 版本号单一事实源:web/app.py 的 FastAPI version、/healthz 返回、前端展示都引这里。
|
||||
# 改版本只动这一行。
|
||||
-__version__ = "0.16.0"
+__version__ = "0.16.1"
|
||||
|
|
|
|||
|
|
@ -0,0 +1,221 @@
|
|||
---
|
||||
name: paper
|
||||
description: 撰写学术期刊投稿论文(中文核心 / 英文 SCI;原创研究 original / 综述 review / 快报 letter)。把实验数据、前期报告整理成可投稿的论文 .docx,含 IMRaD 骨架、引文三角核验、投稿件。当用户要写论文、投稿稿、manuscript、写 Introduction/Methods/Results/Discussion、写综述、改投稿稿时使用。
|
||||
---
|
||||
|
||||
# 学术论文写作
|
||||
|
||||
把实验数据 / 前期素材变成可投稿的论文 .docx。**先定类型与语言 → 八条对齐 → 建文献矩阵 → 先定图表 → 逐章一段一卡 → 引文三角核验 → 验收渲染 + 投稿件** —— 不要一口气出全文。
|
||||
|
||||
进度展示建议:用 `task_progress` 标记「摄取素材 / 类型与八条对齐 / 文献矩阵 / 图表定稿 / 逐章起草 / 引文核验 / 验收渲染」等关键阶段;章节内每段确认不必单独更新。
|
||||
|
||||
## 边界(先划清,免得和别的 skill 撞)
|
||||
|
||||
| 与谁区分 | 边界 |
|
||||
|---|---|
|
||||
| vs `proposal` | proposal 写**本子/任务书**(立项依据骨架);paper 写**期刊投稿稿**(IMRaD 骨架)。两者各自独立 |
|
||||
| vs `review` | review 改**已有稿**;paper **从零起草**。paper 阶段六终审**调用** review 的协议,不重复造 |
|
||||
| vs `research`/`documents` | 它们查文献;paper 是消费方,引文核验(阶段五)接到它们头上 |
|
||||
| vs `patent`/`standard` | 写交底书→patent;写标准→standard |
|
||||
|
||||
**何时不用**:只改不写→review;写本子→proposal;只查文献→research/documents;只出图→plot_pub。
|
||||
|
||||
## 资源
|
||||
|
||||
下面所有路径都相对 **`<skill_dir>`** —— `load_skill` 返回头里的 `[skill=paper, dir=<绝对路径>]`,用这个绝对路径拼脚本/资源,不要假设 cwd。
|
||||
|
||||
**先读(always)**:
|
||||
- `<skill_dir>/references/paper_types.md` —— 原创/综述/快报 的 IMRaD 骨架 + 篇幅预算 + 章节命名
|
||||
|
||||
**按 spec 条件加载(一篇论文只挂一套)**:
|
||||
- 语言=zh → `references/cite_gbt7714.md` + `references/redlines_zh.md`
|
||||
- 语言=en → `references/cite_elsevier.md` + `references/redlines_en.md`
|
||||
|
||||
**阶段五必读**:
|
||||
- `references/citation_verify.md` —— 引文三角核验协议(存在性 / 三角印证 / 支撑度,接 documents/research)
|
||||
|
||||
**模板**:
|
||||
- `templates/spec.md` —— 八条对齐固定字段(复制到 task 级 spec 文件)
|
||||
- `templates/original_article.md` —— IMRaD 章节骨架(type=original)
|
||||
- `templates/review_article.md` —— 主题式章节骨架(type=review)
|
||||
|
||||
**Scripts** (`.venv/Scripts/python.exe <skill_dir>/scripts/...`):
- `scripts/render_diagrams.py`: fenced `mermaid` blocks in sections/*.md → `figures/fig_<caption>.png` (caption required and unique)
- `scripts/render_docx.py`: md→docx; `--lang {zh,en}` (caption prefix 图 / Fig.); `--toc` (no table of contents by default); automatically handles `**bold**` / lists / tables / `![]()` centered figures + auto-incremented captions
- `scripts/word_count.py`: `--type --lang`; per-section length vs. budget
- `scripts/quality_check.py`: `--type`; structure / placeholders / overclaiming / figures + **citation cross-check** (orphan / uncited / consecutive numbering)
|
||||
|
||||
## 阶段零:摄取素材(有实验数据 / 报告 / PDF 时才走)
|
||||
|
||||
用户给实验数据 XLSX / 前期报告 DOCX / 相关论文 PDF / 目标期刊 Guide URL → 先转 `<task_dir>/source/<name>.md`,后续才能读:
|
||||
|
||||
```bash
|
||||
markitdown <path>/data.xlsx -o <task_dir>/source/data.md
|
||||
markitdown <path>/report.docx -o <task_dir>/source/report.md
|
||||
markitdown <path>/ref_paper.pdf -o <task_dir>/source/ref.md
|
||||
markitdown https://.../guide -o <task_dir>/source/guide.md
|
||||
```
|
||||
|
||||
转完后阶段一直接 `read <task_dir>/source/*.md` 拿事实,**实验数据一律以用户素材为准,不得自造**。
|
||||
|
||||
## 阶段一:八条对齐(写 spec)
|
||||
|
||||
产物:**task 级 spec 文件**(论文"宪法",后续每章前都要重读)。命名按 system prompt 的《task 级「宪法」文件命名约定》:
|
||||
|
||||
<task_dir>/<today>-<task_short_id>-<task_name>.spec.md
|
||||
|
||||
**0. 先检测已有 spec**(同 working_dir 可能已有别的 task 的 spec):
|
||||
|
||||
```
|
||||
glob <task_dir>/*-<task_short_id>-*.spec.md → 按文件名字典序排,取最大者作 current
|
||||
```
|
||||
|
||||
- 已有当前 task 的 spec → 读出展示,问「**沿用进阶段二** / **重定调**(以 today 为前缀写新版,旧版留存)」,⛔ BLOCKING
|
||||
- 只有别的 task 的 spec → 仅作参考;继续走 1-4
|
||||
- 完全没有 → 直接走 1-4
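The "glob, sort by file name, take the largest" rule above can be sketched as a small pure function. This is an illustration only, not one of the skill's bundled scripts: the name `pick_current_spec` and its parameters are hypothetical, and it assumes the `<today>` prefix is an ISO-style date (for example `2026-06-17`) so that lexicographic order matches chronological order.

```python
from __future__ import annotations

from fnmatch import fnmatch


def pick_current_spec(filenames: list[str], task_short_id: str) -> str | None:
    """Return the newest spec file name for this task, or None if there is none.

    Hypothetical helper: keeps names matching *-<task_short_id>-*.spec.md,
    sorts them lexicographically and takes the largest one.
    """
    pattern = f"*-{task_short_id}-*.spec.md"
    matches = sorted(name for name in filenames if fnmatch(name, pattern))
    return matches[-1] if matches else None
```

A `None` result corresponds to the "only another task's spec" and "no spec at all" branches, both of which continue with steps 1-4.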
|
||||
|
||||
1. **先读 `references/paper_types.md`** 定论文类型(original/review/letter)
|
||||
2. **复制模板** `read templates/spec.md` → `write <task_dir>/<today>-<task_short_id>-<task_name>.spec.md`
|
||||
3. 按字段填(**§1 类型+语言、§2 目标期刊、§3 一句话贡献** 是后续所有阶段的锚,务必和用户敲定)
|
||||
4. ⛔ **BLOCKING:用户确认 spec 后才进阶段二**
|
||||
|
||||
spec 定下「类型 + 语言」后,**按 §资源 条件加载**对应的 cite_*.md + redlines_*.md,后续都遵这一套。
|
||||
|
||||
## 阶段二:文献矩阵(立证据底座)
|
||||
|
||||
> 移植自 ARS,后端用 zcbot 自己的库。Introduction 与 Discussion 靠这份矩阵,不靠记忆。
|
||||
|
||||
1. 据 spec §3/§4 的贡献与 gap,列要查的主题(英文 keyword 优先,见 research/documents 规则)
|
||||
2. 用 `documents`(材料类优先,中英 query 都行)/ `research`(要 DOI 走这个)检索,建矩阵到 `<task_dir>/lit_matrix.md`:
|
||||
|
||||
| 文献(真实条目) | DOI | 一句话贡献 | 在本文用在哪(Intro/Methods/Disc) |
|
||||
|---|---|---|---|
|
||||
| `<author year>` | `<doi>` | `<gap/方法/对比>` | Intro 第2段 |
|
||||
|
||||
3. ⛔ **BLOCKING:矩阵给用户过目**(查得够不够、方向对不对),确认后进阶段三
|
||||
4. 矩阵里的文献是阶段五核验的输入;起草引用先用 `[CITE-<keyword>]` 占位
|
||||
|
||||
## 阶段三:先定图表(写正文前)
|
||||
|
||||
> paper-writer 的关键纪律 —— 先把证据骨架(图/表)定下来,再写正文,避免正文写完发现图对不上。
|
||||
|
||||
1. 据 spec §6 图表清单,确认每张图/表**要表达的结论**与数据来源
|
||||
2. 出图:数据图走 `plot_pub` skill(材料论文配色/字号/矢量规范),流程/机理/装置图走 ```mermaid``` 块(caption 必填),实拍/SEM 直接 `![]()`
|
||||
3. 落到 `<task_dir>/figures/`;mermaid 块先留在将写的章节里,阶段六统一 `render_diagrams.py`
|
||||
4. ⛔ **BLOCKING:图表清单与初版图给用户确认**后进阶段四(图错了正文白写)
|
||||
|
||||
## 阶段四:逐章起草(一段一卡)
|
||||
|
||||
**写作顺序**(不是文件顺序):**Methods → Results → Introduction → Discussion → Abstract → Title**。先写定事实,再写需要全局视野的部分,最后凝练摘要题名。
|
||||
|
||||
复制 `templates/<original_article|review_article>.md` 对应小节到 `<task_dir>/sections/NN_xxx.md`(命名见 paper_types.md)。
|
||||
|
||||
每章两段式:**先列要点 → 用户确认 → 再起草 → 用户确认**。
|
||||
|
||||
**A. 起草前列要点**(改要点比改正文便宜):
|
||||
1. 读 **current spec** + 加载的 redlines + 本章在 paper_types.md 的篇幅预算与要素
|
||||
2. 列 3-6 条要点骨架:本章论点 / 用哪些图表 / 引哪些矩阵里的文献,每条贴预估篇幅
|
||||
3. ⛔ **BLOCKING:用户确认要点后才动正文**
|
||||
|
||||
**B. 正文起草**:
|
||||
4. 按要点填;引用处放 `[CITE-<keyword>]` 占位(阶段五再核验编号)
|
||||
5. **关键章节一段一卡** —— Introduction / Methods / Results / Discussion:写一段 → 报篇幅 + **预告下一段** → 等确认 → 写下一段。短章节(Abstract/Conclusion)一节一卡
|
||||
6. 报告格式(每次卡点):
|
||||
- **本段(节)**:章节名 / 实际篇幅 / 预算 / 与 redlines 对齐情况(可复现?只陈述?不过度宣称?)
|
||||
- **下一段(节)预告**:标题 + 3-5 条要点(论点 / 图表 / 引文)
|
||||
- 提问:"本段可以了吗?下一段要点改/加/删什么?"
|
||||
7. ⛔ **BLOCKING:等用户明确反馈**("OK"/"下一段"/"继续")才动笔。沉默/"看着不错"不算确认;**篇幅或 redlines 异常**时必须主动追问
|
||||
8. 用户确认**实质改动**(改机理解释 / 换核心数据图 / 调结论 / 增删引文 / 改创新点表述)后,追加一行到 `<task_dir>/REVISIONS.md`
|
||||
|
||||
两段式 + 段段卡是为了拦早 —— 论文连续生成容易把错方向(尤其机理论述、过度解读)推到底。
|
||||
|
||||
**例外**:用户**主动且明确**说"别问,直接全做"才一次跑完,跑完必须 quality_check + citation_verify。"太慢/太碎"的抱怨**不算**例外。
|
||||
|
||||
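## Stage 5: Citation triangulation check (must run before rendering)

> The most damaging failures in a paper are fabricated citations and citations that do not support the claim. Run **every citation** through the three layers in `references/citation_verify.md`:

1. **Existence**: each citation must resolve to a real record in documents/research, with fields taken from what the library returns; if nothing is found, mark it `[未核实]` (unverified) and **do not fabricate**
2. **Triangulation**: citations backing key claims must agree across at least two independent sources
3. **Claim support**: fetch md_content/PDF, locate an anchor quote of ≤25 words in the original text, and rate it support/partial/not-support; partial → **rewrite the claim to fit the evidence**, not-support → delete or replace the citation
4. Write the ledger to `<task_dir>/CITATIONS.md`; only verified entries receive a number
5. Number `[1][2]...` in order of first appearance in the text, replace the placeholders, and write `sections/<NN>_references.md`

⛔ A citation whose status is not verified must not reach the final manuscript (verify it, delete the claim, or let the user decide: one of the three).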
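The numbering step (placeholders replaced in order of first appearance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's own implementation: `number_citations` and its `verified` argument are hypothetical names, and placeholders are assumed to look exactly like `[CITE-<keyword>]`.

```python
from __future__ import annotations

import re

PLACEHOLDER = re.compile(r"\[CITE-([^\]]+)\]")


def number_citations(text: str, verified: set[str]) -> tuple[str, list[str]]:
    """Replace verified [CITE-<keyword>] placeholders with [n].

    Numbers follow the order of first appearance. Placeholders whose keyword
    is not in `verified` are left untouched so a later check can still flag them.
    Returns the rewritten text and the keywords in numbering order.
    """
    order: list[str] = []

    def replace(match: re.Match[str]) -> str:
        key = match.group(1)
        if key not in verified:
            return match.group(0)  # keep the placeholder visible
        if key not in order:
            order.append(key)
        return f"[{order.index(key) + 1}]"

    return PLACEHOLDER.sub(replace, text), order
```

Leaving unverified placeholders in place is a deliberate choice in this sketch: the placeholder check then still has something to catch, rather than an unverified citation silently receiving a number.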
|
||||
|
||||
## 阶段六:验收 + 渲染 + 投稿件
|
||||
|
||||
```bash
|
||||
python <skill_dir>/scripts/word_count.py <task_dir>/sections/ --type original --lang en
|
||||
python <skill_dir>/scripts/quality_check.py <task_dir>/sections/ --type original
|
||||
python <skill_dir>/scripts/render_diagrams.py <task_dir>/sections/ # 有 ```mermaid 块就跑
|
||||
python <skill_dir>/scripts/render_docx.py <task_dir>/sections/ --lang en -o <task_dir>/<topic>.docx
|
||||
```
|
||||
|
||||
- `quality_check` 的 orphan/uncited/占位符不通过 → 回头改章节或补阶段五核验,再跑
|
||||
- **终审走 `review` skill** 的反谄媚审稿协议(EIC + 审稿人视角,pre-commit 评分防一味说好),别自己说"挺好"就交
|
||||
- **投稿件(可选,用户要才出)**:cover letter(说清贡献与契合度)/ Highlights / AI 使用声明 / 作者贡献(CRediT)/ 利益冲突声明 —— 按目标期刊要求
|
||||
|
||||
## 工作目录
|
||||
|
||||
`<task_dir>` = system prompt 给的**绝对路径**。所有产物写到 task_dir 下,不要写 cwd / skills/ / repo 根。
|
||||
|
||||
```
|
||||
<task_dir>/
|
||||
├── source/ # 摄取的素材(实验数据/报告/参考论文)
|
||||
├── <today>-<task_short_id>-<task_name>.spec.md # 阶段一定调,论文宪法
|
||||
├── lit_matrix.md # 阶段二文献矩阵
|
||||
├── figures/ # 阶段三图表(plot_pub 出的 png / mermaid 渲染的 png)
|
||||
├── sections/ # 阶段四逐章产物(NN_xxx.md)
|
||||
├── CITATIONS.md # 阶段五引文核验台账
|
||||
├── REVISIONS.md # 修订日志:每次卡点用户确认的实质改动
|
||||
└── <topic>.docx # 最终投稿稿(按论文主题命名,不要 output.docx)
|
||||
```
|
||||
|
||||
## 修订日志 (REVISIONS.md)
|
||||
|
||||
`<task_dir>/REVISIONS.md` 是产物迭代的紧凑 changelog。**spec 是宪法(定调一次),REVISIONS 是实施日志(每次卡点累加)**。
|
||||
|
||||
| 情形 | 记? |
|
||||
|---|---|
|
||||
| 用户确认改 **机理解释 / 核心数据图 / 结论 / 创新点表述** | ✅ 必记 |
|
||||
| 用户确认 增/删/换 **引文 / 图表 / 章节** | ✅ 必记 |
|
||||
| 阶段五因支撑度不足**改写论断** | ✅ 必记(注明触发的引文) |
|
||||
| 章节首次起草(从 0 写出) | ❌ 不记 |
|
||||
| 错别字 / 标点 / 排版 | ❌ 不记 |
|
||||
|
||||
格式(倒序,最新在上;文件首次创建写一次头注释):
|
||||
```
|
||||
- `<YYYY-MM-DD HH:MM>` | <文件:章节/段> | <一句话改了什么> — <为什么>
|
||||
```
|
||||
操作:`edit` 在头注释后插入新行;文件不存在就 `write` 带头注释创建。
|
||||
|
||||
## 硬规则速查(违反审稿扣分)
|
||||
|
||||
- **可复现**:Methods 给材料来源/纯度/配比/工艺/仪器型号/标准,不写"按常规方法"
|
||||
- **结果只陈述**:机理解释留 Discussion;数据带单位 + 误差
|
||||
- **不过度宣称**:"国际领先/首次/world-first/unprecedented" 等无证据夸张词禁用(quality_check 拦)
|
||||
- **引文真实**:经 citation_verify 核验;不编造、不凭印象;起草用 `[CITE-xx]` 占位,渲染前必清空
|
||||
- **摘要自含**:不出现 [n] 引文与图表号
|
||||
- **术语统一**:一个概念一个词;缩写首次给全称
|
||||
- **图**:用 mermaid/matplotlib 出 png,**不用 ASCII 字符画**(Word 必错位,quality_check 拦);图题自增不手写"图 2-2"
|
||||
- 详细规则见加载的 redlines_zh.md / redlines_en.md
|
||||
|
||||
## 反模式
|
||||
|
||||
- 未 spec 就硬写正文 / 一次性出全文 / 跳过"列要点"直接写
|
||||
- 跳过文献矩阵与图表定稿,边写正文边凑图凑引文
|
||||
- 关键章节(Intro/Methods/Results/Discussion)整章一次出 —— 必须段段卡
|
||||
- **自造实验数据 / 指标**(不知道就 `<TODO 待用户提供>`)
|
||||
- **编造引文** / 引文凭印象 / 带 `[CITE-xx]` 占位就渲染 / 跳过阶段五核验
|
||||
- 结果章大段解读机理 / 讨论重复结果数字 / 摘要里写 [n]
|
||||
- 不跑 quality_check 就交付 / 文件名 output.docx / 论文.docx(按主题命名)
|
||||
|
||||
## 输出
|
||||
|
||||
完成后给用户:
|
||||
- 文件路径
|
||||
- 各章节篇幅 vs 预算
|
||||
- 引文核验结论(verified 条数 / 待用户提供条数 / 因支撑度改写的论断)
|
||||
- `<TODO>` 待补项清单
|
||||
- 是否需要出投稿件(cover letter / 声明)
|
||||
|
|
@ -0,0 +1,71 @@
|
|||
# 引文三角核验协议 (language 无关)
|
||||
|
||||
论文最致命的失分是**编造引文**(hallucinated citation)与**引而不实**(cite 的文献不支撑该论断)。
|
||||
本协议把每条引文从"看起来对"逼到"经得起查"。移植自 ARS 的 triangulation + claim-faithfulness 思路,
|
||||
**后端换成 zcbot 自己的 `documents` / `research` 库**(它们本就带 DOI + md_content,做 anchor 比对反而更顺)。
|
||||
|
||||
> 这是**协议**不是脚本 —— 你(模型)拿 host-side tool 逐条执行。quality_check.py 只做机械的
|
||||
> orphan/uncited/编号核对,真伪与支撑度靠本协议。
|
||||
|
||||
## 何时跑
|
||||
|
||||
- 阶段四逐章起草后、阶段六渲染前,对所有引文跑一遍
|
||||
- 用户自带的引文清单**也要跑**(用户也可能记错卷期/页码)
|
||||
|
||||
## 三层核验(逐条引文执行)
|
||||
|
||||
### 第 1 层 — 存在性 (exists)
|
||||
|
||||
每条引文先确认"这篇文献真实存在":
|
||||
|
||||
1. `documents` 库语义检索(材料类优先,中英 query 都行)/ `research` 库 `search()` / `get_paper(doi)`
|
||||
2. 命中 → 记下真实 DOI / 作者 / 年份 / 期刊 / 卷期页;**以库里返回为准**,不沿用记忆里的字段
|
||||
3. 两个库都查不到 → 标 `[未核实]`,**不得编造条目**;告诉用户"这条找不到来源,请提供 PDF/DOI 或删去该论断"
|
||||
|
||||
### 第 2 层 — 三角印证 (triangulate)
|
||||
|
||||
关键论断(创新点对比、机理依据、定量结论)的支撑引文,**至少两个独立信息源一致**才算稳:
|
||||
|
||||
- documents 命中 + research/DOI 一致 → 通过
|
||||
- 仅单一来源 → 标"单源,谨慎",提示用户复核
|
||||
- 不同来源字段冲突(年份/卷期不一致)→ 以可验证的 DOI 元数据为准,修正条目
|
||||
|
||||
### 第 3 层 — 支撑度 (claim-faithfulness)
|
||||
|
||||
最容易翻车的一层:文献存在,但**并不支撑你写的那句话**。逐条做:
|
||||
|
||||
1. 抓回该文献的 `md_content`(documents 直接给整篇 Markdown)/ `fetch_xml` / `fetch_pdf`(research)
|
||||
2. 在原文里定位与论断相关的**锚点证据**:一句 ≤25 词的原文引语 + 出现的段落/小节位置
|
||||
3. 判定支撑度三档:
|
||||
- **support**:原文明确支撑该论断 → 通过
|
||||
- **partial / 需限定**:原文只支撑部分,或有前提条件 → **改写论断**使之与证据相符(别让引文背锅)
|
||||
- **not-support / 反向**:原文不支撑甚至相反 → **删除该引用或换文献**;绝不硬挂
|
||||
4. 抓不到全文(无 PDF/XML)→ 至少用 abstract 做弱核验,标"仅摘要核验",提示用户终审时复查
|
||||
|
||||
## 产出:核验台账 `CITATIONS.md`
|
||||
|
||||
在 `<task_dir>/CITATIONS.md` 记一份可复盘的台账(append,一条引文一行):
|
||||
|
||||
```markdown
|
||||
# 引文核验台账
|
||||
> 每条引文的存在性/三角/支撑度核验结果。渲染前所有条目应为 verified 或经用户确认。
|
||||
|
||||
- [1] Provis & Bernal 2014, Annu. Rev. Mater. Res. 44:299 | exists:✓(documents+DOI) | triangulate:✓ | claim:support "geopolymers form via dissolution-polymerisation"(§2.1) | status: verified
|
||||
- [2] <author> <year> ... | exists:✓ | claim:partial → 已把"显著提高"改为"在 28d 提高约 12%" | status: verified-revised
|
||||
- [3] <author> ... | exists:✗ 两库未命中 | status: 待用户提供来源
|
||||
```
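A ledger in the format above can be scanned mechanically before rendering. The sketch below is an assumption-laden illustration, not part of the skill: `unresolved_entries` is a hypothetical name, it assumes each entry is a single line ending in `| status: <value>`, and it treats `verified` and `verified-revised` as passing.

```python
from __future__ import annotations

import re

# One ledger entry per line: "- [n] ... | status: <value>"
LEDGER_LINE = re.compile(r"^- \[(\d+)\].*\|\s*status:\s*(.+?)\s*$")
PASSING = {"verified", "verified-revised"}  # assumption: both count as cleared


def unresolved_entries(ledger_text: str) -> list[tuple[int, str]]:
    """Return (number, status) for ledger entries that are not yet cleared."""
    pending: list[tuple[int, str]] = []
    for line in ledger_text.splitlines():
        match = LEDGER_LINE.match(line.strip())
        if match and match.group(2) not in PASSING:
            pending.append((int(match.group(1)), match.group(2)))
    return pending
```

An empty result means every entry is cleared; anything else is a list of citations to resolve with the user first.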
|
||||
|
||||
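## How this connects to the numbering flow

1. While drafting, use `[CITE-<keyword>]` placeholders (see cite_*.md)
2. This protocol maps each placeholder to a **verified, real reference**
3. Only entries with status=verified / verified-revised / user-confirmed are numbered
4. Number `[1][2]...` in order of first appearance in the text and write the references file (`06_references.md` for `original`; `<NN>_references.md` for the other types, per paper_types.md)
5. quality_check.py acts as a backstop for orphan / uncited / consecutive numbering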
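The mechanical cross-check in the last step can be pictured with the sketch below. It is not the code of `quality_check.py`; `cited_numbers` and `cross_check` are hypothetical names. It assumes in-text markers of the forms `[n]`, `[a-b]` and `[a, b, c]`, and that only the body text is passed in, since bracketed dates inside a reference list would otherwise be misread as citations.

```python
from __future__ import annotations

import re


def cited_numbers(body: str) -> set[int]:
    """Collect reference numbers from [n], [a-b] and [a, b, c] markers."""
    found: set[int] = set()
    for group in re.findall(r"\[(\d+(?:\s*[-,]\s*\d+)*)\]", body):
        for part in group.split(","):
            bounds = [int(x) for x in part.split("-")]
            found.update(range(bounds[0], bounds[-1] + 1))
    return found


def cross_check(body: str, reference_numbers: list[int]) -> dict[str, list[int]]:
    """Report orphan cites, uncited references and gaps in the numbering."""
    cited = cited_numbers(body)
    listed = set(reference_numbers)
    expected = set(range(1, len(reference_numbers) + 1))
    return {
        "orphan": sorted(cited - listed),    # cited in text, missing from the list
        "uncited": sorted(listed - cited),   # listed, never cited in text
        "gaps": sorted(expected - listed),   # numbering is not 1..N
    }
```

This only checks that the numbers line up; whether a reference is real and supports the claim remains the job of the three layers above.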
|
||||
|
||||
## 铁律
|
||||
|
||||
- ❌ 任何 status 非 verified 的引文不得带进最终稿(要么核实、要么删论断、要么用户拍板)
|
||||
- ❌ 不得为凑引文数编造"看起来合理"的文献
|
||||
- ✅ 支撑度不足时**改论断迁就证据**,不是改证据迁就论断
|
||||
- ✅ 两库都查不到时如实告诉用户,给出"提供来源 / 删除论断"两个选项
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
# 英文论文引文规范 (Elsevier / IEEE 数字制)
|
||||
|
||||
英文 SCI 材料期刊多用**数字顺序制**(numbered, Vancouver-like):文中 `[1]`,文末按出现顺序排。
|
||||
常见对口期刊:*Cement and Concrete Research*、*Cement and Concrete Composites*、
|
||||
*Construction and Building Materials*、*Journal of the American Ceramic Society*、
|
||||
*Ceramics International*、*Journal of the European Ceramic Society*。
|
||||
**语言=en 时加载本文件;语言=zh 时改用 cite_gbt7714.md。**
|
||||
|
||||
> 不同期刊细节有别(JACerS 用 numbered;部分期刊要 author-year)。**以目标期刊 Guide for Authors 为准**,本文件给最常见的 Elsevier numbered 默认。
|
||||
|
||||
## 真实性铁律
|
||||
|
||||
- ❌ 不可编造 authors / year / journal / volume / pages / DOI
|
||||
- ✅ 引文必须经 `citation_verify.md` 核验(优先 documents / research 库)
|
||||
- ✅ 用户给 BibTeX / RIS 你只排版
|
||||
|
||||
## 文中标注(numbered)
|
||||
|
||||
```
|
||||
The early-age strength of alkali-activated binders is governed by calcium content [1].
|
||||
Provis et al. [2] proposed a nanostructural evolution model for geopolymers.
|
||||
Several studies [3-5] reported similar trends.
|
||||
```
|
||||
|
||||
多篇:`[3-5]` 连续 / `[1,3,7]` 非连续。
|
||||
❌ 不要 author-year `(Provis et al., 2014)`,除非目标期刊明确要求。
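The multi-citation convention (`[3-5]` for consecutive numbers, `[1,3,7]` otherwise) can be generated rather than typed by hand. The helper below is a hypothetical sketch; collapsing only runs of three or more numbers (so two neighbours stay as `[1,2]`) is an assumption, and the target journal's guide takes precedence.

```python
from __future__ import annotations


def format_citation_group(numbers: list[int]) -> str:
    """Render reference numbers as one marker, collapsing runs of 3+ into a range."""
    nums = sorted(set(numbers))
    parts: list[str] = []
    i = 0
    while i < len(nums):
        j = i
        # extend j to the end of the current consecutive run
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{nums[i]}-{nums[j]}")
        else:
            parts.extend(str(n) for n in nums[i : j + 1])
        i = j + 1
    return "[" + ",".join(parts) + "]"
```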
|
||||
|
||||
## 文末 References 格式 (Elsevier numbered)
|
||||
|
||||
字段顺序:**Authors, Title, Journal Abbrev. Volume (Year) Pages.**(可带 DOI)
|
||||
作者全列或按期刊要求截断;**期刊名用标准缩写**(ISO 4 / CASSI),与中文刊全称相反。
|
||||
|
||||
| 类型 | 格式示例 |
|
||||
|---|---|
|
||||
| Journal | `[1] J.L. Provis, S.A. Bernal, Geopolymers and related alkali-activated materials, Annu. Rev. Mater. Res. 44 (2014) 299-327. https://doi.org/10.1146/annurev-matsci-070813-113515.` |
|
||||
| Journal | `[2] K. Scrivener, A. Ouzia, P. Juilland, et al., Advances in understanding cement hydration mechanisms, Cem. Concr. Res. 124 (2019) 105823.` |
|
||||
| Book | `[3] H.F.W. Taylor, Cement Chemistry, 2nd ed., Thomas Telford, London, 1997, pp. 113-156.` |
|
||||
| Book chapter | `[4] B. Lothenbach, et al., Thermodynamic modelling, in: Cementitious Materials, De Gruyter, 2018, pp. 53-105.` |
|
||||
| Conference | `[5] K. Scrivener, Hydration of cementitious materials, in: Proc. 13th ICCC, Madrid, 2011, pp. 1-12.` |
|
||||
| Standard | `[6] ASTM C150/C150M-22, Standard Specification for Portland Cement, ASTM International, West Conshohocken, 2022.` |
|
||||
| Thesis | `[7] X. Li, Hydration of high-belite cement (Ph.D. thesis), CBMA, Beijing, 2019.` |
|
||||
| Dataset/Web | `[8] USGS, Mineral Commodity Summaries 2024, 2024. https://... (accessed 1 June 2024).` |
|
||||
|
||||
## IEEE 变体(投 IEEE / 部分材料-器件交叉刊时)
|
||||
|
||||
字段顺序与 Elsevier 接近但标点/缩写规则不同:
|
||||
`[1] J. L. Provis and S. A. Bernal, "Geopolymers and related alkali-activated materials," Annu. Rev. Mater. Res., vol. 44, pp. 299-327, 2014.`
|
||||
题名加引号、卷期用 `vol.`/`pp.`、作者名缩写在前。投 IEEE 系才用,默认走 Elsevier。
|
||||
|
||||
## 写作流程(与 citation_verify 配合)
|
||||
|
||||
1. 起草时引用处放占位 `[CITE-<keyword>]`
|
||||
2. 草稿成形 → 走 `citation_verify.md` 逐条查真实文献
|
||||
3. 按文中**首次出现顺序**编号,替换占位为 `[1][2]...`
|
||||
4. 按目标期刊格式重排 `06_references.md`;期刊名缩写查 CASSI
|
||||
|
||||
## 引文密度 / 常见错误
|
||||
|
||||
- Introduction 15-40、Methods 3-10、Discussion 10-30、Results/Conclusion 0-5
|
||||
- 期刊名**该缩写没缩写**(`Cement and Concrete Research` → `Cem. Concr. Res.`)
|
||||
- 漏 DOI / volume / pages;author-year 与 numbered 混用
|
||||
- orphan cite / uncited ref(quality_check 必拦)
|
||||
|
|
@ -0,0 +1,65 @@
|
|||
# 中文论文引文规范 (GB/T 7714-2015 顺序编码制)
|
||||
|
||||
中文核心期刊(《硅酸盐学报》《建筑材料学报》《硅酸盐通报》等)统一用顺序编码制:
|
||||
文中按出现顺序 `[1][2][3]...`,文末参考文献按编号顺序排列。
|
||||
**语言=zh 时加载本文件;语言=en 时改用 cite_elsevier.md。**
|
||||
|
||||
## 真实性铁律
|
||||
|
||||
- ❌ **不可编造**作者 / 年份 / 期刊 / 卷期 / 页码 / DOI
|
||||
- ❌ **不可凭印象**写"某某 2020 提到过…" —— 大概率错
|
||||
- ✅ 引文必须经 `citation_verify.md` 的核验流程拿到真实条目(优先查 documents / research 库)
|
||||
- ✅ 用户给 BibTeX / EndNote / 纯文本均可,你只**排版**,不补全凭空内容
|
||||
|
||||
## 文中标注
|
||||
|
||||
```
|
||||
碱激发胶凝材料的早期强度发展受钙含量调控 [1]。
|
||||
Provis 等 [2] 提出了地聚物的纳米结构演化模型。
|
||||
```
|
||||
|
||||
多篇:`[1-3]` 连续 / `[1, 3, 5]` 非连续。
|
||||
❌ 不要 APA `(Provis, 2020)`;❌ 不要"张三在文献[1]中提出",写"张三 [1] 提出"。
|
||||
|
||||
## 文末参考文献格式
|
||||
|
||||
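Field order: **Authors. Title[type code]. Publication details.** Use full-width punctuation for Chinese entries and half-width punctuation for English ones;
when an entry has more than 3 authors, list the first 3 followed by ", 等." (Chinese) / ", et al." (English); with 3 or fewer authors, list them all; journal names are given **in full**.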
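The author-truncation rule can be expressed as a short function. This is an illustrative sketch; `format_authors_gbt7714` is a hypothetical name, the comma-plus-space separator follows the examples in the table below, and the trailing period that ends the author field is left to the caller.

```python
from __future__ import annotations


def format_authors_gbt7714(authors: list[str], chinese: bool) -> str:
    """List up to three authors; beyond three, keep the first three and add 等 / et al."""
    if len(authors) <= 3:
        return ", ".join(authors)
    suffix = "等" if chinese else "et al"
    return ", ".join(authors[:3]) + ", " + suffix
```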
|
||||
|
||||
| 类型 | 格式示例 |
|
||||
|---|---|
|
||||
| 期刊 J | `[1] 沈晓东, 王培铭. 水泥水化动力学研究进展[J]. 硅酸盐学报, 2006, 34(8): 1009-1015.` |
|
||||
| 期刊 J(英文混排) | `[2] PROVIS J L, BERNAL S A. Geopolymers and related alkali-activated materials[J]. Annual Review of Materials Research, 2014, 44: 299-327.` |
|
||||
| 会议 C | `[3] SCRIVENER K. Hydration of cementitious materials[C]//Proc. of the 13th ICCC. Madrid, 2011: 1-12.` |
|
||||
| 学位 D | `[4] 李响. 高贝利特水泥的水化硬化特性研究[D]. 北京: 中国建筑材料科学研究总院, 2019.` |
|
||||
| 专著 M | `[5] 沈威, 黄文熙. 水泥工艺学[M]. 武汉: 武汉理工大学出版社, 2017: 88-95.` |
|
||||
| 标准 S | `[6] 中国建筑材料联合会. 通用硅酸盐水泥: GB 175—2023[S]. 北京: 中国标准出版社, 2023.` |
|
||||
| 专利 P | `[7] 张三, 李四. 一种低碳胶凝材料及其制备方法: CN20231012345.6[P]. 2023-08-15.` |
|
||||
| 网络 EB/OL | `[8] USGS. Mineral commodity summaries 2024[EB/OL]. (2024-01-31)[2024-06-01]. https://...` |
|
||||
|
||||
类型代码:J 期刊 / M 专著 / C 会议 / D 学位 / R 报告 / P 专利 / S 标准 / EB 电子公告 / DS 数据集 / OL 联机网络。
|
||||
|
||||
## 写作流程(与 citation_verify 配合)
|
||||
|
||||
1. 起草正文时,引用处先放占位符 `[CITE-<关键词>]`(如 `[CITE-geopolymer-strength]`)
|
||||
2. 草稿成形后,走 `citation_verify.md`:对每个占位逐条查 documents / research 拿真实文献
|
||||
3. 核验通过的文献按**文中首次出现顺序**编号,占位符替换成 `[1][2][3]...`
|
||||
4. 按 GB/T 7714 重排 `06_references.md`
|
||||
|
||||
> quality_check.py 会拦未替换的 `[CITE-xx]` 占位与 orphan/uncited —— 别带着占位符渲染。
|
||||
|
||||
## 引文密度(材料论文参考)
|
||||
|
||||
| 章节 | 推荐数 |
|
||||
|---|---|
|
||||
| Introduction(背景 + gap) | 15-40 |
|
||||
| Methods(方法/标准出处) | 3-10 |
|
||||
| Discussion(机理对比) | 10-30 |
|
||||
| Results / Conclusion | 0-5 |
|
||||
|
||||
## 常见错误
|
||||
|
||||
- APA `(Smith, 2020)` → 顺序编码 `[1]`
|
||||
- 期刊名缩写 `J. Am. Ceram. Soc.` → 中文刊全称;英文刊按目标期刊要求(中文核心多用全称)
|
||||
- 缺页码 / 缺出版地;中英作者格式混用(中文全角逗号,英文半角)
|
||||
- 文中引了 `[5]` 但参考文献清单没有第 5 条(orphan cite,quality_check 必拦)
|
||||
|
|
@ -0,0 +1,70 @@
|
|||
# 论文类型 cheat sheet
|
||||
|
||||
写之前先确认**论文类型**(决定章节骨架与篇幅预算)和**语言**(决定引文格式与硬规则)。
|
||||
章节文件按下表命名(NN_ 前缀决定 sections/ 里的排序,也是 quality_check / word_count 识别依据)。
|
||||
|
||||
适用学科:无机非金属材料(水泥 / 混凝土 / 玻璃 / 陶瓷 / 耐火 / 新型建材)。
|
||||
篇幅预算 en 口径=词数,zh 口径=字数(摘要除外,见各类说明)。
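The two counting conventions (words for en, characters for zh) can be illustrated with the sketch below. It is an assumption about how such a count could be done, not the logic of `word_count.py`: `measure_length` is a hypothetical name, hyphenated English terms count as one word, and the zh count covers only CJK ideographs, ignoring digits, Latin letters and punctuation.

```python
from __future__ import annotations

import re


def measure_length(text: str, lang: str) -> int:
    """Count words for lang='en' and CJK characters for lang='zh'."""
    if lang == "en":
        # a word is a run of letters/digits, optionally joined by - or '
        return len(re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text))
    if lang == "zh":
        return len(re.findall(r"[\u4e00-\u9fff]", text))
    raise ValueError(f"unsupported lang: {lang!r}")
```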
|
||||
|
||||
---
|
||||
|
||||
## 1. 原创研究论文 (`original`) —— IMRaD
|
||||
|
||||
绝大多数实验性论文走这个。**英文 SCI 4000-7000 词 / 中文核心 6000-10000 字**(不含图表与参考文献)。
|
||||
|
||||
| 文件 | 章节 | 内容要点 | en 词 | zh 字 |
|
||||
|---|---|---|---|---|
|
||||
| `00_title_abstract.md` | Title / Abstract / Keywords | 结论式题名;结构化或叙述式摘要(背景-方法-结果-结论);3-6 关键词 | 150-320 | 200-400 |
|
||||
| `01_introduction.md` | Introduction | 漏斗式:领域背景 → 已有工作与 gap → 本文目的与贡献(末段点明) | 600-1000 | 1000-1800 |
|
||||
| `02_methods.md` | Materials and Methods | 原材料(来源/纯度/配比)→ 制备工艺(温度/时间/升温制度)→ 表征方法(仪器型号/参数/标准)。**可复现是铁律** | 800-1600 | 1200-2400 |
|
||||
| `03_results.md` | Results | 按逻辑(不按时间)摆数据;每个图表对应 1-2 段;**只陈述不解读** | 1000-1900 | 1500-2900 |
|
||||
| `04_discussion.md` | Discussion | 解读机理 / 与文献对比 / 局限。回答 Introduction 提的问题。**不重复 Results 的数字** | 800-1600 | 1200-2400 |
|
||||
| `05_conclusion.md` | Conclusion | 3-5 条结论 + 定量关键数据 + 展望(可选) | 180-400 | 300-600 |
|
||||
| `06_references.md` | References | 顺序编码制 `[1][2]...`,格式见 cite_*.md | — | — |
|
||||
|
||||
可选附加(按需,放对应位置):Graphical Abstract / Highlights(Elsevier 系常要)/ Acknowledgments / CRediT 作者贡献 / Declaration of competing interest / Data availability。
|
||||
|
||||
**章节顺序写作建议(不是文件顺序)**:Methods → Results → Introduction → Discussion → Abstract → Title。
|
||||
先写定事实(怎么做、得到什么),再写需要全局视野的部分(为什么重要、怎么解读),最后用全文凝练摘要与题名 —— 这是论文写作公认更稳的顺序,避免开头写 Introduction 时把方向带偏。
|
||||
|
||||
---
|
||||
|
||||
## 2. 综述论文 (`review`) —— 主题式
|
||||
|
||||
按主题(thematic)组织,不是 IMRaD。**英文 6000-12000 词 / 中文 8000-15000 字**。
|
||||
|
||||
| 文件 | 章节 | en 词 | zh 字 |
|
||||
|---|---|---|---|
|
||||
| `00_title_abstract.md` | Title / Abstract / Keywords | 180-350 | 250-450 |
|
||||
| `01_introduction.md` | Introduction(领域意义 + 综述范围 + 本文组织) | 700-1300 | 1200-2200 |
|
||||
| `02_<theme>.md` ~ `NN_<theme>.md` | 主题章节(按材料体系 / 性能维度 / 方法路线切,每章一个主题,**篇数可变**,word_count 标 no budget) | 自定 | 自定 |
|
||||
| `98_outlook.md` | Challenges & Outlook(未解难题 + 趋势) | 500-1100 | 800-1800 |
|
||||
| `99_conclusion.md` | Conclusion | 180-450 | 300-700 |
|
||||
| `<NN>_references.md` | References(综述引文常 80-200 条) | — | — |
|
||||
|
||||
> 主题章节文件名用 `02_` 起的两位前缀保证排序,主题名跟在后面(如 `02_hydration_mechanism.md`)。`98_/99_` 前缀保证 outlook/conclusion 永远在主题章之后。
|
||||
|
||||
---
|
||||
|
||||
## 3. 快报 / 通讯 (`letter`) —— 凝练版
|
||||
|
||||
Communication / Letter,篇幅小、强调新颖性与时效。**英文 1500-3000 词 / 中文 2000-4000 字**。
|
||||
|
||||
| 文件 | 章节 | en 词 | zh 字 |
|
||||
|---|---|---|---|
|
||||
| `00_title_abstract.md` | Title / Abstract | 120-250 | 180-350 |
|
||||
| `01_main.md` | 正文(引言+方法+结果+讨论压缩成连续叙述,不分大节) | 1500-3000 | 2000-4000 |
|
||||
| `02_references.md` | References | — | — |
|
||||
|
||||
---
|
||||
|
||||
## 选择决策树
|
||||
|
||||
```
|
||||
要系统报告一组实验(材料-工艺-性能-机理)? — 是 → original
|
||||
否 → 要综述某方向已有研究、不含自己新实验? — 是 → review
|
||||
否 → 有一个新颖、时效强、单点突破的结果想快发? — 是 → letter
|
||||
否 → 跟用户确认论文类型, 不要猜
|
||||
```
|
||||
|
||||
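If the type is still unclear after checking with the user, fall back to `original` (broadest coverage). When the target journal has genre-specific requirements (e.g. mandatory Highlights, a hard word limit), the **journal's author guidelines** take precedence; this table only provides a generic skeleton.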
|
||||
|
|
@ -0,0 +1,40 @@
|
|||
# English manuscript writing rules (loaded when language=en)
|
||||
|
||||
For materials-science SCI submissions. Reviewers penalize violations even when quality_check doesn't catch them all.
|
||||
|
||||
## Title / Abstract / Keywords
|
||||
|
||||
- **Title**: state what was done + system + key finding; avoid "Study on…/Investigation of…" filler; minimize abbreviations.
|
||||
- **Abstract**: 150-320 words, **self-contained**, covering background/aim, methods, **quantitative** results, conclusion. **No citation markers [n]** and **no figure/table numbers** inside the abstract. Structured abstract only if the journal requires it.
|
||||
- **Keywords**: 3-6, complementary to the title (don't just repeat it); cover material system + method + property.
|
||||
|
||||
## Per-section rules
|
||||
|
||||
- **Introduction**: funnel structure; the final paragraph must state the aim and contribution of this work. Each paragraph makes a point — not a citation list.
|
||||
- **Materials and Methods**: **reproducibility is mandatory** — raw material source/purity/composition, mix proportions (with values + units), processing (temperature/time/heating-cooling schedule/atmosphere), characterization instrument models + parameters + standards (ASTM/EN/ISO). Never "by conventional method".
|
||||
- **Results**: **report, don't interpret** (interpretation belongs to Discussion). Every figure/table is cited and described in text; data carry **units + uncertainty/SD**; highlight key numbers, don't restate every value in a table.
|
||||
- **Discussion**: mechanism, comparison with literature (cited), limitations; **do not restate Results numbers**; answer the question posed in the Introduction.
|
||||
- **Conclusion**: itemized + quantitative; introduce no new data; outlook optional, not vague.
|
||||
|
||||
## Language & style
|
||||
|
||||
- **Tense**: Methods & Results in past tense (what you did/found); established facts and conclusions in present tense.
|
||||
- **Voice**: active voice is increasingly accepted ("We measured…"); keep it consistent; avoid dangling modifiers.
|
||||
- Terminology consistent throughout (one concept = one term). Define abbreviations at first use: C-S-H, C₃S, AFt (ettringite), etc.
|
||||
- SI units and standard symbols (MPa, °C, wt%, mol/L); correct spacing between number and unit.
|
||||
- **No overclaiming**: avoid "world-first / unprecedented / groundbreaking / state-of-the-art" when no evidence backs the claim (quality_check flags these). Let the data speak.
|
||||
- Figures: cite before showing; caption below figure, title above table; axes labelled with units; **no ASCII art** (use mermaid or matplotlib PNG).
|
||||
- Watch common L2 issues: article use (a/the), subject-verb agreement, "respectively" placement, comma splices.
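The overclaiming rule lends itself to a mechanical scan. The sketch below is a hypothetical illustration rather than the check in `quality_check.py`: `find_overclaims` and the short phrase tuple are assumptions, and hyphens and whitespace are normalised so that "state of the art" and "state-of-the-art" are treated as the same phrase.

```python
from __future__ import annotations

import re

# Illustrative subset; a real phrase list would be longer
OVERCLAIM = ("world-first", "unprecedented", "groundbreaking", "state-of-the-art")


def _normalize(s: str) -> str:
    """Lower-case and collapse hyphens/whitespace into single spaces."""
    return re.sub(r"[\s\-]+", " ", s.lower()).strip()


def find_overclaims(text: str, phrases: tuple[str, ...] = OVERCLAIM) -> list[str]:
    """Return the listed phrases that occur in the text, ignoring case and hyphenation."""
    haystack = _normalize(text)
    return [p for p in phrases if _normalize(p) in haystack]
```

A hit is a prompt for review, not an automatic error: the rule only bans such words when no evidence backs them.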
|
||||
|
||||
## Research integrity (hard rules)
|
||||
|
||||
- **No fabrication / no result beautification / no selective reporting**; report negative results honestly.
|
||||
- **Citations must be real** and verified via `citation_verify.md`; no plagiarism, no duplicate submission.
|
||||
- Disclose AI-assisted writing if the target journal requires it.
|
||||
- Funding / ethics / competing interests as required; mark `<TODO>` when user input is needed.
|
||||
|
||||
## Anti-patterns
|
||||
|
||||
- "Study on…" title with no finding / [n] markers inside abstract / interpreting mechanism in Results
|
||||
- Methods saying "conventional/appropriate/certain temperature" (not reproducible) / Discussion restating Results numbers
|
||||
- Fabricated data or citations / rendering while `[CITE-xx]` placeholders remain
|
||||
|
|
@ -0,0 +1,38 @@
|
|||
# 中文论文写作硬规则 (language=zh 时加载)
|
||||
|
||||
材料类中文核心期刊投稿。违反这些 quality_check 不一定全拦,但审稿人会扣分。
|
||||
|
||||
## 题名 / 摘要 / 关键词
|
||||
|
||||
- **题名**:写清"做了什么 + 体系 + 关键发现",≤25 字,不用"研究""探讨"凑字(如"……的研究"信息量低)。少用缩写,首字母缩略词题名里慎用。
|
||||
- **摘要**:200-400 字,**自含**(不依赖正文也能读懂),含 背景目的 / 方法 / **定量**结果 / 结论 四要素。摘要里**不出现引文标注 [n]**,不出现图表编号。
|
||||
- **关键词**:3-6 个,覆盖材料体系 + 方法 + 性能维度,避免与题名完全重复。
|
||||
|
||||
## 各章红线
|
||||
|
||||
- **引言**:漏斗式收口,末段必须点明本文目的与创新点;不堆砌文献流水账,每段要有论点。
|
||||
- **实验/方法**:**可复现是铁律** —— 原材料来源/纯度/化学组成、配比(给具体数值与单位)、制备工艺(温度/时间/升温降温制度/气氛)、表征仪器型号+测试参数+依据标准(GB/T、ASTM 等)。别写"按常规方法"。
|
||||
- **结果**:**只陈述,不解读**(解读留给讨论)。每个图/表都要在正文被引用并描述趋势;数据带**单位 + 误差/标准差**;不重复图表里已有的全部数字,挑关键的说。
|
||||
- **讨论**:解释机理、与文献对比(用引文)、说明局限;**不重复结果的数字**;回答引言提出的科学问题。
|
||||
- **结论**:分条 + 定量;不引入新数据;展望可选但别空泛。
|
||||
|
||||
## 语言与表达
|
||||
|
||||
- 学术语体,**不用口语 / 不用感叹**;术语全文统一(一个概念一个词,别中途换说法)。
|
||||
- 量与单位用法定计量单位 + 国标符号(MPa、°C、wt%、mol/L);数字与单位间规范空格。
|
||||
- 化学式 / 相名规范:C-S-H、C₃S、钙矾石(AFt)等首次出现给全称 + 缩写,之后用缩写。
|
||||
- **过度宣称禁用**:"国际领先""首次""填补空白""重大突破"等无证据夸张词(quality_check 会拦);用数据说话。
|
||||
- 图表:图随文走、先文后图;图题在图下、表题在表上;坐标轴带量纲;**不用 ASCII 字符画**(用 mermaid 或 matplotlib 出 png)。
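
## Research integrity (iron rules)

- **No fabricated data / no beautified results / no selective reporting**; report negative results as they are.
- **Citations must be real** and verified via `citation_verify.md`; no plagiarism, no duplicate submission.
- If the target journal requires disclosure of AI-assisted writing, state it truthfully in the Acknowledgments / declarations.
- Grant numbers / ethics approval / conflicts of interest: fill in truthfully as the journal requires; where missing, mark `<TODO>` for the user to supply.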
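
## Anti-pattern quick reference

- Title containing "研究" / "探讨" with no concrete finding / [n] citations in the abstract / long mechanism interpretation in Results
- Methods saying "按常规" (as usual), "适量" (appropriate amount), "一定温度" (a certain temperature), which is not reproducible / Discussion repeating numbers from Results
- Made-up data or metrics / citations written from memory / rendering while `[CITE-xx]` placeholders remain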
|
||||
|
|
@@ -0,0 +1,305 @@
|
|||
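"""Quality check for a paper manuscript; run once before rendering the docx.

Checks:
- Structural completeness: all sections required for the paper type are present
- Placeholder leaks: leftover <TODO> / [REF-xx] / [CITE-xx] / (Author, year) placeholders
- Overclaiming: unsupported hype such as "国际领先 / 首次 / world-first / unprecedented"
- Figures: figures/ has png files but sections contain no ![]() reference; ASCII art in code blocks; mermaid blocks with a missing or duplicate caption
- **Citation cross-check** (the core of the paper version): in-text [n] checked against the reference list
  · orphan cite: the text cites [7] but the reference list has no entry 7
  · uncited ref: the reference list has entry 9 but the text never cites it
  · numbering not contiguous / not starting at 1 (the sequential numbering system requires contiguous numbers in order of first appearance)

Usage:
    python quality_check.py <sections_dir> --type original
    python quality_check.py <sections_dir> --type original --strict
"""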
|
||||
from __future__ import annotations
|
||||
import argparse
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
REQUIRED_SECTIONS: dict[str, list[str]] = {
    "original": [
        "00_title_abstract", "01_introduction", "02_methods",
        "03_results", "04_discussion", "05_conclusion", "06_references",
    ],
    # review: title/abstract + intro + (>=1 thematic body section, naming not enforced) + outlook/conclusion + references
    "review": ["00_title_abstract", "01_introduction", "99_conclusion", "references"],
    "letter": ["00_title_abstract", "01_main", "references"],
}
|
||||
|
||||
|
||||
# 过度宣称 / 无证据夸张 (中英)
|
||||
OVERCLAIM_PHRASES = [
|
||||
"国际领先", "国际一流", "世界领先", "世界一流", "填补空白", "首次提出",
|
||||
"重大突破", "划时代", "前所未有",
|
||||
"world-first", "world-leading", "unprecedented", "groundbreaking",
|
||||
"revolutionary", "first-ever", "state of the art", "best-in-class",
|
||||
]
|
||||
PLACEHOLDER_PATTERNS = [
|
||||
r"<TODO[^>]*>",
|
||||
r"\[REF-[A-Za-z0-9]+\]",
|
||||
r"\[CITE-[A-Za-z0-9]+\]",
|
||||
r"\[Smith et al",
|
||||
r"\(Author,?\s*\d{4}\)", # APA 占位 (Author, 2024)
|
||||
r"\bXX+\b", # XX / XXX 占位
|
||||
]
|
||||
|
||||
|
||||
# 插图相关 (同 proposal)
|
||||
_BOX_DRAWING_RE = re.compile(r"[┌┐└┘├┤┬┴┼─│╔╗╚╝╠╣╦╩╬═║▲▼◀▶]")
|
||||
_IMAGE_REF_RE = re.compile(r"!\[[^\]]*\]\([^)\s]+\)")
|
||||
_FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})\s*(\S*)\s*$")
|
||||
_MERMAID_CAPTION_RE = re.compile(r"^\s*%%\s*caption\s*:\s*(.+?)\s*$", re.IGNORECASE)
|
||||
|
||||
# In-text citation markers: [7] / [7-9] / [7, 9] / [7,9-11]
_INTEXT_CITE_RE = re.compile(r"\[(\d[\d,\s\-]*)\]")
# Reference entry line: starts with [n]
_REF_ENTRY_RE = re.compile(r"^\s*\[(\d+)\]")
|
||||
|
||||
|
||||
def _is_references_file(stem: str) -> bool:
|
||||
s = stem.lower()
|
||||
return "reference" in s or s.endswith("_refs") or "参考文献" in stem
|
||||
|
||||
|
||||
def _extract_mermaid_caption(block_lines: list[str]) -> str | None:
|
||||
for ln in block_lines:
|
||||
m = _MERMAID_CAPTION_RE.match(ln)
|
||||
if m:
|
||||
return m.group(1).strip()
|
||||
return None
|
||||
|
||||
|
||||
def check_structure(sections_dir: Path, ptype: str) -> list[str]:
|
||||
required = REQUIRED_SECTIONS.get(ptype, [])
|
||||
existing = {f.stem for f in sections_dir.glob("*.md")}
|
||||
issues = []
|
||||
for req in required:
|
||||
if req == "references":
|
||||
if not any(_is_references_file(s) for s in existing):
|
||||
issues.append("缺章节: references (参考文献)")
|
||||
continue
|
||||
if not any(s.startswith(req) for s in existing):
|
||||
issues.append(f"缺章节: {req}")
|
||||
return issues
|
||||
|
||||
|
||||
def check_phrases(text: str, label: str) -> list[str]:
|
||||
issues = []
|
||||
low = text.lower()
|
||||
for phrase in OVERCLAIM_PHRASES:
|
||||
hit = phrase in text or phrase.lower() in low
|
||||
if hit:
|
||||
issues.append(f"[{label}] 过度宣称: '{phrase}' — 换成可被数据支撑的具体表述")
|
||||
return issues
|
||||
|
||||
|
||||
def check_placeholders(text: str, label: str) -> list[str]:
|
||||
issues = []
|
||||
for pat in PLACEHOLDER_PATTERNS:
|
||||
for m in re.findall(pat, text):
|
||||
issues.append(f"[{label}] 占位符未替换: '{m}'")
|
||||
return issues
|
||||
|
||||
|
||||
def _expand_cite_group(grp: str) -> set[int]:
    """'7, 9-11' -> {7, 9, 10, 11}. Malformed fragments are ignored."""
    out: set[int] = set()
    for part in grp.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            a, _, b = part.partition("-")
            try:
                lo, hi = int(a), int(b)
            except ValueError:
                continue
            # Ranges are accepted only within 1..999 and in ascending order.
            if 0 < lo <= hi <= 999:
                out.update(range(lo, hi + 1))
        else:
            try:
                out.add(int(part))
            except ValueError:
                continue
    return out
|
||||
|
||||
|
||||
def check_citations(sections_dir: Path) -> list[str]:
    """Cross-check in-text [n] markers against the [n] entries of the reference list."""
    issues: list[str] = []
    cited: set[int] = set()
    ref_nums: list[int] = []

    for md in sorted(sections_dir.glob("*.md")):
        text = md.read_text(encoding="utf-8")
        if _is_references_file(md.stem):
            for ln in text.splitlines():
                m = _REF_ENTRY_RE.match(ln)
                if m:
                    ref_nums.append(int(m.group(1)))
        else:
            for grp in _INTEXT_CITE_RE.findall(text):
                cited.update(_expand_cite_group(grp))

    if not ref_nums and not cited:
        return ["未发现任何引文 (文中 [n] 和参考文献清单都为空) — 论文一般需要引用支撑"]

    ref_set = set(ref_nums)

    # orphan cite: cited in the text but missing from the reference list
    orphan = sorted(cited - ref_set)
    if orphan:
        issues.append(f"orphan cite — 文中引了 {orphan} 但参考文献清单缺对应条目 (编造/漏排)")

    # uncited ref: listed but never cited in the text
    uncited = sorted(ref_set - cited)
    if uncited:
        issues.append(f"uncited ref — 参考文献第 {uncited} 条正文从未引用 (删除或在正文补引)")

    # duplicate entry numbers
    dups = sorted({n for n in ref_nums if ref_nums.count(n) > 1})
    if dups:
        issues.append(f"参考文献编号重复: {dups}")

    # contiguity: numbering should run 1..N without gaps
    if ref_set:
        expected = set(range(1, max(ref_set) + 1))
        gaps = sorted(expected - ref_set)
        if gaps:
            issues.append(f"参考文献编号不连续, 缺号: {gaps} (顺序编码制需 1..N 连续)")
        if 1 not in ref_set:
            issues.append("参考文献编号未从 [1] 起")

    return issues
|
||||
|
||||
|
||||
def check_figures(sections_dir: Path) -> list[str]:
|
||||
issues: list[str] = []
|
||||
figures_dir = sections_dir.parent / "figures"
|
||||
pngs = list(figures_dir.glob("*.png")) if figures_dir.is_dir() else []
|
||||
|
||||
total_img_refs = 0
|
||||
ascii_art_blocks: list[tuple[str, int]] = []
|
||||
mermaid_no_caption: list[tuple[str, int]] = []
|
||||
mermaid_captions: dict[str, list[str]] = {}
|
||||
|
||||
for md in sorted(sections_dir.glob("*.md")):
|
||||
text = md.read_text(encoding="utf-8")
|
||||
total_img_refs += len(_IMAGE_REF_RE.findall(text))
|
||||
lines = text.splitlines()
|
||||
i = 0
|
||||
while i < len(lines):
|
||||
m = _FENCE_RE.match(lines[i])
|
||||
if not m:
|
||||
i += 1
|
||||
continue
|
||||
fence = m.group(1)
|
||||
lang = (m.group(2) or "").lower()
|
||||
block_line = i + 1
|
||||
i += 1
|
||||
buf: list[str] = []
|
||||
while i < len(lines):
|
||||
mc = _FENCE_RE.match(lines[i])
|
||||
if mc and mc.group(1)[0] == fence[0] and len(mc.group(1)) >= len(fence):
|
||||
i += 1
|
||||
break
|
||||
buf.append(lines[i])
|
||||
i += 1
|
||||
if lang == "mermaid":
|
||||
cap = _extract_mermaid_caption(buf)
|
||||
if not cap:
|
||||
mermaid_no_caption.append((md.name, block_line))
|
||||
else:
|
||||
mermaid_captions.setdefault(cap, []).append(f"{md.name}:{block_line}")
|
||||
continue
|
||||
if any(_BOX_DRAWING_RE.search(ln) for ln in buf):
|
||||
ascii_art_blocks.append((md.name, block_line))
|
||||
|
||||
if pngs and total_img_refs == 0:
|
||||
names = ", ".join(p.name for p in pngs[:4])
|
||||
more = f" ... +{len(pngs) - 4}" if len(pngs) > 4 else ""
|
||||
issues.append(f"figures/ 有 {len(pngs)} 张 png ({names}{more}) 但 sections 里 0 个 ![]() 引用")
|
||||
for fname, lineno in ascii_art_blocks:
|
||||
issues.append(f"[{fname}:~{lineno}] 代码块里有 ASCII 字符画 — Word 必错位, 改 ```mermaid 块或 ")
|
||||
for fname, lineno in mermaid_no_caption:
|
||||
issues.append(f"[{fname}:~{lineno}] mermaid 块缺首行 '%% caption: <图题>'")
|
||||
for cap, locs in mermaid_captions.items():
|
||||
if len(locs) > 1:
|
||||
issues.append(f"mermaid caption 撞名: {cap!r} 出现在 {', '.join(locs)}")
|
||||
return issues
|
||||
|
||||
|
||||
def main() -> None:
|
||||
ap = argparse.ArgumentParser(description="论文质量检查")
|
||||
ap.add_argument("sections_dir", type=Path)
|
||||
ap.add_argument("--type", required=True, choices=list(REQUIRED_SECTIONS.keys()))
|
||||
ap.add_argument("--strict", action="store_true", help="严格模式: 任何问题退出 1")
|
||||
args = ap.parse_args()
|
||||
|
||||
if not args.sections_dir.is_dir():
|
||||
print(f"[ERR] {args.sections_dir} not a directory", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
print(f"\n[质量检查] type={args.type}\n")
|
||||
all_issues: list[str] = []
|
||||
|
||||
struct = check_structure(args.sections_dir, args.type)
|
||||
if struct:
|
||||
print("[ERR] 结构问题:")
|
||||
for s in struct:
|
||||
print(f" - {s}")
|
||||
all_issues.extend(struct)
|
||||
else:
|
||||
print("[OK] 结构完整")
|
||||
|
||||
files = sorted(args.sections_dir.glob("*.md"))
|
||||
print(f"\n共 {len(files)} 个章节, 逐章扫描 (过度宣称 / 占位符)...\n")
|
||||
for f in files:
|
||||
text = f.read_text(encoding="utf-8")
|
||||
sub = check_phrases(text, f.stem) + check_placeholders(text, f.stem)
|
||||
if sub:
|
||||
print(f"[WARN] {f.stem}:")
|
||||
for s in sub:
|
||||
print(f" - {s.split('] ', 1)[1] if '] ' in s else s}")
|
||||
all_issues.extend(sub)
|
||||
|
||||
cite_issues = check_citations(args.sections_dir)
|
||||
if cite_issues:
|
||||
print("\n[ERR] 引文交叉核对:")
|
||||
for s in cite_issues:
|
||||
print(f" - {s}")
|
||||
all_issues.extend(cite_issues)
|
||||
else:
|
||||
print("\n[OK] 引文 [n] 与参考文献清单一致 (无 orphan / uncited, 编号连续)")
|
||||
|
||||
fig_issues = check_figures(args.sections_dir)
|
||||
if fig_issues:
|
||||
print("\n[ERR] 插图问题:")
|
||||
for s in fig_issues:
|
||||
print(f" - {s}")
|
||||
all_issues.extend(fig_issues)
|
||||
else:
|
||||
print("\n[OK] 插图引用 / 无 ASCII 字符画")
|
||||
|
||||
print("\n" + "=" * 60)
|
||||
if all_issues:
|
||||
print(f"[WARN] 共发现 {len(all_issues)} 个问题。")
|
||||
print("\n建议:")
|
||||
print(" - 过度宣称 -> 换成数据支撑的具体表述")
|
||||
print(" - 占位符未替换 -> 补真实数据 / 真实引文")
|
||||
print(" - orphan cite -> 核对参考文献清单 (大概率编造引文, 走 citation_verify 三角核验)")
|
||||
print(" - uncited ref -> 删条目或在正文补引")
|
||||
print(" - 插图未挂 / ASCII 字符画 -> ```mermaid 块或 ")
|
||||
if args.strict:
|
||||
sys.exit(1)
|
||||
else:
|
||||
print("[OK] 全部检查通过, 可以渲染 docx 了。")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
@@ -0,0 +1,214 @@
|
|||
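"""Preprocess the mermaid blocks in sections/*.md and render them to figures/fig_<caption>.png.

Shares its origin with the proposal skill's render_diagrams.py: technical flowcharts,
experimental-setup schematics and mechanism diagrams in a paper are likewise written in mermaid;
this script renders them all to PNG, and render_docx.py looks each figure up by caption.

Caption naming rules:
- Every mermaid block **must** carry a first-line comment `%% caption: <figure title>`, otherwise the script exits with an error
- Captions must be unique within the whole task; a collision is an error (forcing a more specific title)
- File name = sanitized caption (CJK / letters / digits kept, other characters -> '_', truncated to 40 characters), prefixed with 'fig_'

Rendering backends (in priority order): local mmdc -> the public mermaid.ink API. If neither is available, a warning is printed and the script exits 0.

Usage:
    python render_diagrams.py <task_dir>/sections/
"""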
|
||||
from __future__ import annotations
|
||||
|
||||
import argparse
|
||||
import base64
|
||||
import re
|
||||
import shutil
|
||||
import subprocess
|
||||
import sys
|
||||
import tempfile
|
||||
import urllib.error
|
||||
import urllib.request
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
|
||||
_FENCE_OPEN_RE = re.compile(r"^\s*```\s*mermaid\s*$")
|
||||
_FENCE_CLOSE_RE = re.compile(r"^\s*```\s*$")
|
||||
_CAPTION_RE = re.compile(r"^\s*%%\s*caption\s*:\s*(.+?)\s*$", re.IGNORECASE)
|
||||
_FILENAME_INVALID_RE = re.compile(r"[^一-鿿A-Za-z0-9]+")
|
||||
|
||||
MERMAID_INK_URL = "https://mermaid.ink/img/{payload}?type=png&bgColor=FFFFFF"
|
||||
|
||||
|
||||
def caption_to_stem(caption: str) -> str:
|
||||
cleaned = _FILENAME_INVALID_RE.sub("_", caption).strip("_")[:40]
|
||||
if not cleaned:
|
||||
raise ValueError(f"caption sanitizes to empty: {caption!r}")
|
||||
return f"fig_{cleaned}"
|
||||
|
||||
|
||||
def extract_caption(source: str) -> str | None:
|
||||
for ln in source.splitlines():
|
||||
m = _CAPTION_RE.match(ln)
|
||||
if m:
|
||||
return m.group(1).strip()
|
||||
return None
|
||||
|
||||
|
||||
def find_mermaid_blocks(md_text: str) -> list[str]:
|
||||
blocks: list[str] = []
|
||||
lines = md_text.splitlines()
|
||||
i = 0
|
||||
n = len(lines)
|
||||
while i < n:
|
||||
if _FENCE_OPEN_RE.match(lines[i]):
|
||||
buf: list[str] = []
|
||||
i += 1
|
||||
while i < n and not _FENCE_CLOSE_RE.match(lines[i]):
|
||||
buf.append(lines[i])
|
||||
i += 1
|
||||
blocks.append("\n".join(buf))
|
||||
i += 1
|
||||
else:
|
||||
i += 1
|
||||
return blocks
|
||||
|
||||
|
||||
def render_via_mmdc(source: str, out_png: Path) -> bool:
|
||||
import os
|
||||
mmdc = shutil.which("mmdc")
|
||||
if not mmdc:
|
||||
return False
|
||||
with tempfile.NamedTemporaryFile("w", suffix=".mmd", delete=False, encoding="utf-8") as tf:
|
||||
tf.write(source)
|
||||
tmp_path = Path(tf.name)
|
||||
try:
|
||||
argv = [mmdc, "-i", str(tmp_path), "-o", str(out_png), "-b", "white", "--quiet"]
|
||||
puppeteer_cfg = os.environ.get("MERMAID_PUPPETEER_CONFIG", "").strip()
|
||||
if puppeteer_cfg and Path(puppeteer_cfg).is_file():
|
||||
argv += ["-p", puppeteer_cfg]
|
||||
proc = subprocess.run(argv, capture_output=True, text=True, timeout=60)
|
||||
if proc.returncode != 0:
|
||||
print(f" [mmdc] returncode={proc.returncode}: {proc.stderr.strip()[:200]}", file=sys.stderr)
|
||||
return False
|
||||
return out_png.exists()
|
||||
except (subprocess.TimeoutExpired, OSError) as e:
|
||||
print(f" [mmdc] error: {e}", file=sys.stderr)
|
||||
return False
|
||||
finally:
|
||||
try:
|
||||
tmp_path.unlink()
|
||||
except OSError:
|
||||
pass
|
||||
|
||||
|
||||
def render_via_mermaid_ink(source: str, out_png: Path) -> bool:
|
||||
payload = base64.urlsafe_b64encode(source.strip().encode("utf-8")).decode("ascii").rstrip("=")
|
||||
url = MERMAID_INK_URL.format(payload=payload)
|
||||
try:
|
||||
req = urllib.request.Request(url, headers={"User-Agent": "zcbot-paper/1.0"})
|
||||
with urllib.request.urlopen(req, timeout=30) as resp:
|
||||
if resp.status != 200:
|
||||
print(f" [mermaid.ink] HTTP {resp.status}", file=sys.stderr)
|
||||
return False
|
||||
data = resp.read()
|
||||
if not data or len(data) < 100:
|
||||
print(f" [mermaid.ink] payload too small ({len(data)} bytes)", file=sys.stderr)
|
||||
return False
|
||||
out_png.write_bytes(data)
|
||||
return True
|
||||
except (urllib.error.URLError, urllib.error.HTTPError, TimeoutError, OSError) as e:
|
||||
print(f" [mermaid.ink] error: {e}", file=sys.stderr)
|
||||
return False
|
||||
|
||||
|
||||
def render_one(source: str, out_png: Path) -> str:
|
||||
if render_via_mmdc(source, out_png):
|
||||
return "mmdc"
|
||||
if render_via_mermaid_ink(source, out_png):
|
||||
return "mermaid.ink"
|
||||
return "fail"
|
||||
|
||||
|
||||
def render_sections(sections_dir: Path) -> int:
|
||||
if not sections_dir.is_dir():
|
||||
print(f"[ERR] sections dir not found: {sections_dir}", file=sys.stderr)
|
||||
return 2
|
||||
|
||||
figures_dir = sections_dir.parent / "figures"
|
||||
figures_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
md_files = sorted(sections_dir.glob("*.md"))
|
||||
if not md_files:
|
||||
print(f"[ERR] no .md found in {sections_dir}", file=sys.stderr)
|
||||
return 2
|
||||
|
||||
blocks_meta: list[tuple[Path, str, str]] = []
|
||||
missing_cap: list[Path] = []
|
||||
for md in md_files:
|
||||
text = md.read_text(encoding="utf-8")
|
||||
for src in find_mermaid_blocks(text):
|
||||
cap = extract_caption(src)
|
||||
if not cap:
|
||||
missing_cap.append(md)
|
||||
continue
|
||||
blocks_meta.append((md, cap, src))
|
||||
|
||||
fatal = False
|
||||
if missing_cap:
|
||||
print("[ERR] 以下 md 里有 mermaid 块缺首行 '%% caption: <图题>':", file=sys.stderr)
|
||||
for md in missing_cap:
|
||||
print(f" - {md.name}", file=sys.stderr)
|
||||
fatal = True
|
||||
|
||||
by_cap: dict[str, list[str]] = defaultdict(list)
|
||||
for md, cap, _ in blocks_meta:
|
||||
by_cap[cap].append(md.name)
|
||||
dups = [(c, mds) for c, mds in by_cap.items() if len(mds) > 1]
|
||||
if dups:
|
||||
print("[ERR] caption 在全 task 内必须唯一, 以下撞名:", file=sys.stderr)
|
||||
for c, mds in dups:
|
||||
print(f" - {c!r} 出现在: {', '.join(mds)}", file=sys.stderr)
|
||||
fatal = True
|
||||
|
||||
if fatal:
|
||||
return 2
|
||||
|
||||
if not blocks_meta:
|
||||
print(f"[OK] no mermaid block found in {sections_dir} (nothing to render)")
|
||||
return 0
|
||||
|
||||
by_backend: dict[str, int] = {}
|
||||
fail_blocks: list[tuple[Path, str]] = []
|
||||
for md, cap, src in blocks_meta:
|
||||
try:
|
||||
stem = caption_to_stem(cap)
|
||||
except ValueError as e:
|
||||
print(f"[ERR] {md.name}: {e}", file=sys.stderr)
|
||||
return 2
|
||||
png = figures_dir / f"{stem}.png"
|
||||
backend = render_one(src, png)
|
||||
by_backend[backend] = by_backend.get(backend, 0) + 1
|
||||
mark = {"mmdc": "+", "mermaid.ink": "+", "fail": "x"}[backend]
|
||||
print(f" {mark} [{backend:11s}] {md.name} :: {png.name} :: {cap}")
|
||||
if backend == "fail":
|
||||
fail_blocks.append((md, cap))
|
||||
|
||||
print()
|
||||
print(f"[OK] processed {len(blocks_meta)} mermaid block(s) -> {figures_dir}")
|
||||
for b, c in sorted(by_backend.items()):
|
||||
print(f" {b}: {c}")
|
||||
|
||||
if fail_blocks:
|
||||
print()
|
||||
print(f"[WARN] {len(fail_blocks)} block(s) failed to render. render_docx.py 会走 ASCII fallback.")
|
||||
for md, cap in fail_blocks:
|
||||
print(f" - {md.name} :: {cap}")
|
||||
|
||||
return 0
|
||||
|
||||
|
||||
def main() -> None:
|
||||
ap = argparse.ArgumentParser(description="预处理 sections/*.md 的 mermaid 块 → figures/*.png")
|
||||
ap.add_argument("sections_dir", type=Path, help="sections/*.md 目录")
|
||||
args = ap.parse_args()
|
||||
sys.exit(render_sections(args.sections_dir))
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
@@ -0,0 +1,502 @@
|
|||
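"""Render sections/*.md into a journal manuscript .docx (manuscript draft).

Shares its origin with proposal/render_docx.py; differences:
- No fund-type; uses --lang {zh,en} (default en) to mark the language, which only affects printed info and the first-line indent policy
- The table of contents (TOC) is **not generated** by default (journal submissions need no TOC); add --toc for a draft that has one
- Font spec unchanged: Chinese 宋体 (SimSun) 小四 (12 pt) / English Times New Roman 12 pt / line spacing 1.5 / first-line indent of 2 characters
  (eastAsia=宋体 only applies to CJK characters, so body text in an English-only paper uses Times New Roman; one style set covers both)

Supported: **bold** / *italic* / `monospace`; lists / tables / ![]() centered images with auto-numbered captions;
```mermaid``` blocks look up figures/fig_<caption>.png by caption (pre-generated by render_diagrams.py).

Usage:
    python render_docx.py <sections_dir> --lang en -o <out.docx>
    python render_docx.py <sections_dir> --lang zh --toc -o <out.docx>
"""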
|
||||
from __future__ import annotations
|
||||
import argparse
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
from docx import Document
|
||||
from docx.enum.text import WD_ALIGN_PARAGRAPH
|
||||
from docx.oxml import OxmlElement
|
||||
from docx.oxml.ns import qn
|
||||
from docx.shared import Cm, Pt, RGBColor
|
||||
|
||||
|
||||
# ───────────────────────── 字体辅助 ─────────────────────────
|
||||
|
||||
def _set_run_fonts(run, *, cn_font: str = "宋体", en_font: str = "Times New Roman") -> None:
|
||||
rPr = run._element.get_or_add_rPr()
|
||||
rFonts = rPr.find(qn("w:rFonts"))
|
||||
if rFonts is None:
|
||||
rFonts = OxmlElement("w:rFonts")
|
||||
rPr.append(rFonts)
|
||||
rFonts.set(qn("w:eastAsia"), cn_font)
|
||||
rFonts.set(qn("w:ascii"), en_font)
|
||||
rFonts.set(qn("w:hAnsi"), en_font)
|
||||
|
||||
|
||||
def _set_style_fonts(style, *, cn_font: str = "宋体", en_font: str = "Times New Roman") -> None:
|
||||
el = style.element
|
||||
rPr = el.find(qn("w:rPr"))
|
||||
if rPr is None:
|
||||
rPr = OxmlElement("w:rPr")
|
||||
el.insert(0, rPr)
|
||||
rFonts = rPr.find(qn("w:rFonts"))
|
||||
if rFonts is None:
|
||||
rFonts = OxmlElement("w:rFonts")
|
||||
rPr.append(rFonts)
|
||||
rFonts.set(qn("w:eastAsia"), cn_font)
|
||||
rFonts.set(qn("w:ascii"), en_font)
|
||||
rFonts.set(qn("w:hAnsi"), en_font)
|
||||
|
||||
|
||||
# ───────────────────────── 文档初始化 ─────────────────────────
|
||||
|
||||
def init_doc() -> Document:
|
||||
doc = Document()
|
||||
|
||||
section = doc.sections[0]
|
||||
section.page_height = Cm(29.7)
|
||||
section.page_width = Cm(21)
|
||||
section.top_margin = Cm(2.5)
|
||||
section.bottom_margin = Cm(2.5)
|
||||
section.left_margin = Cm(2.5)
|
||||
section.right_margin = Cm(2.5)
|
||||
|
||||
normal = doc.styles["Normal"]
|
||||
normal.font.name = "Times New Roman"
|
||||
normal.font.size = Pt(12)
|
||||
_set_style_fonts(normal, cn_font="宋体")
|
||||
pf = normal.paragraph_format
|
||||
pf.line_spacing = 1.5
|
||||
pf.space_before = Pt(0)
|
||||
pf.space_after = Pt(0)
|
||||
|
||||
for lvl, sz, cn in [(1, Pt(14), "黑体"), (2, Pt(12), "黑体"), (3, Pt(12), "宋体")]:
|
||||
h = doc.styles[f"Heading {lvl}"]
|
||||
h.font.name = "Times New Roman"
|
||||
h.font.size = sz
|
||||
h.font.bold = True
|
||||
h.font.color.rgb = RGBColor(0, 0, 0)
|
||||
_set_style_fonts(h, cn_font=cn)
|
||||
h.paragraph_format.line_spacing = 1.5
|
||||
h.paragraph_format.space_before = Pt(6)
|
||||
h.paragraph_format.space_after = Pt(3)
|
||||
h.paragraph_format.first_line_indent = None
|
||||
|
||||
return doc
|
||||
|
||||
|
||||
# ───────────────────────── TOC (opt-in) ─────────────────────────
|
||||
|
||||
def add_toc(doc: Document, depth: int = 3) -> None:
|
||||
p = doc.add_paragraph()
|
||||
p.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
p.paragraph_format.first_line_indent = None
|
||||
p.paragraph_format.space_before = Pt(12)
|
||||
p.paragraph_format.space_after = Pt(6)
|
||||
run = p.add_run("Contents")
|
||||
run.font.size = Pt(16)
|
||||
run.font.bold = True
|
||||
_set_run_fonts(run, cn_font="黑体")
|
||||
|
||||
p = doc.add_paragraph()
|
||||
p.paragraph_format.first_line_indent = None
|
||||
run = p.add_run()
|
||||
|
||||
fldChar1 = OxmlElement("w:fldChar")
|
||||
fldChar1.set(qn("w:fldCharType"), "begin")
|
||||
instrText = OxmlElement("w:instrText")
|
||||
instrText.set(qn("xml:space"), "preserve")
|
||||
instrText.text = f' TOC \\o "1-{depth}" \\h \\z \\u '
|
||||
fldChar2 = OxmlElement("w:fldChar")
|
||||
fldChar2.set(qn("w:fldCharType"), "separate")
|
||||
fldChar3 = OxmlElement("w:fldChar")
|
||||
fldChar3.set(qn("w:fldCharType"), "end")
|
||||
placeholder_t = OxmlElement("w:t")
|
||||
placeholder_t.set(qn("xml:space"), "preserve")
|
||||
placeholder_t.text = "[Press F9 in Word to generate the table of contents]"
|
||||
run._element.append(fldChar1)
|
||||
run._element.append(instrText)
|
||||
run._element.append(fldChar2)
|
||||
run._element.append(placeholder_t)
|
||||
run._element.append(fldChar3)
|
||||
doc.add_page_break()
|
||||
|
||||
|
||||
# ───────────────────────── 内联 markdown ─────────────────────────
|
||||
|
||||
_INLINE_RE = re.compile(
|
||||
r"(?P<bold>\*\*(?P<bold_t>[^*\n]+?)\*\*)"
|
||||
r"|(?P<italic>(?<![\*\w])\*(?P<italic_t>[^*\n]+?)\*(?!\*))"
|
||||
r"|(?P<code>`(?P<code_t>[^`\n]+?)`)"
|
||||
)
|
||||
|
||||
|
||||
def parse_inline(text: str) -> list[tuple[str, str]]:
|
||||
out: list[tuple[str, str]] = []
|
||||
pos = 0
|
||||
for m in _INLINE_RE.finditer(text):
|
||||
if m.start() > pos:
|
||||
out.append(("plain", text[pos:m.start()]))
|
||||
if m.group("bold"):
|
||||
out.append(("bold", m.group("bold_t")))
|
||||
elif m.group("italic"):
|
||||
out.append(("italic", m.group("italic_t")))
|
||||
elif m.group("code"):
|
||||
out.append(("code", m.group("code_t")))
|
||||
pos = m.end()
|
||||
if pos < len(text):
|
||||
out.append(("plain", text[pos:]))
|
||||
return out or [("plain", text)]
|
||||
|
||||
|
||||
def add_inline(paragraph, text: str, *, size: Pt = Pt(12), cn_font: str = "宋体") -> None:
|
||||
for style, seg in parse_inline(text):
|
||||
run = paragraph.add_run(seg)
|
||||
run.font.size = size
|
||||
if style == "bold":
|
||||
run.bold = True
|
||||
_set_run_fonts(run, cn_font=cn_font, en_font="Times New Roman")
|
||||
elif style == "italic":
|
||||
run.italic = True
|
||||
_set_run_fonts(run, cn_font=cn_font, en_font="Times New Roman")
|
||||
elif style == "code":
|
||||
_set_run_fonts(run, cn_font=cn_font, en_font="Consolas")
|
||||
else:
|
||||
_set_run_fonts(run, cn_font=cn_font, en_font="Times New Roman")
|
||||
|
||||
|
||||
# ───────────────────────── 段落 / 标题 / 列表 ─────────────────────────
|
||||
|
||||
def add_heading(doc: Document, text: str, level: int) -> None:
|
||||
p = doc.add_paragraph(style=f"Heading {level}")
|
||||
p.paragraph_format.first_line_indent = None
|
||||
sizes = {1: Pt(14), 2: Pt(12), 3: Pt(12)}
|
||||
cn = {1: "黑体", 2: "黑体", 3: "宋体"}
|
||||
add_inline(p, text, size=sizes[level], cn_font=cn[level])
|
||||
for run in p.runs:
|
||||
run.bold = True
|
||||
|
||||
|
||||
def add_body_paragraph(doc: Document, text: str, *, indent: bool = True) -> None:
|
||||
p = doc.add_paragraph()
|
||||
pf = p.paragraph_format
|
||||
pf.line_spacing = 1.5
|
||||
if indent:
|
||||
pf.first_line_indent = Pt(24)
|
||||
else:
|
||||
pf.first_line_indent = None
|
||||
p.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
|
||||
add_inline(p, text)
|
||||
|
||||
|
||||
# ───────────────────────── 行类型识别 ─────────────────────────
|
||||
|
||||
_HEADING_RE = re.compile(r"^(#{1,6})\s+(.+)$")
|
||||
_TABLE_LINE_RE = re.compile(r"^\s*\|.*\|\s*$")
|
||||
_BLOCKQUOTE_RE = re.compile(r"^\s*>\s?")
|
||||
_HR_RE = re.compile(r"^\s*-{3,}\s*$|^\s*={3,}\s*$|^\s*_{3,}\s*$")
|
||||
_FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})\s*(\S*)\s*$")
|
||||
|
||||
_LIST_PATTERNS = [
    re.compile(r"^\[\d+\]\s"),
    re.compile(r"^[-*+]\s"),
    re.compile(r"^\d+[\.、．]\s*"),
    re.compile(r"^\(\d+\)\s*"),
    re.compile(r"^（\d+）\s*"),  # full-width parentheses, e.g. （1）
    re.compile(r"^[一二三四五六七八九十百千]+[、．\.]"),
    re.compile(r"^[（(][一二三四五六七八九十百千]+[）)]"),
    re.compile(r"^[①②③④⑤⑥⑦⑧⑨⑩⑪⑫⑬⑭⑮]"),
]
|
||||
|
||||
|
||||
def is_list_item(line: str) -> bool:
|
||||
return any(p.match(line) for p in _LIST_PATTERNS)
|
||||
|
||||
|
||||
def is_table_line(line: str) -> bool:
|
||||
return bool(_TABLE_LINE_RE.match(line))
|
||||
|
||||
|
||||
def is_heading(line: str) -> bool:
|
||||
return bool(_HEADING_RE.match(line))
|
||||
|
||||
|
||||
def is_blockquote(line: str) -> bool:
|
||||
return bool(_BLOCKQUOTE_RE.match(line))
|
||||
|
||||
|
||||
def is_hr(line: str) -> bool:
|
||||
return bool(_HR_RE.match(line))
|
||||
|
||||
|
||||
# ───────────────────────── 代码块 / ASCII 图 ─────────────────────────
|
||||
|
||||
def add_code_block(doc: Document, lines: list[str], lang: str = "") -> None:
|
||||
for ln in lines:
|
||||
p = doc.add_paragraph()
|
||||
pf = p.paragraph_format
|
||||
pf.first_line_indent = None
|
||||
pf.line_spacing = 1.0
|
||||
pf.space_before = Pt(0)
|
||||
pf.space_after = Pt(0)
|
||||
run = p.add_run(ln if ln else " ")
|
||||
run.font.size = Pt(10.5)
|
||||
_set_run_fonts(run, cn_font="新宋体", en_font="Consolas")
|
||||
for t in run._element.iter(qn("w:t")):
|
||||
t.set(qn("xml:space"), "preserve")
|
||||
|
||||
|
||||
# ───────────────────────── 表格 ─────────────────────────
|
||||
|
||||
def _split_md_row(line: str) -> list[str]:
|
||||
return [c.strip() for c in line.strip().strip("|").split("|")]
|
||||
|
||||
|
||||
def _is_separator_row(cells: list[str]) -> bool:
|
||||
return all(re.match(r"^[-:\s]+$", c) for c in cells if c != "")
|
||||
|
||||
|
||||
def render_table(doc: Document, table_lines: list[str]) -> None:
|
||||
rows: list[list[str]] = []
|
||||
for ln in table_lines:
|
||||
cells = _split_md_row(ln)
|
||||
if not cells or _is_separator_row(cells):
|
||||
continue
|
||||
rows.append(cells)
|
||||
if not rows:
|
||||
return
|
||||
n_cols = max(len(r) for r in rows)
|
||||
for r in rows:
|
||||
while len(r) < n_cols:
|
||||
r.append("")
|
||||
|
||||
table = doc.add_table(rows=len(rows), cols=n_cols)
|
||||
try:
|
||||
table.style = "Light Grid Accent 1"
|
||||
except KeyError:
|
||||
pass
|
||||
|
||||
for ri, row in enumerate(rows):
|
||||
for ci, val in enumerate(row):
|
||||
cell = table.rows[ri].cells[ci]
|
||||
cell.text = ""
|
||||
p = cell.paragraphs[0]
|
||||
p.paragraph_format.first_line_indent = None
|
||||
p.paragraph_format.line_spacing = 1.2
|
||||
add_inline(p, val, size=Pt(10.5), cn_font="宋体")
|
||||
if ri == 0:
|
||||
for run in p.runs:
|
||||
run.bold = True
|
||||
|
||||
|
||||
# ───────────────────────── 图片 + 图题 ─────────────────────────
|
||||
|
||||
_IMAGE_LINE_RE = re.compile(r"^\s*!\[(?P<cap>[^\]]*)\]\((?P<src>[^)\s]+)\)\s*$")
|
||||
_MERMAID_CAPTION_RE = re.compile(r"^\s*%%\s*caption\s*:\s*(.+?)\s*$", re.IGNORECASE)
|
||||
_FILENAME_INVALID_RE = re.compile(r"[^一-鿿A-Za-z0-9]+")
|
||||
_MAX_IMG_WIDTH = Cm(15)
|
||||
|
||||
|
||||
def caption_to_stem(caption: str) -> str:
|
||||
cleaned = _FILENAME_INVALID_RE.sub("_", caption).strip("_")[:40]
|
||||
if not cleaned:
|
||||
return ""
|
||||
return f"fig_{cleaned}"
|
||||
|
||||
|
||||
def extract_mermaid_caption(source: str) -> str | None:
|
||||
for ln in source.splitlines():
|
||||
m = _MERMAID_CAPTION_RE.match(ln)
|
||||
if m:
|
||||
return m.group(1).strip()
|
||||
return None
|
||||
|
||||
|
||||
def _resolve_image_path(src: str, base_dir: Path) -> Path | None:
|
||||
p = Path(src)
|
||||
if not p.is_absolute():
|
||||
p = (base_dir / p).resolve()
|
||||
return p if p.is_file() else None
|
||||
|
||||
|
||||
def add_image(doc: Document, png_path: Path, caption: str | None, ctx: dict) -> None:
|
||||
p = doc.add_paragraph()
|
||||
p.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
p.paragraph_format.first_line_indent = None
|
||||
p.paragraph_format.space_before = Pt(6)
|
||||
p.paragraph_format.space_after = Pt(3)
|
||||
run = p.add_run()
|
||||
try:
|
||||
run.add_picture(str(png_path), width=_MAX_IMG_WIDTH)
|
||||
except Exception as e:
|
||||
run.add_text(f"[image failed: {png_path.name}: {e}]")
|
||||
return
|
||||
|
||||
ctx["fig_no"] = ctx.get("fig_no", 0) + 1
|
||||
cap_p = doc.add_paragraph()
|
||||
cap_p.alignment = WD_ALIGN_PARAGRAPH.CENTER
|
||||
cap_p.paragraph_format.first_line_indent = None
|
||||
cap_p.paragraph_format.space_before = Pt(0)
|
||||
cap_p.paragraph_format.space_after = Pt(6)
|
||||
label = ctx.get("fig_label", "Fig.")
|
||||
cap_text = f"{label} {ctx['fig_no']} {caption}" if caption else f"{label} {ctx['fig_no']}"
|
||||
cap_run = cap_p.add_run(cap_text)
|
||||
cap_run.font.size = Pt(10.5)
|
||||
cap_run.bold = True
|
||||
_set_run_fonts(cap_run, cn_font="宋体", en_font="Times New Roman")
|
||||
|
||||
|
||||
# ───────────────────────── 主渲染 ─────────────────────────
|
||||
|
||||
def render_md_block(doc: Document, md_text: str, ctx: dict) -> None:
|
||||
lines = md_text.splitlines()
|
||||
i = 0
|
||||
n = len(lines)
|
||||
while i < n:
|
||||
line = lines[i].rstrip()
|
||||
|
||||
if not line.strip():
|
||||
i += 1
|
||||
continue
|
||||
|
||||
if is_hr(line):
|
||||
i += 1
|
||||
continue
|
||||
|
||||
m_img = _IMAGE_LINE_RE.match(line)
|
||||
if m_img:
|
||||
src = m_img.group("src")
|
||||
cap = m_img.group("cap").strip() or None
|
||||
png = _resolve_image_path(src, ctx["sections_dir"])
|
||||
if png is not None:
|
||||
add_image(doc, png, cap, ctx)
|
||||
else:
|
||||
add_body_paragraph(doc, f"[image missing: {src}]", indent=False)
|
||||
i += 1
|
||||
continue
|
||||
|
||||
m_fence = _FENCE_RE.match(line)
|
||||
if m_fence:
|
||||
fence = m_fence.group(1)
|
||||
lang = m_fence.group(2) or ""
|
||||
code: list[str] = []
|
||||
i += 1
|
||||
while i < n:
|
||||
m_close = _FENCE_RE.match(lines[i])
|
||||
if m_close and m_close.group(1)[0] == fence[0] and len(m_close.group(1)) >= len(fence):
|
||||
i += 1
|
||||
break
|
||||
code.append(lines[i])
|
||||
i += 1
|
||||
|
||||
if lang.lower() == "mermaid":
|
||||
source = "\n".join(code)
|
||||
cap = extract_mermaid_caption(source)
|
||||
if cap:
|
||||
stem = caption_to_stem(cap)
|
||||
if stem:
|
||||
png = ctx["figures_dir"] / f"{stem}.png"
|
||||
if png.is_file():
|
||||
add_image(doc, png, cap, ctx)
|
||||
continue
|
||||
add_code_block(doc, code, lang)
|
||||
continue
|
||||
|
||||
if is_table_line(line):
|
||||
block: list[str] = []
|
||||
while i < n and is_table_line(lines[i]):
|
||||
block.append(lines[i])
|
||||
i += 1
|
||||
render_table(doc, block)
|
||||
continue
|
||||
|
||||
m = _HEADING_RE.match(line)
|
||||
if m:
|
||||
level = min(len(m.group(1)), 3)
|
||||
add_heading(doc, m.group(2).strip(), level)
|
||||
i += 1
|
||||
continue
|
||||
|
||||
if is_blockquote(line):
|
||||
i += 1
|
||||
continue
|
||||
|
||||
if is_list_item(line):
|
||||
add_body_paragraph(doc, line.strip(), indent=False)
|
||||
i += 1
|
||||
continue
|
||||
|
||||
buf = [line.strip()]
|
||||
j = i + 1
|
||||
while j < n:
|
||||
nxt = lines[j].rstrip()
|
||||
if not nxt.strip():
|
||||
break
|
||||
if is_heading(nxt) or is_blockquote(nxt) or is_table_line(nxt) or is_list_item(nxt) or is_hr(nxt) or _IMAGE_LINE_RE.match(nxt) or _FENCE_RE.match(nxt):
|
||||
break
|
||||
buf.append(nxt.strip())
|
||||
j += 1
|
||||
add_body_paragraph(doc, " ".join(buf), indent=True)
|
||||
i = j
|
||||
|
||||
|
||||
# ───────────────────────── Entry point ─────────────────────────
|
||||
|
||||
def render_sections(sections_dir: Path, out: Path, lang: str, toc: bool) -> None:
|
||||
if not sections_dir.is_dir():
|
||||
print(f"[ERR] sections dir not found: {sections_dir}", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
md_files = sorted(sections_dir.glob("*.md"))
|
||||
if not md_files:
|
||||
print(f"[ERR] no .md found in {sections_dir}", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
figures_dir = sections_dir.parent / "figures"
|
||||
ctx: dict = {
|
||||
"sections_dir": sections_dir,
|
||||
"figures_dir": figures_dir,
|
||||
"fig_no": 0,
|
||||
"fig_label": "图" if lang == "zh" else "Fig.",
|
||||
}
|
||||
|
||||
doc = init_doc()
|
||||
if toc:
|
||||
add_toc(doc)
|
||||
for idx, f in enumerate(md_files):
|
||||
text = f.read_text(encoding="utf-8")
|
||||
render_md_block(doc, text, ctx)
|
||||
if idx != len(md_files) - 1:
|
||||
doc.add_page_break()
|
||||
|
||||
out.parent.mkdir(parents=True, exist_ok=True)
|
||||
doc.save(str(out))
|
||||
|
||||
paras = sum(1 for _ in doc.paragraphs)
|
||||
chars = sum(len(p.text) for p in doc.paragraphs)
|
||||
tbls = len(doc.tables)
|
||||
print(f"[OK] rendered {len(md_files)} sections -> {out}")
|
||||
print(f" paragraphs: {paras} | tables: {tbls} | figures: {ctx['fig_no']} | total chars: {chars}")
|
||||
print(f" lang: {lang} | toc: {toc}")
|
||||
print(" font: Chinese SimSun 12 pt / English Times New Roman 12 pt / line spacing 1.5 / first-line indent 2 characters")
|
||||
|
||||
|
||||
def main() -> None:
|
||||
ap = argparse.ArgumentParser(description="Render section .md files into a submission-ready paper .docx")
|
||||
ap.add_argument("sections_dir", type=Path, help="directory containing sections/*.md")
|
||||
ap.add_argument("--lang", choices=["zh", "en"], default="en",
|
||||
help="paper language (sets the figure-caption prefix 图/Fig. and the printed summary); default: en")
|
||||
ap.add_argument("--toc", action="store_true",
|
||||
help="generate a table-of-contents page (usually unnecessary for journal submissions; useful when reviewing internal drafts)")
|
||||
ap.add_argument("-o", "--output", type=Path, required=True, help="output .docx path")
|
||||
args = ap.parse_args()
|
||||
render_sections(args.sections_dir, args.output, args.lang, args.toc)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
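The fence handling in `render_md_block` closes a code block only on a fence line that uses the same character as the opener and is at least as long, so a four-backtick fence can carry three-backtick lines verbatim. A minimal standalone sketch of that rule follows. `FENCE_RE` and `split_fence` are illustrative names, and the pattern is an assumption, because the real `_FENCE_RE` is defined outside this excerpt.

```python
import re

# Assumed pattern: an opening or closing fence of 3+ backticks or tildes,
# optionally followed by a language tag. The script's own _FENCE_RE may differ.
FENCE_RE = re.compile(r"^\s*(`{3,}|~{3,})\s*([\w+-]*)\s*$")


def split_fence(lines, start):
    """Collect the fenced block that opens at lines[start].

    Returns (lang, code_lines, next_index), where next_index is the first
    line after the closing fence (or len(lines) if the fence never closes).
    """
    m = FENCE_RE.match(lines[start])
    if m is None:
        raise ValueError("lines[start] is not a fence opener")
    fence, lang = m.group(1), m.group(2) or ""
    code = []
    i = start + 1
    while i < len(lines):
        m_close = FENCE_RE.match(lines[i])
        # Same fence character, and at least as long as the opener.
        if m_close and m_close.group(1)[0] == fence[0] and len(m_close.group(1)) >= len(fence):
            return lang, code, i + 1
        code.append(lines[i])
        i += 1
    return lang, code, i


demo = ["````md", "```python", "x = 1", "```", "````", "after"]
lang, code, nxt = split_fence(demo, 0)
```

In this sketch the inner three-backtick lines are kept as code, and scanning resumes at the line after the four-backtick closer.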
@ -0,0 +1,122 @@
|
|||
"""Tally each section's length against the budget for its paper type and language, and print a table.
|
||||
|
||||
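Counting: each CJK character counts as 1; each run of consecutive ASCII letters/digits (an English word or a number) counts as 1. For a Chinese manuscript this approximates the character count,
|
||||
and for an English manuscript the word count. Budgets are looked up by (paper_type, lang); the two languages use different units.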
|
||||
|
||||
Usage:
|
||||
python word_count.py <sections_dir> --type original --lang en
|
||||
python word_count.py <sections_dir> --type original --lang zh
|
||||
"""
|
||||
from __future__ import annotations
|
||||
import argparse
|
||||
import re
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
|
||||
# BUDGETS[type][section] = {"en": (lo, hi), "zh": (lo, hi), "desc": str}
|
||||
# en budgets are in words, zh budgets in characters. Thematic body chapters of a review have no fixed budget (their number varies).
|
||||
BUDGETS: dict[str, dict[str, dict]] = {
|
||||
"original": {
|
||||
"00_title_abstract": {"en": (150, 320), "zh": (200, 400), "desc": "Title + Abstract + Keywords"},
|
||||
"01_introduction": {"en": (600, 1000), "zh": (1000, 1800), "desc": "Introduction"},
|
||||
"02_methods": {"en": (800, 1600), "zh": (1200, 2400), "desc": "Materials & Methods"},
|
||||
"03_results": {"en": (1000, 1900), "zh": (1500, 2900), "desc": "Results"},
|
||||
"04_discussion": {"en": (800, 1600), "zh": (1200, 2400), "desc": "Discussion"},
|
||||
"05_conclusion": {"en": (180, 400), "zh": (300, 600), "desc": "Conclusion"},
|
||||
},
|
||||
"review": {
|
||||
"00_title_abstract": {"en": (180, 350), "zh": (250, 450), "desc": "Title + Abstract + Keywords"},
|
||||
"01_introduction": {"en": (700, 1300), "zh": (1200, 2200), "desc": "Introduction"},
|
||||
# 02_..NN_ thematic body chapters have no budget (their number varies; word_count reports "no budget")
|
||||
"98_outlook": {"en": (500, 1100), "zh": (800, 1800), "desc": "Challenges & Outlook"},
|
||||
"99_conclusion": {"en": (180, 450), "zh": (300, 700), "desc": "Conclusion"},
|
||||
},
|
||||
"letter": {
|
||||
"00_title_abstract": {"en": (120, 250), "zh": (180, 350), "desc": "Title + Abstract"},
|
||||
"01_main": {"en": (1500, 3000), "zh": (2000, 4000), "desc": "Main text (condensed)"},
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
_HEADING_RE = re.compile(r"^#+\s+")
|
||||
_BLOCKQUOTE_RE = re.compile(r"^>")
|
||||
_TABLE_LINE_RE = re.compile(r"^\s*\|")
|
||||
|
||||
|
||||
def count_chars(text: str) -> int:
|
||||
n = 0
|
||||
for line in text.splitlines():
|
||||
stripped = line.strip()
|
||||
if not stripped:
|
||||
continue
|
||||
if _HEADING_RE.match(stripped) or _BLOCKQUOTE_RE.match(stripped) or _TABLE_LINE_RE.match(stripped):
|
||||
continue
|
||||
if stripped.strip("`").startswith("<TODO") and stripped.strip("`").endswith(">"):  # placeholders in the templates are wrapped in backticks
|
||||
continue
|
||||
for c in stripped:
|
||||
if "\u4e00" <= c <= "\u9fff":
|
||||
n += 1
|
||||
n += len(re.findall(r"[A-Za-z0-9]+", stripped))
|
||||
return n
|
||||
|
||||
|
||||
def main() -> None:
|
||||
ap = argparse.ArgumentParser(description="Per-section length check for a paper")
|
||||
ap.add_argument("sections_dir", type=Path)
|
||||
ap.add_argument("--type", required=True, choices=list(BUDGETS.keys()))
|
||||
ap.add_argument("--lang", required=True, choices=["zh", "en"])
|
||||
args = ap.parse_args()
|
||||
|
||||
if not args.sections_dir.is_dir():
|
||||
print(f"[ERR] {args.sections_dir} not a directory", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
budget = BUDGETS[args.type]
|
||||
files = sorted(args.sections_dir.glob("*.md"))
|
||||
if not files:
|
||||
print(f"[ERR] no .md found in {args.sections_dir}", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
unit = "words" if args.lang == "en" else "characters"
|
||||
print(f"\n[length check] type={args.type} lang={args.lang} (unit: {unit})\n")
|
||||
header = f"{'section':<26} {'length':>8} {'min':>6} {'max':>6} status"
|
||||
print(header)
|
||||
print("-" * len(header))
|
||||
|
||||
total = 0
|
||||
overflow = 0
|
||||
underflow = 0
|
||||
for f in files:
|
||||
text = f.read_text(encoding="utf-8")
|
||||
n = count_chars(text)
|
||||
total += n
|
||||
stem = f.stem
|
||||
bud = None
|
||||
for key, val in budget.items():
|
||||
if stem.startswith(key):
|
||||
bud = val
|
||||
break
|
||||
if bud is None:
|
||||
print(f"{stem:<26} {n:>8} - - (no budget; thematic/aux section)")
|
||||
continue
|
||||
lo, hi = bud[args.lang]
|
||||
status = "OK"
|
||||
if n > hi:
|
||||
status = f"WARN over {n - hi}"
|
||||
overflow += 1
|
||||
elif n < lo:
|
||||
status = f"WARN under {lo - n}"
|
||||
underflow += 1
|
||||
print(f"{stem:<26} {n:>8} {lo:>6} {hi:>6} {status}")
|
||||
|
||||
print("-" * len(header))
|
||||
print(f"{'total':<26} {total:>8}")
|
||||
if overflow or underflow:
|
||||
print(f"\n[WARN] {overflow} section(s) over budget / {underflow} under budget (abstract and body included). Revise and re-run.")
|
||||
sys.exit(1)
|
||||
print("\n[OK] all sections are within budget.")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
|
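The counting rule in `count_chars` is easiest to see on a small input: every CJK character counts as one unit and every run of ASCII letters or digits counts as one unit, while headings, blockquote hints and table rows are skipped. The sketch below restates that rule in a standalone function. `count_units` is an illustrative name, not part of the script, and it omits the script's extra skip for `<TODO …>` placeholder lines.

```python
import re

# Lines that never count: markdown headings, blockquote hints, table rows.
_SKIP_RE = re.compile(r"^(#+\s+|>|\|)")
# One unit per run of ASCII letters/digits (an English word or a number).
_ASCII_RUN_RE = re.compile(r"[A-Za-z0-9]+")


def count_units(text: str) -> int:
    """Count CJK characters plus ASCII runs, mirroring word_count.py's rule."""
    total = 0
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or _SKIP_RE.match(stripped):
            continue
        # CJK Unified Ideographs block, one unit per character.
        total += sum(1 for c in stripped if "\u4e00" <= c <= "\u9fff")
        total += len(_ASCII_RUN_RE.findall(stripped))
    return total


mixed = count_units("水化 hydration 28d")      # 2 CJK characters + 2 ASCII runs
decimal = count_units("3.5 MPa")               # "3", "5", "MPa"
```

One consequence worth knowing when reading the table: a decimal such as `3.5` is split at the dot and counts as two units, so data-heavy English sections read slightly longer than a word processor would report.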
@ -0,0 +1,112 @@
|
|||
# Section skeleton for an original research paper (type=original)
|
||||
|
||||
> When drafting in stage 4, copy each section into `<task_dir>/sections/NN_xxx.md` (file names are listed in references/paper_types.md).
|
||||
> `>` blockquote blocks are writing hints and **do not go into the manuscript** (render_docx skips them). The body language follows the spec (zh/en).
|
||||
> Write headings in the manuscript language (bilingual headings are given below; keep the one that matches the spec).
|
||||
|
||||
---
|
||||
|
||||
## 00_title_abstract
|
||||
|
||||
# 题名 / Title
|
||||
> State the conclusion: material system + approach + key finding; ≤25 characters (zh) / concise (en). Do not pad with filler such as "研究" or "Study on".
|
||||
|
||||
`<TODO title>`
|
||||
|
||||
**作者与单位 / Authors & Affiliations**: `<TODO>` (mark the corresponding author with *)
|
||||
|
||||
# 摘要 / Abstract
|
||||
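> Self-contained, 150-320 words (en) / 200-400 characters (zh). Four elements: background and aim → methods → **quantitative** results → conclusion. **No [n] citations and no figure/table numbers.**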
|
||||
|
||||
`<TODO abstract text>`
|
||||
|
||||
# 关键词 / Keywords
|
||||
`<TODO keyword 1>; <TODO keyword 2>; <TODO keyword 3>` (3-6 keywords that complement the title)
|
||||
|
||||
---
|
||||
|
||||
## 01_introduction
|
||||
|
||||
# 引言 / Introduction
|
||||
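> Funnel structure in three parts: ① field background and importance (why this class of materials or properties is worth studying); ② prior work and the **gap** (how far others have got and what is missing, with citations); ③ the aim and contributions of this paper (stated in the final paragraph, echoing spec §3/§4). One claim per paragraph; do not turn it into a running list of references.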
|
||||
|
||||
`<TODO paragraph 1: field background>` [CITE-xx]
|
||||
|
||||
`<TODO paragraph 2: prior work and its shortcomings>` [CITE-xx]
|
||||
|
||||
`<TODO final paragraph: to address this gap, this paper does ...; the contributions are ...>`
|
||||
|
||||
---
|
||||
|
||||
## 02_methods
|
||||
|
||||
# 材料与方法 / Materials and Methods
|
||||
> **Reproducibility is non-negotiable.** Never write "following conventional methods".
|
||||
|
||||
## 原材料 / Materials
|
||||
> Source / purity / chemical composition (from XRF or the supplier's data sheet) / particle size, etc.
|
||||
|
||||
`<TODO>`
|
||||
|
||||
## 配合比与制备 / Mix design and preparation
|
||||
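> Mix proportions (actual values + units), casting, curing (temperature / humidity / age), and the heating and cooling schedule and atmosphere (if firing is involved).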
|
||||
|
||||
`<TODO>`
|
||||
|
||||
## 表征方法 / Characterization
|
||||
> For each test: instrument model, key parameters, and the standard followed (GB/T, ASTM, EN, ISO).
|
||||
|
||||
`<TODO XRD / SEM / TG-DSC / compressive strength / ...>`
|
||||
|
||||
---
|
||||
|
||||
## 03_results
|
||||
|
||||
# 结果 / Results
|
||||
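> **Report, do not interpret.** Every figure and table is cited in the text with its trend described; data carry units and uncertainties. Leave mechanistic explanations to the Discussion.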
|
||||
|
||||
## <result topic 1, e.g. mechanical properties>
|
||||
`<TODO describe the trend in Fig.1>`
|
||||
|
||||

|
||||
|
||||
## <result topic 2, e.g. phase composition and microstructure>
|
||||
`<TODO describe Fig.2 / Table 1>`
|
||||
|
||||
---
|
||||
|
||||
## 04_discussion
|
||||
|
||||
# 讨论 / Discussion
|
||||
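> Interpret mechanisms / compare with the literature (with citations) / state limitations. Answer the question raised in the Introduction. **Do not repeat the numbers from Results.**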
|
||||
|
||||
`<TODO mechanistic explanation>` [CITE-xx]
|
||||
|
||||
`<TODO comparison with prior studies>` [CITE-xx]
|
||||
|
||||
`<TODO limitations and scope of applicability>`
|
||||
|
||||
---
|
||||
|
||||
## 05_conclusion
|
||||
|
||||
# 结论 / Conclusion
|
||||
> 3-5 numbered points, quantitative, no new data; an outlook is optional.
|
||||
|
||||
1. `<TODO conclusion 1 + key data>`
|
||||
2. `<TODO conclusion 2>`
|
||||
3. `<TODO conclusion 3>`
|
||||
|
||||
---
|
||||
|
||||
## 06_references
|
||||
|
||||
# 参考文献 / References
|
||||
> After verification per citation_verify.md, number entries in order of first appearance in the text. For the format, see cite_gbt7714.md (zh) or cite_elsevier.md (en).
|
||||
|
||||
```
|
||||
[1] <TODO real entry, after verification>
|
||||
[2] <TODO>
|
||||
```
|
||||
|
||||
> Optional extras: Graphical Abstract / Highlights / Acknowledgments / CRediT / Declaration of competing interest / Data availability; add or drop them as the target journal requires.
|
||||
|
|
@ -0,0 +1,77 @@
|
|||
# Section skeleton for a review paper (type=review)
|
||||
|
||||
> Organized thematically, not as IMRaD. Copy each section into `<task_dir>/sections/NN_xxx.md`.
|
||||
> Thematic chapters are ordered by two-digit prefixes `02_` to `NN_`; outlook and conclusion use `98_`/`99_` so that they sort after the thematic chapters.
|
||||
> `>` blockquote blocks are writing hints and do not go into the manuscript. The body language follows the spec.
|
||||
|
||||
---
|
||||
|
||||
## 00_title_abstract
|
||||
|
||||
# 题名 / Title
|
||||
> State the scope and perspective of the review (do not stop at "research progress in ..."; add the distinctive angle).
|
||||
|
||||
`<TODO title>`
|
||||
|
||||
**Authors & Affiliations**: `<TODO>`
|
||||
|
||||
# 摘要 / Abstract
|
||||
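> 180-350 words (en) / 250-450 characters (zh). Make clear: what is reviewed, why a review is needed now, how this paper is organized, and its main conclusions. No [n] citations.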
|
||||
|
||||
`<TODO>`
|
||||
|
||||
# 关键词 / Keywords
|
||||
`<TODO>` (3-6 keywords)
|
||||
|
||||
---
|
||||
|
||||
## 01_introduction
|
||||
|
||||
# 引言 / Introduction
|
||||
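> ① Why the field matters and why this review is needed; ② shortcomings of existing reviews / the distinctive perspective of this one; ③ scope of the review + structure of the paper (the dimension along which themes are divided).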
|
||||
|
||||
`<TODO>` [CITE-xx]
|
||||
|
||||
---
|
||||
|
||||
## 02_<theme> (thematic chapter; duplicate as needed: 02_/03_/04_…)
|
||||
|
||||
# <theme name, e.g. 水化机理 / Hydration mechanisms>
|
||||
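> One thematic dimension per chapter (divide by material system / property / methodological route). A review is not a list of papers; it needs **cross-study synthesis and critique**: who did what, whether the conclusions agree, where they conflict, and this paper's own judgment.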
|
||||
|
||||
`<TODO synthesized discussion, with citations>` [CITE-xx]
|
||||
|
||||
> Use a table when comparing several studies:
|
||||
|
||||
| System / study | Method | Key conclusion | Limitation |
|
||||
|---|---|---|---|
|
||||
| `<TODO>` [CITE-xx] | `<TODO>` | `<TODO>` | `<TODO>` |
|
||||
|
||||
---
|
||||
|
||||
## 98_outlook
|
||||
|
||||
# 挑战与展望 / Challenges and Outlook
|
||||
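> Open problems, methodological bottlenecks, barriers to industrial adoption, future trends. Be specific about directions; avoid empty slogans.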
|
||||
|
||||
`<TODO>`
|
||||
|
||||
---
|
||||
|
||||
## 99_conclusion
|
||||
|
||||
# 结论 / Conclusion
|
||||
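> Distill the main judgments of the whole paper (not a chapter-by-chapter recap) into the core conclusions the reader should take away.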
|
||||
|
||||
`<TODO>`
|
||||
|
||||
---
|
||||
|
||||
## <NN>_references
|
||||
|
||||
# 参考文献 / References
|
||||
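> Reviews typically cite 80-200 references. **Verify every one per citation_verify.md**: with this many citations the risk of fabricated or misattributed references is higher, so check them one by one. For the format, see cite_*.md.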
|
||||
|
||||
```
|
||||
[1] <TODO real entry, after verification>
|
||||
```
|
||||
|
|
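The `98_`/`99_` prefixes in the review skeleton matter because the scripts in this commit pick up sections with `sorted(sections_dir.glob("*.md"))`, so rendering order follows the file names. A small sketch of that ordering, assuming plain string comparison on the file stems; `render_order` and the chapter names are illustrative, not taken from the skill.

```python
from __future__ import annotations


def render_order(stems: list[str]) -> list[str]:
    """Return section stems in plain lexicographic order."""
    return sorted(stems)


# Hypothetical review paper with four thematic chapters.
review_stems = [
    "99_conclusion",
    "02_hydration",
    "00_title_abstract",
    "98_outlook",
    "10_durability",
    "01_introduction",
    "03_microstructure",
]
ordered = render_order(review_stems)

# Without zero padding, "10_" sorts before "2_" because "1" < "2".
unpadded = render_order(["2_hydration", "10_durability"])
```

With two-digit prefixes the thematic chapters stay in numeric order and the outlook and conclusion land last; an unpadded `2_` prefix would be rendered after `10_`.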
@ -0,0 +1,67 @@
|
|||
# Paper spec (the "constitution" of the manuscript)
|
||||
|
||||
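> Output of stage 1. **Frozen once written**; read it before every chapter in stages 2 and 4. `<TODO>` is a placeholder that the user must fill in explicitly; do not make values up.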
|
||||
> The "paper type + language" set in this spec determines which references are loaded later (paper_types / cite_* / redlines_*).
|
||||
|
||||
## 1. Paper type and language
|
||||
|
||||
- Type: `<TODO original / review / letter>` (determines the section skeleton; see references/paper_types.md)
|
||||
- Language: `<TODO zh / en>` (zh → cite_gbt7714 + redlines_zh; en → cite_elsevier + redlines_en)
|
||||
|
||||
## 2. Target journal
|
||||
|
||||
- Journal: `<TODO journal name, e.g. Cement and Concrete Research / 硅酸盐学报>`
|
||||
- Format requirements: `<TODO length limit / whether Highlights are mandatory / structured abstract / special citation-format rules; check the journal's Guide for Authors>`
|
||||
- If no journal has been chosen: use the default citation format for the language and the generic skeleton from paper_types
|
||||
|
||||
## 3. One-sentence contribution
|
||||
|
||||
> "Article 1" of the paper's constitution. If reviewers remember only one thing, it should be this sentence.
|
||||
|
||||
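`<TODO one sentence stating what this paper does, what it finds, and why it matters, e.g. "By tuning Z of system Y via X, we quantify for the first time the relationship between A and B, improving property C by D">`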
|
||||
|
||||
## 4. Novelty / contribution list
|
||||
|
||||
(2-4 items, one sentence each; the final paragraph of the Introduction and the Conclusion must echo them)
|
||||
|
||||
- Contribution 1: `<TODO>`
|
||||
- Contribution 2: `<TODO>`
|
||||
- Contribution 3: `<TODO>`
|
||||
|
||||
## 5. Material system and experimental design (required for original / letter)
|
||||
|
||||
- Study object / material system: `<TODO e.g. alkali-activated slag-fly ash binder>`
|
||||
- Key independent variables (controlled factors): `<TODO e.g. Na₂O equivalent 8/10/12 wt%; curing temperature 20/40/60 °C>`
|
||||
- Key dependent variables (property / characterization metrics): `<TODO e.g. 28 d compressive strength; XRD phases; SEM microstructure; TG bound water>`
|
||||
- Source of core data: `<TODO the user's experimental data / cited preliminary experiments; never fabricated>`
|
||||
|
||||
## 6. Figure and table list (used in stage 3, "fix the figures first")
|
||||
|
||||
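> The evidential backbone of the paper. Fix these figures and tables before writing the body so that text and figures stay consistent.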
|
||||
|
||||
| No. | Type | Content / conclusion to convey | Data source | Plotting tool |
|
||||
|---|---|---|---|---|
|
||||
| Fig.1 | `<TODO>` | `<TODO what this figure demonstrates>` | `<TODO>` | plot_pub / mermaid / photograph |
|
||||
| Fig.2 | `<TODO>` | `<TODO>` | `<TODO>` | `<TODO>` |
|
||||
| Table 1 | `<TODO>` | `<TODO>` | `<TODO>` | — |
|
||||
|
||||
## 7. Length budget
|
||||
|
||||
- Budgets per paper type are in references/paper_types.md (en: words / zh: characters)
|
||||
- If the target journal sets a hard limit, the journal's limit prevails: `<TODO character/word limit>`
|
||||
|
||||
## 8. TODO / to be provided by the user
|
||||
|
||||
> Items from stage 1 that are waiting on the user. Skim them at the start of every chapter in stage 4; in stage 6, quality_check scans for leftover `<TODO>`.
|
||||
|
||||
- [ ] `<TODO e.g. provide the measured 28 d strength data table>`
|
||||
- [ ] `<TODO e.g. confirm the target journal and citation format>`
|
||||
- [ ] `<TODO e.g. provide the grant number / conflict-of-interest statement>`
|
||||
|
||||
## 9. Citation list / sources (fill in after verification)
|
||||
|
||||
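> Fabricating references is academic misconduct. A citation enters this list only after the three-layer verification in citation_verify.md; while drafting, use `[CITE-xx]` placeholders in the body.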
|
||||
|
||||
```
|
||||
[CITE-xx] -> <TODO real entry, after verification>
|
||||
```
|
||||
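The spec and the section skeletons leave `<TODO …>` and `[CITE-xx]` placeholders for later stages to resolve. The sketch below shows one way such leftovers could be listed per line. It is a hypothetical helper: `find_placeholders` and both patterns are assumptions for illustration and are not the skill's `quality_check` script, which lies outside this excerpt.

```python
import re

# Assumed placeholder shapes, taken from the templates in this commit:
# "<TODO ...>" (anything up to the closing ">") and "[CITE-...]".
_TODO_RE = re.compile(r"<TODO[^>]*>")
_CITE_RE = re.compile(r"\[CITE-[^\]]+\]")


def find_placeholders(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, kind, matched_text) for every leftover placeholder."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in _TODO_RE.finditer(line):
            hits.append((lineno, "TODO", m.group(0)))
        for m in _CITE_RE.finditer(line):
            hits.append((lineno, "CITE", m.group(0)))
    return hits


draft = "Strength rose by 12% [3].\n`<TODO mechanism>` [CITE-07]\n"
leftovers = find_placeholders(draft)
```

A resolved numeric citation such as `[3]` is not reported; only the unresolved `<TODO …>` and `[CITE-…]` markers are.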