paper_server/apps/resm
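caoqianming e695e04de7 fix(resm): detect Elsevier abstract-preview PDFs to avoid mislabeling them as full text
For unauthorized or in-press articles, the Elsevier Article API's
application/pdf endpoint returns a one-page preview PDF containing only the
abstract (the magic number is still %PDF and the file is not small), while the
full-text XML can be fetched normally. The old logic validated only the magic
number and file size, so it stored the preview page and set
has_fulltext_pdf=True.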

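- tasks.py: add _pdf_page_count / _is_elsevier_preview_pdf (prefer pypdf, fall
  back to a byte scan); _elsevier_fetch_pdf and save_pdf_from_elsevier now reject
  one-page preview PDFs before saving and set
  fail_reason=elsevier_pdf_preview_only; the re-fetch queue qs_pdf excludes that
  marker to avoid infinite retries
- Add management command fix_preview_pdf: scan existing mislabeled records and
  reset has_fulltext_pdf; for records without XML full text, also reset
  has_fulltext so they re-enter the download chain
- requirements.txt: add pypdf>=4.0.0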
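
The page-count check described in the first bullet can be sketched roughly as
follows. This is not the code from tasks.py: the names scan_page_count,
pdf_page_count and is_preview_pdf are hypothetical, and the "exactly one page
means preview" rule is an assumption taken from the description above. It only
illustrates the "prefer pypdf, fall back to a byte scan" idea.

```python
import io
import re
from typing import Optional

# Matches a page object ("/Type /Page") but not the page-tree node ("/Type /Pages").
_PAGE_OBJ_RE = re.compile(rb"/Type\s*/Page(?!\w)")


def scan_page_count(data: bytes) -> Optional[int]:
    """Fallback: count page objects by scanning the raw bytes.

    This can undercount when page objects live inside compressed object
    streams, so a result of 0 is reported as None (unknown).
    """
    count = len(_PAGE_OBJ_RE.findall(data))
    return count or None


def pdf_page_count(data: bytes) -> Optional[int]:
    """Return the number of pages, or None if it cannot be determined."""
    if not data.startswith(b"%PDF"):
        return None  # not a PDF at all
    try:
        from pypdf import PdfReader  # optional dependency

        return len(PdfReader(io.BytesIO(data)).pages)
    except Exception:
        # pypdf is missing or could not parse the file: use the byte scan.
        return scan_page_count(data)


def is_preview_pdf(data: bytes) -> bool:
    """Assumed rule: a PDF with exactly one page is an abstract-only preview."""
    return pdf_page_count(data) == 1
```

A caller would run is_preview_pdf on the downloaded bytes before saving and
record a failure reason instead of marking the paper as having a full-text PDF.
A genuine one-page article would also be rejected by this rule, which is the
trade-off of using page count alone.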

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-29 08:54:07 +08:00
..
management fix(resm): detect Elsevier abstract-preview PDFs to avoid mislabeling them as full text 2026-06-29 08:54:07 +08:00
migrations feat(resm): merge elsevier fetch tasks + keep the fetch chain resident and alive + openalex rate-limit backoff 2026-06-23 10:06:49 +08:00
__init__.py feat: add resm app 2026-01-23 10:37:41 +08:00
admin.py feat(resm): journal/keyword monitoring PaperMonitor + remove the daily incremental periodic task 2026-06-21 23:43:58 +08:00
apps.py feat: add resm app 2026-01-23 10:37:41 +08:00
cloudflare_checkbox2.png feat: add pyautogui calls 2026-02-09 15:17:02 +08:00
d_oaurl.py feat: pass cloudflare verification 2026-03-23 16:30:18 +08:00
d_scihub.py feat: pass cloudflare verification 2026-03-23 16:30:18 +08:00
filters.py feat(resm): add exact and range filters on publication_date to paper queries 2026-06-22 11:12:00 +08:00
models.py feat(resm): journal/keyword monitoring PaperMonitor + remove the daily incremental periodic task 2026-06-21 23:43:58 +08:00
serializers.py feat: add pdf_url / xml_url direct-link fields to paper list + pg_trgm GIN index 2026-05-21 13:48:52 +08:00
services.py feat: add download_pdf 2026-01-28 15:01:49 +08:00
tasks.py fix(resm): detect Elsevier abstract-preview PDFs to avoid mislabeling them as full text 2026-06-29 08:54:07 +08:00
tests.py feat: add resm app 2026-01-23 10:37:41 +08:00
urls.py feat: modify pdf cloudflare verification 2026-03-24 10:34:06 +08:00
views.py feat: paper list returns abstract + add retrieve endpoint + extend filterset with year range / multiple fields 2026-05-21 13:17:46 +08:00