fix(resm): silence pypdf recovery logs when parsing corrupt PDFs

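When _pdf_page_count reads a corrupt PDF, pypdf emits a flood of recovery
log messages (incorrect header, Cannot find /Root, and so on) that pollute
the output of batch jobs such as fix_preview_pdf. Raise the pypdf logger
level to CRITICAL to silence them; a parse failure is still treated as
None (the entry is skipped).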

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
caoqianming 2026-06-29 09:14:23 +08:00
parent e695e04de7
commit 97b23a2b06
1 changed file with 4 additions and 0 deletions

@@ -607,6 +607,10 @@ def _pdf_page_count(content: bytes):
     (works for uncompressed object trees; Elsevier's abstract preview pages are exactly this kind)"""
     try:
         from io import BytesIO
+        import logging
+        # A corrupt PDF makes pypdf emit a flood of recovery logs (incorrect header / Cannot find /Root, etc.);
+        # only the page count matters here, so silence its logger to keep the output clean.
+        logging.getLogger("pypdf").setLevel(logging.CRITICAL)
         from pypdf import PdfReader
         return len(PdfReader(BytesIO(content), strict=False).pages)
     except ImportError:
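
Note on scope: the setLevel call in this change stays in effect for the rest of the process once _pdf_page_count has run, since loggers are global. If that ever becomes a problem, one possible variation (not what this commit does) is to raise the level only around the parse and restore it afterwards. The sketch below assumes that approach; quiet_logger and pdf_page_count_quiet are hypothetical names used only for illustration, and the broad except is an assumption standing in for whatever error handling the real function has.

```python
import logging
from contextlib import contextmanager
from io import BytesIO


@contextmanager
def quiet_logger(name, level=logging.CRITICAL):
    """Temporarily raise a logger's level and restore the old one on exit."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(level)
    try:
        yield logger
    finally:
        logger.setLevel(previous)


def pdf_page_count_quiet(content: bytes):
    """Hypothetical variant of _pdf_page_count: page count, or None on failure."""
    try:
        from pypdf import PdfReader
    except ImportError:
        return None
    # Child loggers such as "pypdf._reader" inherit the level set on "pypdf",
    # so raising the parent is enough to mute the recovery messages.
    with quiet_logger("pypdf"):
        try:
            return len(PdfReader(BytesIO(content), strict=False).pages)
        except Exception:
            # Assumption: any parse failure means "skip this entry".
            return None
```

The trade-off is a little more code in exchange for leaving pypdf logging untouched for other callers in the same process.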