fix(resm): silence pypdf recovery logs when parsing corrupt PDFs
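When _pdf_page_count reads a corrupt PDF, pypdf emits a flood of recovery logs (incorrect header, Cannot find /Root, etc.), polluting the output of batch jobs such as fix_preview_pdf. Set the pypdf logger level to CRITICAL to silence them; a parse failure is still treated as None (the entry is skipped).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>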
parent e695e04de7
commit 97b23a2b06
@@ -607,6 +607,10 @@ def _pdf_page_count(content: bytes):
     (works for uncompressed object trees, which is exactly what Elsevier's abstract preview pages are)."""
     try:
         from io import BytesIO
+        import logging
+        # Corrupt PDFs make pypdf emit a flood of recovery logs (incorrect header / Cannot find /Root, etc.).
+        # Only the page count matters here, so silence its logger to keep the output clean.
+        logging.getLogger("pypdf").setLevel(logging.CRITICAL)
         from pypdf import PdfReader
         return len(PdfReader(BytesIO(content), strict=False).pages)
     except ImportError:
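Review note: the change sets the level of the "pypdf" logger for the rest of the process, which is likely fine for a batch script. If other callers ever need pypdf warnings, a scoped variant might be worth considering. The following is only a minimal sketch of that idea, not part of this commit; `quiet_logger` is a hypothetical helper name, and it relies solely on the standard `logging` and `contextlib` modules.

```python
import logging
from contextlib import contextmanager


@contextmanager
def quiet_logger(name, level=logging.CRITICAL):
    """Temporarily raise a logger's level and restore the old one on exit.

    Hypothetical helper (not in the commit): `name` is the logger to
    silence, e.g. "pypdf"; `level` is the threshold applied inside the block.
    """
    logger = logging.getLogger(name)
    previous = logger.level  # logging.NOTSET (0) if never configured
    logger.setLevel(level)
    try:
        yield logger
    finally:
        # Runs on normal exit and when the body raises, so the
        # caller's logging configuration is always put back.
        logger.setLevel(previous)
```

Under that assumption, the page count call in `_pdf_page_count` could be wrapped as `with quiet_logger("pypdf"): ...`, leaving the logger at its previous level once the function returns. Messages at CRITICAL itself would still pass through, matching the behavior of the committed `setLevel(logging.CRITICAL)` call.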