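perf(context): compact old assistant tool_call arguments + keep_recent 20→12

When an old assistant `tool_calls[].function.arguments` exceeds ~800 chars, compact it
into a valid JSON marker (keeping path/script_path/name/original_chars) so that
`write(content=...)` source-code arguments stop re-entering the prompt on every call;
keep_recent 20→12 narrows the verbatim window. Protocol fields such as
role/tool_call_id/name are unchanged, so the tool_call protocol stays intact.
stats gains compacted_tool_call_arguments.
DESIGN §8.2 exit criteria gain one item: the task list's `N 条 / N tok` is a historical
cumulative figure and does not drop with pre-send compaction.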

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
caoqianming 2026-06-05 08:25:56 +08:00
parent 1c30a9e54e
commit 5f8b157733
3 changed files with 121 additions and 3 deletions


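@@ -587,7 +587,7 @@ zcbot-sandbox image 已 ~1.5G(python deps + chromium + nodejs + mermaid-cli),后
 **Stage 3: Context budget and automatic compaction (medium risk, needs tests)**:
 1. Add `core/context.py` to build the LLM messages: input is `Session.messages` + budget, output is the trimmed messages.
-2. The first step only does **old tool message compaction**: keep system and the most recent few messages verbatim, and replace older, overlong `role=tool` content with a head-and-tail summary; old `load_skill` results are compacted into a "已加载 skill: name/dir" (skill loaded) marker; `role` / `tool_call_id` / `name` stay unchanged, which keeps the OpenAI/LiteLLM tool_call protocol intact. This stage makes no extra LLM calls and produces no global summary.
+2. The first step only does **old tool / tool_call argument compaction**: keep system and roughly the 12 most recent messages verbatim, and replace older, overlong `role=tool` content with a head-and-tail summary; old `load_skill` results are compacted into a "已加载 skill: name/dir" (skill loaded) marker; old assistant `tool_calls[].function.arguments` longer than about 800 chars are compacted into a valid JSON marker (keeping path/script_path/name/original_chars), so that `write(content=...)` source-code arguments stop re-entering the prompt on every call. `role` / `tool_call_id` / `name` stay unchanged, which keeps the OpenAI/LiteLLM tool_call protocol intact. This stage makes no extra LLM calls and produces no global summary.
 3. The second step adds a task summary: keep system, the most recent 6-10 turns verbatim, messages that belong to unclosed tool_call exchanges, and the latest user message; older messages are compacted into a single summary.
 4. The summary must distinguish: hard constraints confirmed by the user, the current plan, paths of generated files, key facts, todos/risks, and discardable logs. Old tool output is not put back verbatim; only paths and summaries are kept.
 5. Thresholds: trigger on a rough character estimate first (e.g. 200k chars) and move to exact tokenizer-based budgets later; once triggered, compact down to 25%-40% of reliable_context so the context does not fill up again right after compaction.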
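Step 5 above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from this commit: `should_compact` and `target_range` are hypothetical helper names, `reliable_context` is assumed to be an integer in whatever unit the budget uses, and the 200k-char trigger and 25%-40% band are simply the values suggested in the text.

```python
def should_compact(total_chars: int, trigger_chars: int = 200_000) -> bool:
    """Rough character-based trigger (hypothetical helper, not in the commit)."""
    return total_chars >= trigger_chars


def target_range(reliable_context: int) -> tuple[int, int]:
    """Return the (low, high) post-compaction budget: 25%-40% of reliable_context.

    The result is in the same unit as the input (an assumption of this sketch).
    """
    return reliable_context * 25 // 100, reliable_context * 40 // 100


if __name__ == "__main__":
    print(should_compact(250_000))   # True
    print(target_range(400_000))     # (100000, 160000)
```

Aiming for a band well below the trigger leaves headroom, so a freshly compacted context does not hit the threshold again after a turn or two.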
@@ -596,6 +596,7 @@ zcbot-sandbox image 已 ~1.5G(python deps + chromium + nodejs + mermaid-cli),后
 - For high-usage tasks, per-call `tokens_in` drops from the 500k range to within 50k-100k; regular tasks stay below 30k.
 - `usage_events.units` distinguishes input/output/cache hit/cache miss, and `cost_cny` is no longer all 0.
 - The `llm_start` SSE event exposes `context_original_chars` / `context_sent_chars` / `context_saved_chars` / `context_compacted_tool_messages` / `context_compacted_skill_messages`, and the hint at the bottom of the dev SPA shows the same values, so you can tell whether compaction actually took effect.
+- The `N 条 / N tok` (N messages / N tokens) shown in the task list is the cumulative message count persisted in the DB and the total tokens over all past calls, used for billing / auditing; it does not drop because of pre-send context compaction. For the actual size sent in the current turn, look at `llm_start context_*` and `llm_end prompt_tokens/cache_*`.
 - The three long-running task types (literature collection / paper writing / PPT) can still trace back to the original file paths and do not lose user-confirmed specs through summarization.
 - Add focused tests covering usage detail extraction, cost fallback, tool result trimming, and protocol integrity under context compaction.
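To judge whether compaction "actually took effect" from the `llm_start` fields listed above, one could compute a savings ratio. The sketch below assumes the event payload is a flat dict carrying the `context_*` keys named in the text; the payload shape and the `context_savings` helper are assumptions for illustration, not part of this commit.

```python
def context_savings(event: dict[str, int]) -> float:
    """Fraction of the original context removed before sending (0.0 to 1.0).

    `event` is assumed to be a flat dict with the context_* keys from llm_start.
    """
    original = event.get("context_original_chars", 0)
    saved = event.get("context_saved_chars", 0)
    if original <= 0:
        return 0.0
    return saved / original


if __name__ == "__main__":
    sample = {
        "context_original_chars": 1000,
        "context_sent_chars": 250,
        "context_saved_chars": 750,
    }
    print(context_savings(sample))  # 0.75
```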


@@ -45,11 +45,63 @@ def _message_chars(msg: dict[str, Any]) -> int:
     return len(str(msg))


+def _compact_tool_call_arguments(raw: Any, max_chars: int) -> tuple[Any, bool]:
+    if not isinstance(raw, str) or len(raw) <= max_chars:
+        return raw, False
+    marker: dict[str, Any] = {
+        "_compacted": True,
+        "original_chars": len(raw),
+        "note": "old assistant tool_call arguments omitted from context",
+    }
+    try:
+        parsed = json.loads(raw)
+    except Exception:
+        parsed = None
+    if isinstance(parsed, dict):
+        for key in ("path", "script_path", "file_path", "name"):
+            value = parsed.get(key)
+            if isinstance(value, str) and value:
+                marker[key] = value
+        content = parsed.get("content")
+        if isinstance(content, str):
+            marker["content_chars"] = len(content)
+    return json.dumps(marker, ensure_ascii=False), True
+
+
+def _compact_assistant_tool_calls(
+    msg: dict[str, Any],
+    *,
+    max_arg_chars: int,
+) -> tuple[int, int]:
+    tool_calls = msg.get("tool_calls")
+    if not isinstance(tool_calls, list):
+        return 0, 0
+    compacted = 0
+    saved = 0
+    for tc in tool_calls:
+        if not isinstance(tc, dict):
+            continue
+        fn = tc.get("function")
+        if not isinstance(fn, dict):
+            continue
+        before = fn.get("arguments")
+        after, did_compact = _compact_tool_call_arguments(
+            before,
+            max_chars=max(0, max_arg_chars),
+        )
+        if did_compact:
+            fn["arguments"] = after
+            compacted += 1
+            saved += len(before) - len(after)
+    return compacted, max(0, saved)
+
+
 def prepare_messages_for_llm(
     messages: List[dict[str, Any]],
     *,
-    keep_recent: int = 20,
+    keep_recent: int = 12,
     old_tool_chars: int = 2_000,
+    old_tool_arg_chars: int = 800,
 ) -> List[dict[str, Any]]:
     """Return a copy of the messages to send to the LLM.
@@ -61,6 +113,7 @@ def prepare_messages_for_llm(
         messages,
         keep_recent=keep_recent,
         old_tool_chars=old_tool_chars,
+        old_tool_arg_chars=old_tool_arg_chars,
     )
     return prepared
@@ -68,8 +121,9 @@ def prepare_messages_for_llm(
 def prepare_messages_with_stats(
     messages: List[dict[str, Any]],
     *,
-    keep_recent: int = 20,
+    keep_recent: int = 12,
     old_tool_chars: int = 2_000,
+    old_tool_arg_chars: int = 800,
 ) -> tuple[List[dict[str, Any]], dict[str, int]]:
     """Return a copy of the messages to send to the LLM, plus compaction stats."""
     if keep_recent < 0:
@@ -79,9 +133,16 @@ def prepare_messages_with_stats(
     prepared: List[dict[str, Any]] = []
     compacted_tool_messages = 0
     compacted_skill_messages = 0
+    compacted_tool_call_arguments = 0
     for idx, msg in enumerate(messages):
         new_msg = deepcopy(msg)
         is_recent = idx >= recent_start
+        if not is_recent and new_msg.get("role") == "assistant":
+            n_args, _ = _compact_assistant_tool_calls(
+                new_msg,
+                max_arg_chars=old_tool_arg_chars,
+            )
+            compacted_tool_call_arguments += n_args
         if (
             not is_recent
             and new_msg.get("role") == "tool"
@@ -105,5 +166,6 @@ def prepare_messages_with_stats(
         "saved_chars": max(0, original_chars - sent_chars),
         "compacted_tool_messages": compacted_tool_messages,
         "compacted_skill_messages": compacted_skill_messages,
+        "compacted_tool_call_arguments": compacted_tool_call_arguments,
     }
     return prepared, stats


@@ -1,4 +1,5 @@
 import unittest
+import json

 from core.context import prepare_messages_for_llm, prepare_messages_with_stats
@@ -95,6 +96,60 @@ class ContextCompactionTests(unittest.TestCase):
         self.assertGreater(stats["saved_chars"], 0)
         self.assertEqual(len(prepared), len(messages))

+    def test_defaults_compact_medium_sized_old_write_arguments(self) -> None:
+        args = json.dumps({"path": "slides/p01.py", "content": "A" * 1000})
+        messages = [
+            {"role": "system", "content": "rules"},
+            {
+                "role": "assistant",
+                "tool_calls": [{
+                    "id": "tc1",
+                    "type": "function",
+                    "function": {"name": "write", "arguments": args},
+                }],
+            },
+        ] + [{"role": "user", "content": f"recent {i}"} for i in range(12)]
+
+        prepared, stats = prepare_messages_with_stats(messages)
+
+        compacted_args = json.loads(prepared[1]["tool_calls"][0]["function"]["arguments"])
+        self.assertTrue(compacted_args["_compacted"])
+        self.assertEqual(compacted_args["path"], "slides/p01.py")
+        self.assertEqual(stats["compacted_tool_call_arguments"], 1)
+
+    def test_compacts_old_assistant_tool_call_arguments(self) -> None:
+        args = json.dumps({"path": "slides/p01.py", "content": "A" * 5000})
+        messages = [
+            {"role": "system", "content": "rules"},
+            {
+                "role": "assistant",
+                "content": "writing slide",
+                "tool_calls": [{
+                    "id": "tc1",
+                    "type": "function",
+                    "function": {"name": "write", "arguments": args},
+                }],
+            },
+            {"role": "tool", "tool_call_id": "tc1", "name": "write", "content": "[wrote file]"},
+            {"role": "user", "content": "next"},
+        ]
+
+        prepared, stats = prepare_messages_with_stats(
+            messages,
+            keep_recent=1,
+            old_tool_arg_chars=200,
+        )
+
+        tc = prepared[1]["tool_calls"][0]
+        compacted_args = json.loads(tc["function"]["arguments"])
+        self.assertEqual(tc["id"], "tc1")
+        self.assertEqual(tc["type"], "function")
+        self.assertEqual(tc["function"]["name"], "write")
+        self.assertTrue(compacted_args["_compacted"])
+        self.assertEqual(compacted_args["path"], "slides/p01.py")
+        self.assertNotIn("A" * 100, tc["function"]["arguments"])
+        self.assertEqual(stats["compacted_tool_call_arguments"], 1)
+

 if __name__ == "__main__":
     unittest.main()
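The tests above check that `id`, `type`, and `function.name` survive compaction. A complementary check, sketched here as an assumption rather than something this commit adds, is that every `role=tool` message still points at a `tool_call_id` declared by an earlier assistant message. `unmatched_tool_call_ids` is a hypothetical helper name.

```python
from typing import Any


def unmatched_tool_call_ids(messages: list[dict[str, Any]]) -> set[str]:
    """Return tool_call_ids on role=tool messages with no earlier matching tool_call."""
    declared: set[str] = set()
    unmatched: set[str] = set()
    for msg in messages:
        if msg.get("role") == "assistant":
            # Collect every id the assistant declared in this message.
            for tc in msg.get("tool_calls") or []:
                if isinstance(tc, dict) and isinstance(tc.get("id"), str):
                    declared.add(tc["id"])
        elif msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            if call_id not in declared:
                unmatched.add(str(call_id))
    return unmatched


if __name__ == "__main__":
    ok = [
        {"role": "assistant", "tool_calls": [{"id": "tc1", "type": "function",
                                              "function": {"name": "write", "arguments": "{}"}}]},
        {"role": "tool", "tool_call_id": "tc1", "name": "write", "content": "[wrote file]"},
    ]
    print(unmatched_tool_call_ids(ok))  # set()
```

Running such a check on the output of `prepare_messages_with_stats` would flag any compaction step that accidentally drops or rewrites an id.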