zcspider/docs/superpowers/plans/2026-06-18-analysis-stall.md

# Analysis Stall Fix Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Prevent summary analysis from appearing stuck by bounding URL-check latency, reporting progress, and preventing duplicate analysis threads.

**Architecture:** Keep matching behavior in `mycode/main.py`, add optional progress callbacks, and use `ThreadPoolExecutor` only for independent URL checks. `AnaThread` converts callbacks into existing Qt log signals, while `MainWindow` owns button state and thread lifecycle.

**Tech Stack:** Python 3.8, `concurrent.futures`, pandas, PySide6, unittest

---
### Task 1: Concurrent Deleted-Article Checks
**Files:**
- Modify: `tests/test_main.py`
- Modify: `mycode/main.py`
- [ ] **Step 1: Write failing tests**
Add tests proving duplicate URLs are fetched once, request failures retain rows,
and the progress callback reaches `(total, total)`.
- [ ] **Step 2: Verify tests fail**
Run:
```powershell
.\runtime\python.exe -m unittest tests.test_main.DeletedWechatContentFilterTest -v
```
Expected: failure because `filter_deleted_wechat_rows` does not accept
`progress_callback` or bounded concurrency options.
- [ ] **Step 3: Implement bounded concurrency**
Use `ThreadPoolExecutor(max_workers=8)` and `as_completed`. Submit one task per
unique non-empty URL, preserve original row order, retain rows for failed
requests, and invoke `progress_callback(completed, total)` after each completed
URL.
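
The behavior described in this step can be sketched as follows. This is a minimal illustration, not the real `filter_deleted_wechat_rows`: the name `filter_deleted_rows`, the list-of-dicts row shape with a `"url"` key, and the injected `is_deleted` checker are all assumptions made so the example runs without pandas or network access. The checker is assumed to apply its own request timeout.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def filter_deleted_rows(rows, is_deleted, progress_callback=None, max_workers=8):
    """Drop rows whose URL is confirmed deleted; keep everything else in order.

    `rows` (list of dicts with a "url" key) and `is_deleted(url)` are
    illustrative assumptions, not the project's actual signature.
    """
    # One task per unique non-empty URL, in first-seen order.
    unique_urls = list(dict.fromkeys(r["url"] for r in rows if r.get("url")))
    total = len(unique_urls)
    deleted = set()

    if total:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            futures = {pool.submit(is_deleted, url): url for url in unique_urls}
            for completed, future in enumerate(as_completed(futures), start=1):
                try:
                    if future.result():
                        deleted.add(futures[future])
                except Exception:
                    # A failed request proves nothing, so the row is retained.
                    pass
                if progress_callback is not None:
                    progress_callback(completed, total)

    # Filtering the original list preserves row order regardless of the
    # order in which the checks completed.
    return [r for r in rows if r.get("url") not in deleted]
```

In this sketch the progress callback runs on the thread that iterates `as_completed`, not on the pool workers, so the caller sees strictly increasing `completed` values ending at `(total, total)`.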
- [ ] **Step 4: Verify tests pass**
Rerun the unittest command from Step 2 and expect all `DeletedWechatContentFilterTest` tests to pass.
### Task 2: Rule-Scan Progress
**Files:**
- Modify: `tests/test_main.py`
- Modify: `mycode/main.py`
- [ ] **Step 1: Write a failing test**
Add a focused test using small injected data frames to prove `ana_wechat`
reports final rule progress without network access.
- [ ] **Step 2: Verify the test fails**
Run the new test directly and expect failure because `ana_wechat` has no
progress callback or injectable data frames.
- [ ] **Step 3: Implement progress callbacks**
Add optional `progress_callback`, `rules_df`, and `articles_df` parameters.
Report `(completed_rules, total_rules)` after each rule and pass a separate URL
progress callback to `filter_deleted_wechat_rows`. Preserve default production
behavior when parameters are omitted.
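
One possible shape for the injectable parameters and the two separate callbacks is sketched below. It is hypothetical throughout: `ana_sketch` stands in for `ana_wechat`, plain lists replace the `rules_df` and `articles_df` data frames, substring matching is a placeholder for the real matching logic, and `_keep_all` stands in for `filter_deleted_wechat_rows` so that nothing touches the network.

```python
def _keep_all(rows, progress_callback=None):
    """Hypothetical stand-in for the deleted-article filter; no network access."""
    if progress_callback is not None:
        progress_callback(len(rows), len(rows))
    return rows


def ana_sketch(progress_callback=None, url_progress_callback=None,
               rules=None, articles=None, filter_rows=_keep_all):
    """Scan articles against rules, reporting rule and URL progress separately."""
    if rules is None or articles is None:
        # The real function would load production data here when the
        # parameters are omitted; that loading is outside this sketch.
        raise NotImplementedError("production loading is not part of this sketch")

    total_rules = len(rules)
    hits = []
    for completed_rules, keyword in enumerate(rules, start=1):
        # Placeholder matching: a rule matches when its keyword is in the title.
        hits.extend(a for a in articles if keyword in a["title"])
        if progress_callback is not None:
            progress_callback(completed_rules, total_rules)

    # URL checking gets its own callback so the two phases stay distinguishable.
    return filter_rows(hits, progress_callback=url_progress_callback)
```

Keeping every new parameter optional means existing callers that pass no arguments are unaffected, which is what lets the focused test inject small data while production behavior stays the default.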
- [ ] **Step 4: Verify the test passes**
Run the focused test, then the full `tests.test_main` module, and expect both to pass.
### Task 3: Qt Lifecycle and User Feedback
**Files:**
- Modify: `start.py`
- [ ] **Step 1: Wire progress messages**
In `AnaThread`, emit periodic messages for rule scanning and article-link
checking, including completed and total counts.
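
A small helper like the one below is one way to turn `(completed, total)` callbacks into periodic log messages. The helper name `make_progress_reporter`, the `every` interval, and the message format are assumptions for illustration. In `AnaThread`, `emit` would presumably be the `emit` method of whichever existing log signal the class already exposes.

```python
def make_progress_reporter(emit, label, every=20):
    """Return a (completed, total) callback that emits a throttled message.

    A message is sent every `every` items and always on the final item, so
    the log shows movement without being flooded by one line per item.
    """
    def report(completed, total):
        if completed == total or completed % every == 0:
            emit(f"{label}: {completed}/{total}")

    return report


# Usage with a plain list standing in for a Qt signal's emit method.
messages = []
report = make_progress_reporter(messages.append, "Rule scan", every=2)
for done in range(1, 6):
    report(done, 5)
# messages is now ["Rule scan: 2/5", "Rule scan: 4/5", "Rule scan: 5/5"]
```

Building one reporter per phase (rule scanning, article-link checking) keeps the two progress streams labeled separately in the log.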
- [ ] **Step 2: Prevent duplicate runs**
In `MainWindow.start_ana`, return early when the current analysis thread is
running, disable `bAna` before starting, connect `finished` to a cleanup method,
and restore the button in cleanup.
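
The guard logic in this step can be modeled without Qt, as in the sketch below. It deliberately swaps in `threading.Thread` for the Qt thread and a `FakeButton` for `bAna`, and both class names are hypothetical. In the real `MainWindow`, cleanup would instead be connected to the thread's `finished` signal so that it runs on the GUI thread.

```python
import threading


class FakeButton:
    """Minimal stand-in for the `bAna` button; only tracks enabled state."""

    def __init__(self):
        self.enabled = True

    def setEnabled(self, value):
        self.enabled = bool(value)


class AnalysisController:
    """Qt-free model of the start/cleanup lifecycle described above."""

    def __init__(self, button, work):
        self.button = button
        self.work = work
        self.thread = None

    def start_ana(self):
        # Return early when the current analysis thread is still running.
        if self.thread is not None and self.thread.is_alive():
            return False
        # Disable the button before starting so a second click cannot queue up.
        self.button.setEnabled(False)
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()
        return True

    def _run(self):
        try:
            self.work()
        finally:
            # Restore the button even if the analysis raised.
            self._cleanup()

    def _cleanup(self):
        self.button.setEnabled(True)
```

The early return and the disabled button are redundant by design: the button blocks ordinary clicks, while the running check covers any other code path that calls `start_ana`.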
- [ ] **Step 3: Verify syntax and imports**
Run:
```powershell
.\runtime\python.exe -m py_compile start.py mycode\main.py tests\test_main.py
```
Expected: exit code 0.
### Task 4: Final Verification
**Files:**
- Verify: `tests/test_main.py`
- Verify: `start.py`
- Verify: `mycode/main.py`
- [ ] **Step 1: Run all tests**
```powershell
.\runtime\python.exe -m unittest discover -s tests -v
```
Expected: all tests pass.
- [ ] **Step 2: Review the diff**
Confirm the diff contains only the planned analysis behavior, tests, and
documentation, and does not touch `col.bat` or `mycode/main2.py`.