# paper_server
## What This Repo Is
`paper_server` is a Django 4.2 backend that mixes a general admin platform with a paper/resource acquisition pipeline.
Main stack:
- Django + DRF
- PostgreSQL
- Redis cache
- Celery + django-celery-beat + django-celery-results
- Channels + Daphne for WebSocket push
The repo is not just a "paper service". It contains seven app packages, spanning business areas and shared infrastructure:
- `apps.system`: users, departments, roles, permissions, files, schedules, config
- `apps.auth1`: login/auth flows based on JWT, session, SMS, WeChat, face login
- `apps.wf`: a configurable workflow/ticket engine
- `apps.ops`: ops endpoints for logs, backups, server metrics, cache, Celery, Redis
- `apps.resm`: paper metadata, abstract/fulltext fetch, PDF download pipeline
- `apps.utils`: shared base models, viewsets, permissions, middleware, pagination, helpers
- `apps.ws`: websocket consumers and routing
## Runtime Entry Points
- `manage.py` starts Django with `server.settings`
- `server/settings.py` is the central settings file and imports environment values from `config/conf.py`
- `server/urls.py` mounts all REST APIs, Swagger, Django admin, and the SPA entry (`dist/index.html`)
- `server/asgi.py` serves HTTP plus WebSocket traffic
- `server/celery.py` creates the Celery app using `config.conf.BASE_PROJECT_CODE`
## Environment And Config
This project expects local runtime config files under `config/`:
- `config/conf.py`: Django secret/config, database, cache, Celery broker, backup shell paths
- `config/conf.json`: runtime system config loaded through `server.settings.get_sysconfig()`
Important implication:
- the repo can start only when `config/conf.py` is valid for the target environment
- many ops tasks assume Linux paths from `BACKUP_PATH` and `SH_PATH`
- Redis is used by cache, Celery broker, and Channels
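The real signature and key layout of `server.settings.get_sysconfig()` are not documented here. The following is a minimal, hypothetical stand-in that shows the general idea of reading `config/conf.json` and looking up nested values. The dotted-key lookup (`"base.name"`) and the `default` parameter are assumptions for illustration only.

```python
import json
from pathlib import Path
from typing import Any, Optional


def get_sysconfig_sketch(path: Path, key: Optional[str] = None, default: Any = None) -> Any:
    """Hypothetical stand-in for server.settings.get_sysconfig().

    Reads a JSON config file and optionally resolves a dotted key such as
    "base.name". Returns `default` when any segment of the key is missing.
    """
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if key is None:
        return data
    node: Any = data
    for part in key.split("."):
        # Stop as soon as the path leaves the nested-dict structure.
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

Re-reading the file on every call keeps edits to `conf.json` visible without a restart; a cached variant would trade that for fewer disk reads.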
## URL Map
Primary REST prefixes:
- `api/auth/`
- `api/system/`
- `api/wf/`
- `api/ops/`
- `api/resm/`
- `api/utils/`
Other routes:
- `api/swagger/` and `api/redoc/`
- `django/admin/`
- `ws/my/`
- `ws/<room_name>/`
## Core Architectural Patterns
### Shared Base Models
`apps.utils.models` defines the core model layer:
- `BaseModel`: string primary key generated by Snowflake-style `idWorker`
- `SoftModel`: soft delete support
- `CommonAModel` / `CommonBModel`: standard audit fields
- `ParentModel`: tree-like parent linkage with a stored `parent_link`
Many business models inherit from these classes, so ID generation, soft deletion, and audit fields are cross-cutting behavior.
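For orientation, a generic Snowflake-style generator looks roughly like the sketch below. It uses the common layout of a millisecond timestamp, a 10-bit worker id, and a 12-bit per-millisecond sequence, and returns the id as a string to match the string primary key of `BaseModel`. The bit widths, the epoch value, and the class name `SnowflakeSketch` are assumptions; the project's actual `idWorker` may differ, so check `apps/utils/models.py` before relying on any of these details.

```python
import threading
import time


class SnowflakeSketch:
    """Generic Snowflake layout: timestamp | 10-bit worker | 12-bit sequence.

    Hypothetical illustration; not the project's idWorker.
    """

    EPOCH_MS = 1577836800000  # placeholder custom epoch (2020-01-01 UTC)
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_WORKER = (1 << WORKER_BITS) - 1
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= self.MAX_WORKER:
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return int(time.time() * 1000)

    def next_id(self) -> str:
        with self._lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.SEQUENCE_MASK
                if self.sequence == 0:
                    # Sequence exhausted in this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            value = (
                ((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self.sequence
            )
            return str(value)
```

Because the ids are time-ordered but stored as strings, ordering by the primary key is only chronological when all ids have the same number of digits.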
### Shared ViewSet Base
`apps.utils.viewsets.CustomGenericViewSet` is the main DRF base class. It adds:
- permission code registration through `perms_map`
- cache-based protection against duplicate submissions of the same request by the same user
- data-scope filtering based on RBAC and department range
- serializer switching per action
- `select_related` / `prefetch_related` hooks
- row locking behavior for mutable operations inside transactions
When adding endpoints, this class is usually the first place to check for inherited behavior.
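The duplicate-request protection can be pictured as a short-lived cache key derived from the user and the request. The sketch below uses an in-memory dict so it runs on its own; the key format, the SHA-256 body hash, the 5-second TTL, and the class name `DuplicateRequestGuard` are all hypothetical and not taken from `CustomGenericViewSet`.

```python
import hashlib
import time
from typing import Callable, Dict


class DuplicateRequestGuard:
    """Hypothetical in-memory stand-in for a per-user duplicate-request cache."""

    def __init__(self, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._seen: Dict[str, float] = {}  # cache key -> expiry timestamp

    @staticmethod
    def make_key(user_id: str, method: str, path: str, body: bytes) -> str:
        digest = hashlib.sha256(body).hexdigest()
        return f"req:{user_id}:{method.upper()}:{path}:{digest}"

    def check_and_mark(self, user_id: str, method: str, path: str,
                       body: bytes = b"") -> bool:
        """Return True if the request may proceed, False if it repeats within the TTL."""
        key = self.make_key(user_id, method, path, body)
        now = self.clock()
        expires = self._seen.get(key)
        if expires is not None and expires > now:
            return False
        self._seen[key] = now + self.ttl
        return True
```

With a shared Redis cache, the check-and-mark step would need to be atomic across workers; Django's `cache.add()` (which only sets a key that does not already exist and reports whether it did) is the usual building block for that.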
### Auth And Permissions
- default auth uses JWT plus DRF basic/session fallbacks
- global default permission is authenticated + `apps.utils.permission.RbacPermission`
- custom user model is `apps.system.models.User`
- websocket auth is handled in `apps.utils.middlewares.TokenAuthMiddleware` via `token` query param
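How `RbacPermission` consumes `perms_map` is not spelled out here. As a rough mental model only, a permission map can be read as "for this request, which permission code must the user hold". The sketch below keys the map by lowercase HTTP method, treats `"*"` as "any authenticated user", and denies undeclared methods; every one of those conventions, and the function name `has_perm`, is an assumption to verify against `apps/utils/permission.py`.

```python
from typing import Dict, Iterable


def has_perm(perms_map: Dict[str, str], method: str,
             user_perm_codes: Iterable[str], is_superuser: bool = False) -> bool:
    """Hypothetical RBAC check driven by a perms_map of method -> permission code."""
    if is_superuser:
        return True
    required = perms_map.get(method.lower())
    if required is None:
        return False  # assumption: undeclared methods are denied by default
    if required == "*":
        return True  # assumption: "*" means any authenticated user may call it
    return required in set(user_perm_codes)
```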
## App Notes
### `apps.system`
This is the platform foundation layer.
Key models:
- `User`
- `Dept`
- `Role`
- `Permission`
- `Post` / `UserPost` / `PostRole`
- `Dictionary` / `DictType`
- `File`
- `MySchedule`
This app owns the RBAC structure used by the rest of the project.
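The data-scope filtering mentioned under the shared viewset combines a role's data range with the `Dept` tree. A minimal sketch of that idea follows, using a plain child-to-parent dict. The four range names, the `DataRange` enum, and the "None means unrestricted" convention are hypothetical; the real ranges live in `apps.system` and the real implementation may use the stored `parent_link` instead of walking the tree.

```python
from enum import Enum
from typing import Dict, List, Optional, Set


class DataRange(Enum):
    ALL = "all"
    DEPT_AND_CHILDREN = "dept_and_children"
    SAME_DEPT = "same_dept"
    SELF = "self"


def visible_depts(parents: Dict[str, Optional[str]], user_dept: str,
                  data_range: DataRange) -> Optional[Set[str]]:
    """Return the dept ids a user may see.

    None means unrestricted; an empty set means "own records only".
    """
    if data_range is DataRange.ALL:
        return None
    if data_range is DataRange.SELF:
        return set()
    if data_range is DataRange.SAME_DEPT:
        return {user_dept}
    # DEPT_AND_CHILDREN: build a parent -> children index, then walk downwards.
    children: Dict[str, List[str]] = {}
    for dept, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(dept)
    result: Set[str] = set()
    stack: List[str] = [user_dept]
    while stack:
        dept = stack.pop()
        if dept in result:
            continue
        result.add(dept)
        stack.extend(children.get(dept, []))
    return result
```

A queryset would then be filtered on the returned set (for example by a `belong_dept` style foreign key, whose actual name should be checked in the models).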
### `apps.wf`
This app is a full workflow engine, not just a simple approval table.
Key models:
- `Workflow`
- `State`
- `Transition`
- `CustomField`
- `Ticket`
- `TicketFlow`
Important logic lives in `apps.wf.services.WfService`:
- initialize a workflow from its start state
- generate ticket serial numbers
- resolve next state from transition conditions
- resolve participants from person/role/dept/post/field/code/robot
- enforce handle permissions
- create transition logs
- send SMS notifications
- trigger robot tasks and on-reach hooks
When working on ticket behavior, read `apps/wf/services.py` before touching serializers or views.
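As a simplified picture of "resolve next state from transition conditions", a transition can carry an ordered list of conditions, each paired with a destination state, plus a default destination. The sketch below evaluates plain Python predicates against ticket data and returns the first match. The `TransitionSketch` class, the first-match-wins rule, and predicates-as-callables are assumptions; the real `Transition` model likely stores conditions as configurable expressions over `CustomField` values.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Condition = Callable[[Dict[str, Any]], bool]


@dataclass
class TransitionSketch:
    """Hypothetical in-memory transition: ordered conditions plus a default."""
    source_state: str
    default_destination: str
    conditions: List[Tuple[Condition, str]] = field(default_factory=list)


def resolve_next_state(transition: TransitionSketch, ticket_data: Dict[str, Any]) -> str:
    # Evaluate conditions in order; the first one that holds decides the target.
    for condition, destination in transition.conditions:
        if condition(ticket_data):
            return destination
    return transition.default_destination
```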
### `apps.ops`
This app exposes runtime/maintenance APIs:
- git reload tasks
- database/media backup
- log browsing
- CPU/memory/disk inspection
- Celery info
- Redis info
- cache get/set
- DRF request log and third-party request log listing
Some behaviors depend on shell scripts and Linux-only paths from config.
### `apps.resm`
This is the paper pipeline.
Key models:
- `Paper`: stores DOI/OpenAlex metadata, OA flags, abstract/fulltext state, fetch status, failure reason, and local file save helpers
- `PaperAbstract`: separate abstract storage
The paper fetch pipeline in `apps/resm/tasks.py` currently includes:
- metadata ingestion from OpenAlex
- abstract/fulltext XML fetch from Elsevier
- PDF fetch from OA URL
- PDF fetch from OpenAlex content API
- PDF fetch from Elsevier
- Sci-Hub fallback
- task fan-out and stuck-download release
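The PDF sources above form a fallback chain: try one source, record why it failed, move to the next. The sketch below shows that shape as a single loop over named fetcher callables. The function name, the `%PDF` magic-byte check, and running all sources in one call are assumptions; in `apps/resm/tasks.py` the steps may be separate Celery tasks with their own retry and status handling.

```python
from typing import Callable, List, Optional, Tuple

Fetcher = Callable[[str], Optional[bytes]]


def fetch_pdf_with_fallbacks(
    doi: str, sources: List[Tuple[str, Fetcher]]
) -> Tuple[Optional[bytes], List[str]]:
    """Try each (name, fetcher) in order; return the first PDF and all failure notes."""
    failures: List[str] = []
    for name, fetch in sources:
        try:
            content = fetch(doi)
        except Exception as exc:  # record network/parse errors instead of aborting
            failures.append(f"{name}: {exc}")
            continue
        # Assumption: a valid PDF payload starts with the "%PDF" magic bytes.
        if content and content.startswith(b"%PDF"):
            return content, failures
        failures.append(f"{name}: no PDF returned")
    return None, failures
```

Returning the failure notes alongside the result makes it straightforward to append them to `fail_reason` whether or not a later source succeeds.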
Download behavior is stateful:
- `fetch_status="downloading"` is used as a coarse lock
- `fail_reason` accumulates fetch failures
- files are stored under `media/papers/<year>/<month>/<day>/`
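The stateful download behavior can be modelled in memory as below: claim a paper by flipping `fetch_status` to `"downloading"`, release papers stuck in that state past a timeout, and build the dated storage path. The `PaperState` class, the `"pending"` status value, the 30-minute timeout, `fail_reason` as a list, the zero-padded month/day, and the DOI-to-filename rule are all hypothetical; check the `Paper` model for the real fields and values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


@dataclass
class PaperState:
    """Hypothetical in-memory stand-in for the relevant Paper fields."""
    doi: str
    fetch_status: str = "pending"  # assumed idle value
    fail_reason: List[str] = field(default_factory=list)
    status_changed_at: Optional[datetime] = None


def try_claim(paper: PaperState, now: datetime) -> bool:
    """Coarse lock: only one worker may move a paper into "downloading"."""
    if paper.fetch_status == "downloading":
        return False
    paper.fetch_status = "downloading"
    paper.status_changed_at = now
    return True


def release_stuck(papers: Iterable[PaperState], now: datetime,
                  timeout: timedelta = timedelta(minutes=30)) -> int:
    """Reset papers left in "downloading" longer than `timeout`; return the count."""
    released = 0
    for p in papers:
        if (p.fetch_status == "downloading" and p.status_changed_at is not None
                and now - p.status_changed_at > timeout):
            p.fetch_status = "pending"
            p.fail_reason.append("released: download stuck")
            released += 1
    return released


def pdf_path(doi: str, when: datetime) -> PurePosixPath:
    safe_name = doi.replace("/", "_")  # assumption: DOI slashes become underscores
    return PurePosixPath("media/papers") / f"{when:%Y}" / f"{when:%m}" / f"{when:%d}" / f"{safe_name}.pdf"
```

Against the database, the claim step is only safe across Celery workers if it is a single conditional update (for example a `filter(...).exclude(fetch_status="downloading").update(...)` whose returned row count tells the caller whether it won), rather than a read followed by a save.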
This app has recent local edits in the working tree, so read carefully before changing it.
### `apps.ws`
Two websocket patterns exist:
- `MyConsumer`: per-user channel (`user_<id>`) plus optional `event` group
- `RoomConsumer`: shared room chat group
The websocket layer depends on Redis-backed Channels and JWT token parsing in the query string.
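Under ASGI, a WebSocket connection's query string arrives as raw bytes in `scope["query_string"]`, which is where a middleware like `TokenAuthMiddleware` would find the `token` parameter. The sketch below shows only that extraction step and the `user_<id>` group naming; the helper names are hypothetical, and JWT validation plus user lookup are deliberately left out.

```python
from typing import Any, Dict, Optional
from urllib.parse import parse_qs


def token_from_scope(scope: Dict[str, Any]) -> Optional[str]:
    """Extract the first `token` query parameter from an ASGI websocket scope."""
    raw: bytes = scope.get("query_string", b"")
    # The query string is percent-encoded ASCII, so latin-1 decoding never fails;
    # parse_qs then performs the percent-decoding.
    params = parse_qs(raw.decode("latin-1"))
    values = params.get("token")
    return values[0] if values else None


def user_group_name(user_id: str) -> str:
    """Per-user channel group name in the style used by MyConsumer."""
    return f"user_{user_id}"
```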
## Startup Expectations
Typical local boot sequence:
1. Ensure `config/conf.py` and `config/conf.json` are present and valid.
2. Start PostgreSQL and Redis.
3. Install dependencies from `requirements.txt`.
4. Run `python manage.py migrate`.
5. Optionally run `python manage.py loaddata db.json`.
6. Start Django/Daphne.
7. Start Celery worker/beat separately if async tasks are needed.
## Important Caveats
- The repo currently has uncommitted user changes, especially under `apps/resm/`; do not revert them casually.
- `config/conf.py` contains environment-specific secrets and infrastructure paths; treat edits there as deployment-sensitive.
- Some source files display mojibake in this terminal because they contain Chinese comments in a legacy (non-UTF-8) encoding; the Python logic is still readable.
- `TokenAuthMiddleware` only proceeds when a token is present; websocket behavior without token is intentionally limited.
- `apps/resm/tasks.py` currently contains hard-coded third-party API credentials and source-specific logic; changing it needs extra caution.
## Good First Files To Read
- `server/settings.py`
- `server/urls.py`
- `apps/utils/models.py`
- `apps/utils/viewsets.py`
- `apps/system/models.py`
- `apps/wf/models.py`
- `apps/wf/services.py`
- `apps/resm/models.py`
- `apps/resm/tasks.py`
## Updating This File
Update `CLAUDE.md` when any of these change:
- startup/config entry points
- app/module boundaries
- workflow engine behavior
- paper download pipeline behavior
- shared base classes or permission patterns