# paper_server

## What This Repo Is

`paper_server` is a Django 4.2 backend that combines a general admin platform with a paper/resource acquisition pipeline.

Main stack:

- Django + DRF
- PostgreSQL
- Redis cache
- Celery + django-celery-beat + django-celery-results
- Channels + Daphne for WebSocket push

The repo is not just a "paper service". It contains these apps:

- `apps.system`: users, departments, roles, permissions, files, schedules, config
- `apps.auth1`: login/auth flows based on JWT, session, SMS, WeChat, and face login
- `apps.wf`: a configurable workflow/ticket engine
- `apps.ops`: ops endpoints for logs, backups, server metrics, cache, Celery, and Redis
- `apps.resm`: paper metadata, abstract/fulltext fetch, and the PDF download pipeline
- `apps.utils`: shared base models, viewsets, permissions, middleware, pagination, and helpers
- `apps.ws`: websocket consumers and routing

## Runtime Entry Points

- `manage.py` starts Django with `server.settings`
- `server/settings.py` is the central settings file and imports environment values from `config/conf.py`
- `server/urls.py` mounts all REST APIs, Swagger, Django admin, and the SPA entry (`dist/index.html`)
- `server/asgi.py` serves both HTTP and WebSocket traffic
- `server/celery.py` creates the Celery app using `config.conf.BASE_PROJECT_CODE`

## Environment And Config

This project expects local runtime config files under `config/`:

- `config/conf.py`: Django secret/config, database, cache, Celery broker, backup shell paths
- `config/conf.json`: runtime system config loaded through `server.settings.get_sysconfig()`

Important implications:

- the project starts only when `config/conf.py` is valid for the target environment
- many ops tasks assume the Linux paths set in `BACKUP_PATH` and `SH_PATH`
- Redis is used by the cache, the Celery broker, and Channels

## URL Map

Primary REST prefixes:

- `api/auth/`
- `api/system/`
- `api/wf/`
- `api/ops/`
- `api/resm/`
- `api/utils/`

Other routes:

- `api/swagger/` and `api/redoc/`
- `django/admin/`
- `ws/my/`
- `ws//`

## Core Architectural Patterns

### Shared Base Models

`apps.utils.models` defines the core model layer:

- `BaseModel`: string primary key generated by a Snowflake-style `idWorker`
- `SoftModel`: soft delete support
- `CommonAModel` / `CommonBModel`: standard audit fields
- `ParentModel`: tree-like parent linkage with a stored `parent_link`

Many business models inherit from these classes, so ID generation, soft deletion, and audit fields are cross-cutting behavior.

### Shared ViewSet Base

`apps.utils.viewsets.CustomGenericViewSet` is the main DRF base class. It adds:

- permission code registration through `perms_map`
- per-user/per-request cache protection against duplicate requests
- data-scope filtering based on RBAC and department range
- serializer switching per action
- `select_related` / `prefetch_related` hooks
- row locking for mutating operations inside transactions

When adding endpoints, check this class first for inherited behavior.

### Auth And Permissions

- default auth uses JWT, with DRF basic/session authentication as fallbacks
- the global default permission is authenticated + `apps.utils.permission.RbacPermission`
- the custom user model is `apps.system.models.User`
- websocket auth is handled by `apps.utils.middlewares.TokenAuthMiddleware` via the `token` query param

## App Notes

### `apps.system`

This is the platform foundation layer. Key models:

- `User`
- `Dept`
- `Role`
- `Permission`
- `Post` / `UserPost` / `PostRole`
- `Dictionary` / `DictType`
- `File`
- `MySchedule`

This app owns the RBAC structure used by the rest of the project.

### `apps.wf`

This app is a full workflow engine, not just a simple approval table.
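The stdlib-only sketch below makes the "workflow engine" framing concrete. In it, a ticket sits in a state. Handling a transition picks the next state from the first matching condition, or falls back to a default, and appends a log entry, loosely like the transition logs `WfService` creates. Everything here is hypothetical and simplified: `Transition`, `Ticket`, `next_state`, `handle`, their fields, and the first-match rule. The real rules live in `apps/wf/services.py`, which also covers participants, handle permissions, notifications, and hooks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical condition type: a predicate over the ticket's custom-field values.
Condition = Callable[[Dict], bool]


@dataclass(frozen=True)
class Transition:
    """Simplified stand-in for a `Transition` row (field names are assumptions)."""
    name: str
    source: str                                  # state the ticket must be in
    branches: Tuple[Tuple[Condition, str], ...]  # ordered (condition, target state)
    default_target: str                          # used when no condition matches


@dataclass
class Ticket:
    """Simplified stand-in for a `Ticket` row."""
    state: str
    data: Dict = field(default_factory=dict)
    # (from_state, transition_name, to_state) entries, loosely like transition logs
    log: List[Tuple[str, str, str]] = field(default_factory=list)


def next_state(transition: Transition, data: Dict) -> str:
    """Return the target of the first matching condition, else the default."""
    for condition, target in transition.branches:
        if condition(data):
            return target
    return transition.default_target


def handle(ticket: Ticket, transition: Transition) -> Ticket:
    """Apply a transition: check it is valid here, move the ticket, log the step."""
    if transition.source != ticket.state:
        raise ValueError(
            f"transition {transition.name!r} is not allowed from state {ticket.state!r}"
        )
    target = next_state(transition, ticket.data)
    ticket.log.append((ticket.state, transition.name, target))
    ticket.state = target
    return ticket


# Usage: large amounts need an extra finance review, small ones finish directly.
approve = Transition(
    name="approve",
    source="dept_review",
    branches=((lambda d: d.get("amount", 0) > 10_000, "finance_review"),),
    default_target="done",
)
ticket = handle(Ticket(state="dept_review", data={"amount": 20_000}), approve)
print(ticket.state)  # finance_review
```

Ordered branches plus a default keep resolution deterministic even when several conditions could match. Whether the real engine uses the same rule is not stated here.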
Key models:

- `Workflow`
- `State`
- `Transition`
- `CustomField`
- `Ticket`
- `TicketFlow`

Important logic lives in `apps.wf.services.WfService`:

- initialize a workflow from its start state
- generate ticket serial numbers
- resolve the next state from transition conditions
- resolve participants from person/role/dept/post/field/code/robot
- enforce handle permissions
- create transition logs
- send SMS notifications
- trigger robot tasks and on-reach hooks

When working on ticket behavior, read `apps/wf/services.py` before touching serializers or views.

### `apps.ops`

This app exposes runtime/maintenance APIs:

- git reload tasks
- database/media backup
- log browsing
- CPU/memory/disk inspection
- Celery info
- Redis info
- cache get/set
- DRF request log and third-party request log listing

Some behaviors depend on shell scripts and Linux-only paths from config.

### `apps.resm`

This is the paper pipeline. Key models:

- `Paper`: stores DOI/OpenAlex metadata, OA flags, abstract/fulltext state, fetch status, failure reason, and local file save helpers
- `PaperAbstract`: separate abstract storage

The paper fetch pipeline in `apps/resm/tasks.py` currently includes:

- metadata ingestion from OpenAlex
- abstract/fulltext XML fetch from Elsevier
- PDF fetch from the OA URL
- PDF fetch from the OpenAlex content API
- PDF fetch from Elsevier
- Sci-Hub fallback
- task fan-out and release of stuck downloads

Download behavior is stateful:

- `fetch_status="downloading"` is used as a coarse lock
- `fail_reason` accumulates fetch failures
- files are stored under `media/papers////`

This app has recent local edits in the working tree, so read it carefully before changing it.

### `apps.ws`

Two websocket patterns exist:

- `MyConsumer`: per-user channel (`user_`) plus an optional `event` group
- `RoomConsumer`: shared room chat group

The websocket layer depends on Redis-backed Channels and on parsing the JWT token from the query string.

## Startup Expectations

Typical local boot sequence:

1. Ensure `config/conf.py` and `config/conf.json` are present and valid.
2. Start PostgreSQL and Redis.
3. Install dependencies from `requirements.txt`.
4. Run `python manage.py migrate`.
5. Optionally run `python manage.py loaddata db.json`.
6. Start Django/Daphne.
7. Start the Celery worker and beat separately if async tasks are needed.

## Important Caveats

- The repo currently has uncommitted user changes, especially under `apps/resm/`; do not revert them casually.
- `config/conf.py` contains environment-specific secrets and infrastructure paths; treat edits there as deployment-sensitive.
- Some source files display mojibake in this terminal because their Chinese comments use a legacy (non-UTF-8) encoding; the Python logic is still readable.
- `TokenAuthMiddleware` only proceeds when a token is present; websocket behavior without a token is intentionally limited.
- `apps/resm/tasks.py` currently contains hard-coded third-party API credentials and source-specific logic; change it with extra caution.

## Good First Files To Read

- `server/settings.py`
- `server/urls.py`
- `apps/utils/models.py`
- `apps/utils/viewsets.py`
- `apps/system/models.py`
- `apps/wf/models.py`
- `apps/wf/services.py`
- `apps/resm/models.py`
- `apps/resm/tasks.py`

## Updating This File

Update `CLAUDE.md` when any of these change:

- startup/config entry points
- app/module boundaries
- workflow engine behavior
- paper download pipeline behavior
- shared base classes or permission patterns