paper_server

What This Repo Is

paper_server is a Django 4.2 backend that mixes a general admin platform with a paper/resource acquisition pipeline.

Main stack:

  • Django + DRF
  • PostgreSQL
  • Redis cache
  • Celery + django-celery-beat + django-celery-results
  • Channels + Daphne for WebSocket push

The repo is not just a "paper service". It contains seven apps, covering five business areas plus shared utilities and the websocket layer:

  • apps.system: users, departments, roles, permissions, files, schedules, config
  • apps.auth1: login/auth flows based on JWT, session, SMS, WeChat, face login
  • apps.wf: a configurable workflow/ticket engine
  • apps.ops: ops endpoints for logs, backups, server metrics, cache, Celery, Redis
  • apps.resm: paper metadata, abstract/fulltext fetch, PDF download pipeline
  • apps.utils: shared base models, viewsets, permissions, middleware, pagination, helpers
  • apps.ws: websocket consumers and routing

Runtime Entry Points

  • manage.py starts Django with server.settings
  • server/settings.py is the central settings file and imports environment values from config/conf.py
  • server/urls.py mounts all REST APIs, Swagger, Django admin, and the SPA entry (dist/index.html)
  • server/asgi.py serves HTTP plus WebSocket traffic
  • server/celery.py creates the Celery app using config.conf.BASE_PROJECT_CODE

Environment And Config

This project expects local runtime config files under config/:

  • config/conf.py: Django secret/config, database, cache, Celery broker, backup shell paths
  • config/conf.json: runtime system config loaded through server.settings.get_sysconfig()

Important implications:

  • the repo can start only when config/conf.py is valid for the target environment
  • many ops tasks assume Linux paths from BACKUP_PATH and SH_PATH
  • Redis is used by cache, Celery broker, and Channels

URL Map

Primary REST prefixes:

  • api/auth/
  • api/system/
  • api/wf/
  • api/ops/
  • api/resm/
  • api/utils/

Other routes:

  • api/swagger/ and api/redoc/
  • django/admin/
  • ws/my/
  • ws/<room_name>/

Core Architectural Patterns

Shared Base Models

apps.utils.models defines the core model layer:

  • BaseModel: string primary key generated by Snowflake-style idWorker
  • SoftModel: soft delete support
  • CommonAModel / CommonBModel: standard audit fields
  • ParentModel: tree-like parent linkage with a stored parent_link

Many business models inherit from these classes, so ID generation, soft deletion, and audit fields are cross-cutting behavior.
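
As a rough illustration of what "Snowflake-style" means here, the sketch below packs a millisecond timestamp, a worker id, and a per-millisecond sequence into one integer and returns it as a string. It is not the project's idWorker: the class name, epoch, and bit widths are all assumptions chosen for the example.

```python
import threading
import time


class SnowflakeSketch:
    """Hypothetical Snowflake-style generator; NOT the project's idWorker."""

    EPOCH_MS = 1_600_000_000_000  # assumed custom epoch (milliseconds)
    WORKER_BITS = 10              # assumed bit width for the worker id
    SEQUENCE_BITS = 12            # assumed bit width for the sequence

    def __init__(self, worker_id: int = 1) -> None:
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> str:
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms < self._last_ms:
                # Wall clock is behind our logical clock: keep the logical one.
                now_ms = self._last_ms
            if now_ms == self._last_ms:
                mask = (1 << self.SEQUENCE_BITS) - 1
                self._sequence = (self._sequence + 1) & mask
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: advance the
                    # logical clock instead of busy-waiting.
                    now_ms = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now_ms
            value = (
                ((now_ms - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )
            # String form, matching the "string primary key" noted for BaseModel.
            return str(value)
```

IDs produced this way sort roughly by creation time, which is one common reason to prefer this scheme over random UUIDs for primary keys.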

Shared ViewSet Base

apps.utils.viewsets.CustomGenericViewSet is the main DRF base class. It adds:

  • permission code registration through perms_map
  • per-user/request cache protection for duplicate requests
  • data-scope filtering based on RBAC and department range
  • serializer switching per action
  • select_related / prefetch_related hooks
  • row locking behavior for mutable operations inside transactions

When adding endpoints, this class is usually the first place to check for inherited behavior.
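
The duplicate-request protection can be pictured as a short-lived cache key derived from the user and the request. The sketch below is a stand-in under stated assumptions, not the code in CustomGenericViewSet: the class name, key format, and TTL are hypothetical, and an in-memory dict takes the place of the Redis cache.

```python
import hashlib
import json
import time


class DuplicateRequestGuard:
    """Hypothetical illustration of per-user duplicate-request protection."""

    def __init__(self, ttl_seconds: float = 2.0, clock=time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._seen = {}  # key -> expiry time; a Redis cache would play this role

    @staticmethod
    def make_key(user_id: str, method: str, path: str, body: dict) -> str:
        # Stable serialization so identical payloads hash identically.
        payload = json.dumps(body, sort_keys=True, default=str)
        digest = hashlib.sha256(f"{method}:{path}:{payload}".encode()).hexdigest()
        return f"dup_req:{user_id}:{digest}"

    def allow(self, user_id: str, method: str, path: str, body: dict) -> bool:
        """Return False when the same user repeats the same request within the TTL."""
        now = self._clock()
        key = self.make_key(user_id, method, path, body)
        expiry = self._seen.get(key)
        if expiry is not None and expiry > now:
            return False
        self._seen[key] = now + self.ttl_seconds
        return True
```

A view would call allow() before doing any work and reject the request when it returns False. Check the real key format and TTL in apps/utils/viewsets.py before relying on either.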

Auth And Permissions

  • default auth uses JWT plus DRF basic/session fallbacks
  • global default permission is authenticated + apps.utils.permission.RbacPermission
  • custom user model is apps.system.models.User
  • websocket auth is handled in apps.utils.middlewares.TokenAuthMiddleware via token query param
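
For orientation, reading a token query parameter from an ASGI connection scope looks roughly like the sketch below. The function name is hypothetical and JWT validation is left out. The only ASGI fact it relies on is that scope["query_string"] is a byte string.

```python
from typing import Optional
from urllib.parse import parse_qs


def token_from_scope(scope: dict) -> Optional[str]:
    """Return the first `token` query parameter, or None when it is absent."""
    raw = scope.get("query_string", b"")
    params = parse_qs(raw.decode("utf-8", errors="ignore"))
    values = params.get("token")
    return values[0] if values else None
```

A middleware built on this would validate the token, attach the resolved user to the scope, and decide what to do when the token is missing.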

App Notes

apps.system

This is the platform foundation layer.

Key models:

  • User
  • Dept
  • Role
  • Permission
  • Post / UserPost / PostRole
  • Dictionary / DictType
  • File
  • MySchedule

This app owns the RBAC structure used by the rest of the project.

apps.wf

This app is a full workflow engine, not just a simple approval table.

Key models:

  • Workflow
  • State
  • Transition
  • CustomField
  • Ticket
  • TicketFlow

Important logic lives in apps.wf.services.WfService:

  • initialize a workflow from its start state
  • generate ticket serial numbers
  • resolve next state from transition conditions
  • resolve participants from person/role/dept/post/field/code/robot
  • enforce handle permissions
  • create transition logs
  • send SMS notifications
  • trigger robot tasks and on-reach hooks

When working on ticket behavior, read apps/wf/services.py before touching serializers or views.
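
To make "resolve next state from transition conditions" concrete, here is a minimal sketch of one plausible shape for that logic: a transition carries an ordered list of conditional targets and a default target. The dataclasses and function below are assumptions for illustration; the real condition format and field names live in apps/wf/models.py and apps/wf/services.py.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConditionalTarget:
    """Hypothetical pairing of a predicate with the state it leads to."""
    predicate: Callable[[dict], bool]
    target_state: str


@dataclass
class TransitionSketch:
    """Hypothetical transition: ordered conditions plus a default target."""
    name: str
    default_target: str
    conditional_targets: List[ConditionalTarget] = field(default_factory=list)


def resolve_next_state(transition: TransitionSketch, ticket_data: dict) -> str:
    """Return the first matching conditional target, else the default target."""
    for candidate in transition.conditional_targets:
        if candidate.predicate(ticket_data):
            return candidate.target_state
    return transition.default_target
```

For example, a "submit" transition could route tickets whose amount exceeds a threshold to a senior-review state and everything else to a normal-review state.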

apps.ops

This app exposes runtime/maintenance APIs:

  • git reload tasks
  • database/media backup
  • log browsing
  • CPU/memory/disk inspection
  • Celery info
  • Redis info
  • cache get/set
  • DRF request log and third-party request log listing

Some behaviors depend on shell scripts and Linux-only paths from config.

apps.resm

This is the paper pipeline.

Key models:

  • Paper: stores DOI/OpenAlex metadata, OA flags, abstract/fulltext state, fetch status, failure reason, and local file save helpers
  • PaperAbstract: separate abstract storage

The paper fetch pipeline in apps/resm/tasks.py currently includes:

  • metadata ingestion from OpenAlex
  • abstract/fulltext XML fetch from Elsevier
  • PDF fetch from OA URL
  • PDF fetch from OpenAlex content API
  • PDF fetch from Elsevier
  • Sci-Hub fallback
  • task fan-out and stuck-download release
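
One way to picture the multi-source PDF fetch is as an ordered chain of fetchers, where each failure is recorded before the next source is tried. The sketch below assumes that shape. The function name, the fetcher signature, and the strict ordering are illustrative, not taken from apps/resm/tasks.py.

```python
from typing import Callable, List, Optional, Tuple

# A fetcher takes a DOI and returns PDF bytes, or None when nothing was found.
Fetcher = Callable[[str], Optional[bytes]]


def fetch_with_fallback(
    doi: str, fetchers: List[Tuple[str, Fetcher]]
) -> Tuple[Optional[bytes], List[str]]:
    """Try each named source in order; return the content plus collected failures."""
    failures: List[str] = []
    for name, fetch in fetchers:
        try:
            content = fetch(doi)
        except Exception as exc:  # one failing source must not abort the chain
            failures.append(f"{name}: {exc}")
            continue
        if content:
            return content, failures
        failures.append(f"{name}: no content")
    return None, failures
```

The returned failure list corresponds to the kind of per-source detail that would end up in a field such as fail_reason.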

Download behavior is stateful:

  • fetch_status="downloading" is used as a coarse lock
  • fail_reason accumulates fetch failures
  • files are stored under media/papers/<year>/<month>/<day>/
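
The three behaviors above can be sketched with plain dictionaries standing in for Paper rows. Everything here is an assumption for illustration: the helper names, the "; " separator, the zero-padded month and day, and the choice of date are not confirmed by the source.

```python
from datetime import date
from pathlib import PurePosixPath


def try_claim(record: dict) -> bool:
    """Coarse lock: claim the record unless another worker is already downloading."""
    if record.get("fetch_status") == "downloading":
        return False
    record["fetch_status"] = "downloading"
    return True


def append_fail_reason(record: dict, reason: str) -> None:
    """Accumulate failures instead of overwriting earlier ones (assumed separator)."""
    existing = record.get("fail_reason") or ""
    record["fail_reason"] = f"{existing}; {reason}" if existing else reason


def paper_dir(day: date, media_root: str = "media") -> PurePosixPath:
    """Build media/papers/<year>/<month>/<day>; zero padding is an assumption."""
    return PurePosixPath(
        media_root, "papers", f"{day.year}", f"{day.month:02d}", f"{day.day:02d}"
    )
```

Note that the in-memory check in try_claim is not atomic across workers. Against a real database, a conditional update (for example, filtering on the current status and checking the affected row count returned by QuerySet.update()) is the usual way to make such a claim safe.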

This app has recent local edits in the working tree, so read carefully before changing it.

apps.ws

Two websocket patterns exist:

  • MyConsumer: per-user channel (user_<id>) plus optional event group
  • RoomConsumer: shared room chat group

The websocket layer depends on Redis-backed Channels and JWT token parsing in the query string.

Startup Expectations

Typical local boot sequence:

  1. Ensure config/conf.py and config/conf.json are present and valid.
  2. Start PostgreSQL and Redis.
  3. Install dependencies from requirements.txt.
  4. Run python manage.py migrate.
  5. Optionally run python manage.py loaddata db.json.
  6. Start Django/Daphne.
  7. Start Celery worker/beat separately if async tasks are needed.

Important Caveats

  • The repo currently has uncommitted user changes, especially under apps/resm/; do not revert them casually.
  • config/conf.py contains environment-specific secrets and infrastructure paths; treat edits there as deployment-sensitive.
  • Some source files display mojibake in this terminal because the project contains Chinese comments in a legacy (non-UTF-8) encoding; the Python logic is still readable.
  • TokenAuthMiddleware only proceeds when a token is present; websocket behavior without token is intentionally limited.
  • apps/resm/tasks.py currently contains hard-coded third-party API credentials and source-specific logic; changing it needs extra caution.

Good First Files To Read

  • server/settings.py
  • server/urls.py
  • apps/utils/models.py
  • apps/utils/viewsets.py
  • apps/system/models.py
  • apps/wf/models.py
  • apps/wf/services.py
  • apps/resm/models.py
  • apps/resm/tasks.py

Updating This File

Update CLAUDE.md when any of these change:

  • startup/config entry points
  • app/module boundaries
  • workflow engine behavior
  • paper download pipeline behavior
  • shared base classes or permission patterns