paper_server
What This Repo Is
paper_server is a Django 4.2 backend that mixes a general admin platform with a paper/resource acquisition pipeline.
Main stack:
- Django + DRF
- PostgreSQL
- Redis cache
- Celery + django-celery-beat + django-celery-results
- Channels + Daphne for WebSocket push
The repo is not just a "paper service". It contains seven apps:
- apps.system: users, departments, roles, permissions, files, schedules, config
- apps.auth1: login/auth flows based on JWT, session, SMS, WeChat, face login
- apps.wf: a configurable workflow/ticket engine
- apps.ops: ops endpoints for logs, backups, server metrics, cache, Celery, Redis
- apps.resm: paper metadata, abstract/fulltext fetch, PDF download pipeline
- apps.utils: shared base models, viewsets, permissions, middleware, pagination, helpers
- apps.ws: websocket consumers and routing
Runtime Entry Points
- manage.py starts Django with server.settings
- server/settings.py is the central settings file and imports environment values from config/conf.py
- server/urls.py mounts all REST APIs, Swagger, Django admin, and the SPA entry (dist/index.html)
- server/asgi.py serves HTTP plus WebSocket traffic
- server/celery.py creates the Celery app using config.conf.BASE_PROJECT_CODE
Environment And Config
This project expects local runtime config files under config/:
- config/conf.py: Django secret/config, database, cache, Celery broker, backup shell paths
- config/conf.json: runtime system config loaded through server.settings.get_sysconfig()
Important implications:
- the repo can start only when config/conf.py is valid for the target environment
- many ops tasks assume Linux paths from BACKUP_PATH and SH_PATH
- Redis is used by cache, Celery broker, and Channels
URL Map
Primary REST prefixes:
- api/auth/
- api/system/
- api/wf/
- api/ops/
- api/resm/
- api/utils/
Other routes:
- api/swagger/ and api/redoc/
- django/admin/
- ws/my/
- ws/<room_name>/
Core Architectural Patterns
Shared Base Models
apps.utils.models defines the core model layer:
- BaseModel: string primary key generated by Snowflake-style idWorker
- SoftModel: soft delete support
- CommonAModel / CommonBModel: standard audit fields
- ParentModel: tree-like parent linkage with a stored parent_link
Many business models inherit from these classes, so ID generation, soft deletion, and audit fields are cross-cutting behavior.
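To make the string primary key idea concrete, here is a minimal Snowflake-style generator. It is a sketch only: the class name SnowflakeSketch, the epoch, and the bit layout (10 worker bits, 12 sequence bits) are assumptions for illustration and may not match the repo's idWorker.

```python
import threading
import time


class SnowflakeSketch:
    """Illustrative Snowflake-style ID generator (not the repo's idWorker)."""

    EPOCH_MS = 1_600_000_000_000  # assumed custom epoch, in milliseconds
    WORKER_BITS = 10              # assumed layout
    SEQUENCE_BITS = 12            # assumed layout

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> str:
        """Return a unique, increasing ID as a string."""
        with self._lock:
            now_ms = int(time.time() * 1000)
            if now_ms < self._last_ms:
                # Clock went backwards (or we ran ahead): reuse the last timestamp.
                now_ms = self._last_ms
            if now_ms == self._last_ms:
                mask = (1 << self.SEQUENCE_BITS) - 1
                self._sequence = (self._sequence + 1) & mask
                if self._sequence == 0:
                    # Sequence exhausted: advance the logical timestamp instead of sleeping.
                    now_ms = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now_ms
            value = (
                ((now_ms - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self.worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )
            return str(value)
```

Calling SnowflakeSketch(worker_id=1).next_id() returns a numeric string. IDs from one generator instance are unique and sort in creation order when compared as integers.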
Shared ViewSet Base
apps.utils.viewsets.CustomGenericViewSet is the main DRF base class. It adds:
- permission code registration through perms_map
- per-user/request cache protection for duplicate requests
- data-scope filtering based on RBAC and department range
- serializer switching per action
- select_related / prefetch_related hooks
- row locking behavior for mutable operations inside transactions
When adding endpoints, this class is usually the first place to check for inherited behavior.
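As a rough illustration of how a perms_map could drive permission checks, the sketch below assumes that perms_map maps lower-case HTTP methods to permission codes and that "*" means any authenticated user. Both conventions, the deny-by-default rule, and the function name is_allowed are assumptions; check apps/utils/viewsets.py and apps/utils/permission.py for the real behavior.

```python
from typing import Iterable, Mapping


def is_allowed(
    perms_map: Mapping[str, str],
    method: str,
    user_perms: Iterable[str],
) -> bool:
    """Return True if the user may call `method` under `perms_map`.

    Hypothetical helper: the real RbacPermission may resolve codes differently.
    """
    required = perms_map.get(method.lower())
    if required is None:
        # Assumption: an unmapped method is denied.
        return False
    if required == "*":
        # Assumption: "*" allows any authenticated user.
        return True
    return required in set(user_perms)
```

For example, with perms_map = {"get": "*", "post": "user.create"}, a user holding only "user.update" could list records but not create them.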
Auth And Permissions
- default auth uses JWT plus DRF basic/session fallbacks
- global default permission is authenticated + apps.utils.permission.RbacPermission
- custom user model is apps.system.models.User
- websocket auth is handled in apps.utils.middlewares.TokenAuthMiddleware via token query param
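The token query param can be read from the ASGI connection scope, where query_string is a bytes value. The helper below is a minimal sketch of that extraction step only; the name token_from_scope is hypothetical, and the actual TokenAuthMiddleware also has to validate the JWT and attach the user.

```python
from typing import Optional
from urllib.parse import parse_qs


def token_from_scope(scope: dict) -> Optional[str]:
    """Extract the `token` query parameter from an ASGI scope, if any."""
    raw = scope.get("query_string", b"")  # ASGI passes the query string as bytes
    params = parse_qs(raw.decode("utf-8"))
    values = params.get("token")
    # parse_qs drops blank values by default, so "token=" yields None here.
    return values[0] if values else None
```

A client would connect with a URL such as ws/my/?token=<jwt>. When no token is present the helper returns None, which matches the note that websocket behavior without a token is limited.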
App Notes
apps.system
This is the platform foundation layer.
Key models:
- User
- Dept
- Role
- Permission
- Post / UserPost / PostRole
- Dictionary / DictType
- File
- MySchedule
This app owns the RBAC structure used by the rest of the project.
apps.wf
This app is a full workflow engine, not just a simple approval table.
Key models:
- Workflow
- State
- Transition
- CustomField
- Ticket
- TicketFlow
Important logic lives in apps.wf.services.WfService:
- initialize a workflow from its start state
- generate ticket serial numbers
- resolve next state from transition conditions
- resolve participants from person/role/dept/post/field/code/robot
- enforce handle permissions
- create transition logs
- send SMS notifications
- trigger robot tasks and on-reach hooks
When working on ticket behavior, read apps/wf/services.py before touching serializers or views.
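The "resolve next state from transition conditions" step can be pictured as follows. This is a simplified sketch, not the WfService implementation: TransitionSketch, resolve_next_state, the callable conditions, and the first-match-wins rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional, Sequence


@dataclass(frozen=True)
class TransitionSketch:
    """Hypothetical stand-in for a workflow transition."""

    source: str
    destination: str
    # None means the transition is unconditional.
    condition: Optional[Callable[[Mapping[str, Any]], bool]] = None


def resolve_next_state(
    current: str,
    transitions: Sequence[TransitionSketch],
    ticket_data: Mapping[str, Any],
) -> Optional[str]:
    """Return the destination of the first matching transition, or None."""
    for transition in transitions:
        if transition.source != current:
            continue
        if transition.condition is None or transition.condition(ticket_data):
            return transition.destination
    return None
```

With one transition from "submitted" to "manager_review" guarded by an amount threshold and an unconditional transition from "submitted" to "done", a ticket above the threshold goes to review and every other ticket goes straight to "done".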
apps.ops
This app exposes runtime/maintenance APIs:
- git reload tasks
- database/media backup
- log browsing
- CPU/memory/disk inspection
- Celery info
- Redis info
- cache get/set
- DRF request log and third-party request log listing
Some behaviors depend on shell scripts and Linux-only paths from config.
apps.resm
This is the paper pipeline.
Key models:
- Paper: stores DOI/OpenAlex metadata, OA flags, abstract/fulltext state, fetch status, failure reason, and local file save helpers
- PaperAbstract: separate abstract storage
The paper fetch pipeline in apps/resm/tasks.py currently includes:
- metadata ingestion from OpenAlex
- abstract/fulltext XML fetch from Elsevier
- PDF fetch from OA URL
- PDF fetch from OpenAlex content API
- PDF fetch from Elsevier
- Sci-Hub fallback
- task fan-out and stuck-download release
Download behavior is stateful:
- fetch_status="downloading" is used as a coarse lock
- fail_reason accumulates fetch failures
- files are stored under media/papers/<year>/<month>/<day>/
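The stateful behavior above can be sketched in plain Python. Everything here other than the fetch_status="downloading" value and the fail_reason field is an assumption: PaperSketch, the "pending" idle value, the locked_at field, and the helper names are hypothetical, and the real code in apps/resm/tasks.py works against the database rather than in-memory objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PaperSketch:
    """Hypothetical in-memory stand-in for the Paper model."""

    fetch_status: str = "pending"         # "pending" is an assumed idle value
    fail_reason: str = ""
    locked_at: Optional[datetime] = None  # hypothetical field for stuck detection


def try_lock(paper: PaperSketch, now: datetime) -> bool:
    """Claim the paper for download; False if another worker holds it."""
    if paper.fetch_status == "downloading":
        return False
    paper.fetch_status = "downloading"
    paper.locked_at = now
    return True


def record_failure(paper: PaperSketch, source: str, message: str) -> None:
    """Append one failure to fail_reason without discarding earlier ones."""
    entry = f"{source}: {message}"
    paper.fail_reason = f"{paper.fail_reason}; {entry}" if paper.fail_reason else entry


def release_if_stuck(paper: PaperSketch, now: datetime, timeout: timedelta) -> bool:
    """Reset a download that has held the lock for at least `timeout`."""
    if paper.fetch_status != "downloading" or paper.locked_at is None:
        return False
    if now - paper.locked_at < timeout:
        return False
    paper.fetch_status = "pending"
    paper.locked_at = None
    return True
```

In a database-backed version the check-and-set in try_lock would need to be atomic, for example a single conditional UPDATE, so that two workers cannot both claim the same paper.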
This app has recent local edits in the working tree, so read carefully before changing it.
apps.ws
Two websocket patterns exist:
- MyConsumer: per-user channel (user_<id>) plus optional event group
- RoomConsumer: shared room chat group
The websocket layer depends on Redis-backed Channels and JWT token parsing in the query string.
Startup Expectations
Typical local boot sequence:
- Ensure config/conf.py and config/conf.json are present and valid.
- Start PostgreSQL and Redis.
- Install dependencies from requirements.txt.
- Run python manage.py migrate.
- Optionally run python manage.py loaddata db.json.
- Start Django/Daphne.
- Start Celery worker/beat separately if async tasks are needed.
Important Caveats
- The repo currently has uncommitted user changes, especially under apps/resm/; do not revert them casually.
- config/conf.py contains environment-specific secrets and infrastructure paths; treat edits there as deployment-sensitive.
- Some source files display mojibake in this terminal because the project contains non-UTF8/legacy encoded Chinese comments, but the Python logic is still readable.
- TokenAuthMiddleware only proceeds when a token is present; websocket behavior without token is intentionally limited.
- apps/resm/tasks.py currently contains hard-coded third-party API credentials and source-specific logic; changing it needs extra caution.
Good First Files To Read
- server/settings.py
- server/urls.py
- apps/utils/models.py
- apps/utils/viewsets.py
- apps/system/models.py
- apps/wf/models.py
- apps/wf/services.py
- apps/resm/models.py
- apps/resm/tasks.py
Updating This File
Update CLAUDE.md when any of these change:
- startup/config entry points
- app/module boundaries
- workflow engine behavior
- paper download pipeline behavior
- shared base classes or permission patterns