plc_control/docs/superpowers/plans/2026-03-24-control-engine.md

Control Engine Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Implement the automated control engine for coal feeder / distributor units, including state machine, fault/comm protection, runtime API and frontend control panels.

Architecture: A supervisor scans every 10s and spawns one async task per enabled unit. Each task drives its unit's state machine, using tokio::time::sleep_until for phase timing and tokio::sync::Notify for immediate wake-up when external state changes (auto enable/disable, fault ack). A 500ms fault-poll ticker runs inside each task's wait_phase helper, so fault/comm status is still checked promptly during long phases. State lives in ControlRuntimeStore (in-memory, never persisted). The frontend receives real-time updates via WsMessage::UnitRuntimeChanged, which is pushed only on state transitions and fault/comm changes, not on every tick.

Tech Stack: Rust/Axum backend, sqlx/PostgreSQL, tokio async, vanilla JS ES modules frontend.


File Map

| File | Action | Responsibility |
|---|---|---|
| src/control/runtime.rs | Done | UnitRuntime struct + ControlRuntimeStore with Notify per unit |
| src/control/command.rs | Done | Shared send_pulse_command() helper |
| src/control/engine.rs | Done | Supervisor + per-unit async tasks + wait_phase |
| src/control/validator.rs | Done | Block manual commands when unit is fault/comm locked |
| src/control/mod.rs | Done | Exposes command, engine, runtime, validator |
| src/event.rs | Done | 7 AppEvent variants; UnitStateChanged fires but is not persisted to DB |
| src/websocket.rs | Done | WsMessage::UnitRuntimeChanged |
| src/service/control.rs | Done | get_all_enabled_units, get_equipment_by_unit_id |
| src/handler/control.rs | Done | start_auto, stop_auto, batch_start_auto, batch_stop_auto, ack_fault, get_unit_runtime; calls notify_unit after every state change |
| src/main.rs | Done | Routes for above endpoints |
| web/js/state.js | Done | runtimes: new Map() |
| web/js/units.js | Done | Runtime state badge, Auto Start/Stop, Ack Fault; shows display_acc_sec |
| web/js/ops.js | Done | Ops panel unit cards show runtime badge + display_acc_sec |
| web/js/app.js | Done | Handles UnitRuntimeChanged WS message |
| web/styles.css | Done | .event-card { flex-shrink: 0 } prevents text overlap under flex column |

Current UnitRuntime Shape

// src/control/runtime.rs
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
pub struct UnitRuntime {
    pub unit_id: Uuid,
    pub state: UnitRuntimeState,
    pub auto_enabled: bool,
    pub accumulated_run_sec: i64,   // internal accumulator, stored in ms despite the _sec suffix; do NOT display directly
    pub display_acc_sec: i64,       // snapshot of accumulated_run_sec (also ms) taken at state transitions; use this for display
    pub fault_locked: bool,
    pub flt_active: bool,
    pub comm_locked: bool,
    pub manual_ack_required: bool,
}
// NOTE: elapsed-time fields (current_run_elapsed_sec, current_stop_elapsed_sec,
//       distributor_run_elapsed_sec, last_tick_at) were removed in the event-driven
//       refactor. Timing is now managed entirely by tokio::time::sleep_until inside
//       the per-unit task. Do not re-add them.

ControlRuntimeStore adds:

notifiers: Arc<RwLock<HashMap<Uuid, Arc<Notify>>>>,

// Methods:
pub async fn get_or_create_notify(&self, unit_id: Uuid) -> Arc<Notify>
pub async fn notify_unit(&self, unit_id: Uuid)  // call from handlers after state changes
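
The get-or-create behaviour of the notifiers map can be sketched without any external crates. This is only an illustration under stated assumptions: UnitId (a u128) stands in for Uuid, FakeNotify stands in for tokio::sync::Notify and merely counts wake-ups, and a std Mutex replaces the async RwLock. NotifierMap and FakeNotify are hypothetical names, not types from the codebase.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Stand-in for uuid::Uuid so the sketch needs no external crates (assumption).
pub type UnitId = u128;

/// Stand-in for tokio::sync::Notify: it only counts wake-up requests (assumption).
#[derive(Default)]
pub struct FakeNotify {
    wakeups: AtomicUsize,
}

impl FakeNotify {
    pub fn notify_one(&self) {
        self.wakeups.fetch_add(1, Ordering::SeqCst);
    }

    pub fn wakeups(&self) -> usize {
        self.wakeups.load(Ordering::SeqCst)
    }
}

/// Hypothetical, synchronous analogue of the notifiers field.
#[derive(Default)]
pub struct NotifierMap {
    inner: Mutex<HashMap<UnitId, Arc<FakeNotify>>>,
}

impl NotifierMap {
    /// Returns the unit's notifier, creating it on first use.
    /// Every caller for the same unit gets a handle to the same instance.
    pub fn get_or_create_notify(&self, unit_id: UnitId) -> Arc<FakeNotify> {
        let mut map = self.inner.lock().unwrap();
        map.entry(unit_id).or_default().clone()
    }

    /// Signals the unit's notifier; intended to be called after a state change.
    pub fn notify_unit(&self, unit_id: UnitId) {
        self.get_or_create_notify(unit_id).notify_one();
    }
}
```

The point of sharing one Arc per unit is that a handler and the unit task can each look up the notifier independently and still meet on the same object.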

Engine Architecture (event-driven, 2026-03-26)

start()
 └─ supervise()          — interval 10s, spawns unit_task per enabled unit

unit_task(unit_id)
 ├─ load_equipment_maps  — once at task start (cached for task lifetime)
 ├─ fault_tick           — interval 500ms, used inside wait_phase
 └─ loop:
     ├─ reload unit config (check still enabled)
     ├─ check_fault_comm → push WS if changed
     ├─ if !auto || fault || comm → select!(fault_tick | notify), continue
     └─ match state:
         Stopped          → wait_phase(stop_time_sec) → start feeder → state=Running → push WS
         Running          → wait_phase(run_time_sec)  → stop feeder → acc += run_time_sec * 1000 (acc is in ms)
                            → if acc >= acc_time_sec * 1000: start distributor, state=DistributorRunning
                            → else: state=Stopped → push WS
         DistributorRunning → wait_phase(bl_time_sec) → stop distributor → acc=0 → state=Stopped → push WS
         FaultLocked|CommLocked → select!(fault_tick | notify)

wait_phase(secs):
  deadline = now + secs
  loop:
    select! { sleep_until(deadline) => return true
              fault_tick.tick()     => re-check fault/comm; if interrupted return false
              notify.notified()     => re-check fault/comm; if interrupted return false }
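
The timing behaviour of wait_phase can be illustrated with a deterministic simulation on a virtual millisecond clock. This is a rough sketch, not the engine code: wait_phase_sim, PhaseResult and the fault_at_ms parameter are hypothetical, the notify branch is left out, and a tie between the deadline and a tick is resolved in favour of the deadline (an assumption; a real select! may pick either branch).

```rust
/// Outcome of one simulated phase on a virtual millisecond clock.
#[derive(Debug, PartialEq, Eq)]
pub enum PhaseResult {
    /// The fixed deadline was reached (wait_phase would return true).
    Completed { at_ms: u64 },
    /// A fault/comm re-check tripped first (wait_phase would return false).
    Interrupted { at_ms: u64 },
}

/// Simulates the wait_phase loop without any real sleeping.
/// `fault_at_ms` is the hypothetical time at which a fault/comm lock appears.
pub fn wait_phase_sim(
    start_ms: u64,
    phase_ms: u64,
    tick_ms: u64,
    fault_at_ms: Option<u64>,
) -> PhaseResult {
    assert!(tick_ms > 0, "tick interval must be positive");

    // The deadline is computed once, when the phase starts.
    let deadline = start_ms + phase_ms;
    let mut now = start_ms;

    loop {
        // Whichever comes first wins: the fixed deadline or the next fault tick.
        let next_tick = now + tick_ms;
        if deadline <= next_tick {
            return PhaseResult::Completed { at_ms: deadline };
        }

        // A fault tick fired: advance the clock and re-check fault/comm status.
        now = next_tick;
        if fault_at_ms.map_or(false, |t| t <= now) {
            return PhaseResult::Interrupted { at_ms: now };
        }
        // No fault: loop again. The deadline is NOT recomputed.
    }
}
```

For example, a 1200ms phase starting at t=1000 with a 500ms tick completes at exactly t=2200 in this model, even though two ticks fire in between. A fault appearing at t=700 in a phase that started at t=0 is noticed at the next tick, t=1000.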

Key invariants:

  • accumulated_run_sec is incremented by exactly run_time_sec * 1000 (the value is stored in ms) per completed run cycle, so no delta drift accumulates.
  • display_acc_sec is a snapshot copied from accumulated_run_sec only at Running→Stopped or Running→DistributorRunning transitions. Frontend always reads display_acc_sec.
  • WS is pushed only when something changes. No periodic push.
  • unit.state_changed events are fired (for logging) but not written to the DB event table (too frequent).
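
The first two invariants can be sketched as plain bookkeeping functions. This is a minimal illustration under assumptions: Runtime is a trimmed-down stand-in for UnitRuntime, finish_run_phase and finish_distributor_phase are hypothetical helper names, and the threshold is assumed to be compared in milliseconds (acc_time_sec * 1000) to match the accumulator's unit.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnitRuntimeState {
    Stopped,
    Running,
    DistributorRunning,
}

/// Trimmed-down stand-in for UnitRuntime (only the fields the invariants touch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Runtime {
    pub state: UnitRuntimeState,
    pub accumulated_run_sec: i64, // stored in ms despite the name
    pub display_acc_sec: i64,     // snapshot for the frontend, also ms
}

/// Bookkeeping for a Running phase that ran to completion.
/// Assumes the acc_time_sec threshold is compared in ms.
pub fn finish_run_phase(rt: &mut Runtime, run_time_sec: i64, acc_time_sec: i64) {
    // Add the configured run time, not a measured delta, so nothing drifts.
    rt.accumulated_run_sec += run_time_sec * 1000;
    // Snapshot for display at the Running -> {Stopped, DistributorRunning} transition.
    rt.display_acc_sec = rt.accumulated_run_sec;
    rt.state = if rt.accumulated_run_sec >= acc_time_sec * 1000 {
        UnitRuntimeState::DistributorRunning
    } else {
        UnitRuntimeState::Stopped
    };
}

/// Bookkeeping for a completed distributor phase: the accumulator resets.
/// display_acc_sec is left untouched here because the invariants only
/// mention snapshots on transitions out of Running.
pub fn finish_distributor_phase(rt: &mut Runtime) {
    rt.accumulated_run_sec = 0;
    rt.state = UnitRuntimeState::Stopped;
}
```

With run_time_sec = 30 and acc_time_sec = 60, two completed run phases take the accumulator from 0 to 30000 to 60000, and the second one moves the unit to DistributorRunning.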

Task 1: Extend UnitRuntime — DONE

Files: src/control/runtime.rs

Fields as shown in "Current UnitRuntime Shape" above. ControlRuntimeStore includes the notifiers map with get_or_create_notify and notify_unit methods.


Task 2: Create shared pulse-command helper — DONE

Files: src/control/command.rs, src/control/mod.rs, src/handler/control.rs

send_pulse_command(connection_manager, point_id, value_type, pulse_ms) writes the point high, waits pulse_ms, then writes it low. simulate_run_feedback(state, eq_id, running) writes a fake run-feedback value in simulate mode.


Task 3: Add runtime-state checks to validator.rs — DONE

Files: src/control/validator.rs

After existing REM/FLT/quality checks in validate_manual_control:

if let Some(unit_id) = equipment.unit_id {
    if let Some(runtime) = state.control_runtime.get(unit_id).await {
        if runtime.auto_enabled {
            return Err(ApiErr::Forbidden("Auto control is active; disable auto first", ...));
        }
        if runtime.comm_locked {
            return Err(ApiErr::Forbidden("Unit communication is locked", ...));
        }
        if runtime.fault_locked {
            return Err(ApiErr::Forbidden("Unit is fault locked", ...));
        }
    }
}

Task 4: Extend AppEvent with business events — DONE

Files: src/event.rs

7 variants added:

AutoControlStarted { unit_id: Uuid },
AutoControlStopped { unit_id: Uuid },
FaultLocked { unit_id: Uuid, equipment_id: Uuid },
FaultAcked { unit_id: Uuid },
CommLocked { unit_id: Uuid },
CommRecovered { unit_id: Uuid },
UnitStateChanged { unit_id: Uuid, from_state: String, to_state: String },

persist_event_if_needed mapping:

| Variant | DB? | event_type |
|---|---|---|
| AutoControlStarted | yes | unit.auto_control_started |
| AutoControlStopped | yes | unit.auto_control_stopped |
| FaultLocked | yes | unit.fault_locked (level: error) |
| FaultAcked | yes | unit.fault_acked |
| CommLocked | yes | unit.comm_locked (level: warn) |
| CommRecovered | yes | unit.comm_recovered |
| UnitStateChanged | no | unit.state_changed (not persisted: too frequent; fires every cycle) |
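
The mapping above can be expressed as a single match. The following is a sketch only: db_event_type is a hypothetical helper (the real persist_event_if_needed signature is not shown in this plan), UnitId is a u128 stand-in for Uuid, and log levels are omitted.

```rust
/// Stand-in for uuid::Uuid so the sketch needs no external crates (assumption).
pub type UnitId = u128;

#[derive(Debug, Clone, PartialEq)]
pub enum AppEvent {
    AutoControlStarted { unit_id: UnitId },
    AutoControlStopped { unit_id: UnitId },
    FaultLocked { unit_id: UnitId, equipment_id: UnitId },
    FaultAcked { unit_id: UnitId },
    CommLocked { unit_id: UnitId },
    CommRecovered { unit_id: UnitId },
    UnitStateChanged { unit_id: UnitId, from_state: String, to_state: String },
}

/// Returns the event_type to store for variants that are persisted,
/// or None for variants that are fired but never written to the DB.
pub fn db_event_type(event: &AppEvent) -> Option<&'static str> {
    match event {
        AppEvent::AutoControlStarted { .. } => Some("unit.auto_control_started"),
        AppEvent::AutoControlStopped { .. } => Some("unit.auto_control_stopped"),
        AppEvent::FaultLocked { .. } => Some("unit.fault_locked"),
        AppEvent::FaultAcked { .. } => Some("unit.fault_acked"),
        AppEvent::CommLocked { .. } => Some("unit.comm_locked"),
        AppEvent::CommRecovered { .. } => Some("unit.comm_recovered"),
        // Fires every cycle, so it is deliberately not persisted.
        AppEvent::UnitStateChanged { .. } => None,
    }
}
```

Keeping the match exhaustive (no wildcard arm) means that adding an eighth variant fails to compile until someone decides whether it should be persisted.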

Task 5: Add WsMessage::UnitRuntimeChanged — DONE

Files: src/websocket.rs

UnitRuntimeChanged(crate::control::runtime::UnitRuntime),

Task 6: Add service helpers — DONE

Files: src/service/control.rs

pub async fn get_all_enabled_units(pool: &PgPool) -> Result<Vec<ControlUnit>, sqlx::Error>
pub async fn get_equipment_by_unit_id(pool: &PgPool, unit_id: Uuid) -> Result<Vec<Equipment>, sqlx::Error>

Task 7: Implement control/engine.rs — DONE (event-driven, 2026-03-26)

Files: src/control/engine.rs

See "Engine Architecture" section above for the full design.

Critical rules for future modifications:

  • Never push WsMessage::UnitRuntimeChanged except at state transitions or fault/comm changes.
  • wait_phase must use sleep_until(deadline) not sleep(duration) — the deadline is fixed when the phase starts so that fault-tick re-checks don't restart the timer.
  • When handling notify.notified() inside wait_phase, always re-read runtime from store (the handler may have changed auto_enabled).
  • Equipment maps are loaded once per task invocation; if equipment config changes, the supervisor will restart the task on its next scan (≤10s delay).

Task 8: New API endpoints — DONE

Files: src/handler/control.rs, src/main.rs

| Method | Path | Handler |
|---|---|---|
| POST | /api/unit/:id/start-auto | start_auto_unit |
| POST | /api/unit/:id/stop-auto | stop_auto_unit |
| POST | /api/unit/:id/ack-fault | ack_fault_unit |
| POST | /api/unit/batch-start-auto | batch_start_auto |
| POST | /api/unit/batch-stop-auto | batch_stop_auto |
| GET | /api/unit/:id/runtime | get_unit_runtime |

Notify contract: every handler that modifies auto_enabled or fault_locked MUST call state.control_runtime.notify_unit(unit_id).await after upserting the runtime. This wakes the sleeping unit task immediately.

// Pattern to follow in every auto/fault handler:
state.control_runtime.upsert(runtime).await;
state.control_runtime.notify_unit(unit_id).await;  // ← must not be omitted
let _ = state.event_manager.send(AppEvent::...);

Task 9: Frontend runtime integration — DONE

Files: web/js/state.js, web/js/units.js, web/js/ops.js, web/js/app.js

WS handler in app.js:

case "UnitRuntimeChanged":
    state.runtimes.set(payload.data.unit_id, payload.data);
    renderUnits();   // re-renders unit cards with new badge/buttons
    renderOpsUnits();
    break;

Display rule: Always use runtime.display_acc_sec for Acc display, never runtime.accumulated_run_sec.

// ✅ Correct
`Acc ${Math.floor(runtime.display_acc_sec / 1000)}s`

// ❌ Wrong — shows mid-cycle jitter values
`Acc ${Math.floor(runtime.accumulated_run_sec / 1000)}s`

Event list CSS: .event-card must have flex-shrink: 0 (in web/styles.css) to prevent card height compression and text overlap when the flex-column list grows.


Task 10: Connect engine to AppState — DONE

Files: src/main.rs

let control_runtime = Arc::new(control::runtime::ControlRuntimeStore::new());
// ... build AppState ...
control::engine::start(state.clone(), control_runtime);