272 lines
11 KiB
Markdown
272 lines
11 KiB
Markdown
# Control Engine Implementation Plan
|
|
|
|
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
|
|
**Goal:** Implement the automated control engine for coal feeder / distributor units, including state machine, fault/comm protection, runtime API and frontend control panels.
|
|
|
|
**Architecture:** The engine spawns one async task per enabled unit (supervised by a 10s scanner). Each task drives the unit's state machine using `tokio::time::sleep_until` for phase timing and `tokio::sync::Notify` for instant wake-up when external state changes (auto enable/disable, fault ack). A 500ms fault-poll ticker runs inside each task's `wait_phase` helper so fault/comm status is still checked promptly during long phases. State is kept in `ControlRuntimeStore` (in-memory, never persisted). Frontend receives real-time updates via `WsMessage::UnitRuntimeChanged` — pushed **only on state transitions**, not every tick.
|
|
|
|
**Tech Stack:** Rust/Axum backend, sqlx/PostgreSQL, tokio async, vanilla JS ES modules frontend.
|
|
|
|
---
|
|
|
|
## File Map
|
|
|
|
| File | Action | Responsibility |
|
|
|------|--------|---------------|
|
|
| `src/control/runtime.rs` | ✅ Done | `UnitRuntime` struct + `ControlRuntimeStore` with `Notify` per unit |
|
|
| `src/control/command.rs` | ✅ Done | Shared `send_pulse_command()` helper |
|
|
| `src/control/engine.rs` | ✅ Done | Supervisor + per-unit async tasks + `wait_phase` |
|
|
| `src/control/validator.rs` | ✅ Done | Block manual commands when unit is fault/comm locked |
|
|
| `src/control/mod.rs` | ✅ Done | Exposes `command`, `engine`, `runtime`, `validator` |
|
|
| `src/event.rs` | ✅ Done | 7 `AppEvent` variants; `UnitStateChanged` fires but is **not** persisted to DB |
|
|
| `src/websocket.rs` | ✅ Done | `WsMessage::UnitRuntimeChanged` |
|
|
| `src/service/control.rs` | ✅ Done | `get_all_enabled_units`, `get_equipment_by_unit_id` |
|
|
| `src/handler/control.rs` | ✅ Done | `start_auto`, `stop_auto`, `batch_start_auto`, `batch_stop_auto`, `ack_fault`, `get_unit_runtime`; calls `notify_unit` after every state change |
|
|
| `src/main.rs` | ✅ Done | Routes for above endpoints |
|
|
| `web/js/state.js` | ✅ Done | `runtimes: new Map()` |
|
|
| `web/js/units.js` | ✅ Done | Runtime state badge, Auto Start/Stop, Ack Fault; shows `display_acc_sec` |
|
|
| `web/js/ops.js` | ✅ Done | Ops panel unit cards show runtime badge + `display_acc_sec` |
|
|
| `web/js/app.js` | ✅ Done | Handles `UnitRuntimeChanged` WS message |
|
|
| `web/styles.css` | ✅ Done | `.event-card { flex-shrink: 0 }` prevents text overlap under flex column |
|
|
|
|
---
|
|
|
|
## Current UnitRuntime Shape
|
|
|
|
```rust
|
|
// src/control/runtime.rs
|
|
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
|
|
pub struct UnitRuntime {
|
|
pub unit_id: Uuid,
|
|
pub state: UnitRuntimeState,
|
|
pub auto_enabled: bool,
|
|
pub accumulated_run_sec: i64, // internal accumulator (ms); do NOT display directly
|
|
pub display_acc_sec: i64, // snapshot at state-transition; use this for display
|
|
pub fault_locked: bool,
|
|
pub flt_active: bool,
|
|
pub comm_locked: bool,
|
|
pub manual_ack_required: bool,
|
|
}
|
|
// NOTE: elapsed-time fields (current_run_elapsed_sec, current_stop_elapsed_sec,
|
|
// distributor_run_elapsed_sec, last_tick_at) were removed in the event-driven
|
|
// refactor. Timing is now managed entirely by tokio::time::sleep_until inside
|
|
// the per-unit task. Do not re-add them.
|
|
```
|
|
|
|
`ControlRuntimeStore` adds:
|
|
|
|
```rust
|
|
notifiers: Arc<RwLock<HashMap<Uuid, Arc<Notify>>>>,
|
|
|
|
// Methods:
|
|
pub async fn get_or_create_notify(&self, unit_id: Uuid) -> Arc<Notify>
|
|
pub async fn notify_unit(&self, unit_id: Uuid) // call from handlers after state changes
|
|
```
|
|
|
|
---
|
|
|
|
## Engine Architecture (event-driven, 2026-03-26)
|
|
|
|
```
|
|
start()
|
|
└─ supervise() — interval 10s, spawns unit_task per enabled unit
|
|
|
|
unit_task(unit_id)
|
|
├─ load_equipment_maps — once at task start (cached for task lifetime)
|
|
├─ fault_tick — interval 500ms, used inside wait_phase
|
|
└─ loop:
|
|
├─ reload unit config (check still enabled)
|
|
├─ check_fault_comm → push WS if changed
|
|
├─ if !auto || fault || comm → select!(fault_tick | notify), continue
|
|
└─ match state:
|
|
Stopped → wait_phase(stop_time_sec) → start feeder → state=Running → push WS
|
|
Running → wait_phase(run_time_sec) → stop feeder → acc += run_time_sec
|
|
→ if acc >= acc_time_sec: start distributor, state=DistributorRunning
|
|
→ else: state=Stopped → push WS
|
|
DistributorRunning → wait_phase(bl_time_sec) → stop distributor → acc=0 → state=Stopped → push WS
|
|
FaultLocked|CommLocked → select!(fault_tick | notify)
|
|
|
|
wait_phase(secs):
|
|
deadline = now + secs
|
|
loop:
|
|
select! { sleep_until(deadline) => return true
|
|
fault_tick.tick() => re-check fault/comm; if interrupted return false
|
|
notify.notified() => re-check fault/comm; if interrupted return false }
|
|
```
|
|
|
|
**Key invariants:**
|
|
- `accumulated_run_sec` is updated by **exactly** `run_time_sec * 1000` per completed cycle (no delta drift).
|
|
- `display_acc_sec` is a snapshot copied from `accumulated_run_sec` only at Running→Stopped or Running→DistributorRunning transitions. Frontend always reads `display_acc_sec`.
|
|
- WS is pushed **only** when something changes. No periodic push.
|
|
- `unit.state_changed` events are fired (for logging) but **not** written to the DB event table (too frequent).
|
|
|
|
---
|
|
|
|
## Task 1: Extend UnitRuntime — ✅ DONE
|
|
|
|
**Files:** `src/control/runtime.rs`
|
|
|
|
Fields as shown in "Current UnitRuntime Shape" above. `ControlRuntimeStore` includes the `notifiers` map with `get_or_create_notify` and `notify_unit` methods.
|
|
|
|
---
|
|
|
|
## Task 2: Create shared pulse-command helper — ✅ DONE
|
|
|
|
**Files:** `src/control/command.rs`, `src/control/mod.rs`, `src/handler/control.rs`
|
|
|
|
`send_pulse_command(connection_manager, point_id, value_type, pulse_ms)` writes high→delay→low.
|
|
`simulate_run_feedback(state, eq_id, running)` writes a fake run-feedback value in simulate mode.
|
|
|
|
---
|
|
|
|
## Task 3: Add runtime-state checks to validator.rs — ✅ DONE
|
|
|
|
**Files:** `src/control/validator.rs`
|
|
|
|
After existing REM/FLT/quality checks in `validate_manual_control`:
|
|
|
|
```rust
|
|
if let Some(unit_id) = equipment.unit_id {
|
|
if let Some(runtime) = state.control_runtime.get(unit_id).await {
|
|
if runtime.auto_enabled {
|
|
return Err(ApiErr::Forbidden("Auto control is active; disable auto first", ...));
|
|
}
|
|
if runtime.comm_locked {
|
|
return Err(ApiErr::Forbidden("Unit communication is locked", ...));
|
|
}
|
|
if runtime.fault_locked {
|
|
return Err(ApiErr::Forbidden("Unit is fault locked", ...));
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## Task 4: Extend AppEvent with business events — ✅ DONE
|
|
|
|
**Files:** `src/event.rs`
|
|
|
|
7 variants added:
|
|
|
|
```rust
|
|
AutoControlStarted { unit_id: Uuid },
|
|
AutoControlStopped { unit_id: Uuid },
|
|
FaultLocked { unit_id: Uuid, equipment_id: Uuid },
|
|
FaultAcked { unit_id: Uuid },
|
|
CommLocked { unit_id: Uuid },
|
|
CommRecovered { unit_id: Uuid },
|
|
UnitStateChanged { unit_id: Uuid, from_state: String, to_state: String },
|
|
```
|
|
|
|
**`persist_event_if_needed` mapping:**
|
|
|
|
| Variant | DB? | event_type |
|
|
|---------|-----|-----------|
|
|
| `AutoControlStarted` | ✅ | `unit.auto_control_started` |
|
|
| `AutoControlStopped` | ✅ | `unit.auto_control_stopped` |
|
|
| `FaultLocked` | ✅ | `unit.fault_locked` (level: error) |
|
|
| `FaultAcked` | ✅ | `unit.fault_acked` |
|
|
| `CommLocked` | ✅ | `unit.comm_locked` (level: warn) |
|
|
| `CommRecovered` | ✅ | `unit.comm_recovered` |
|
|
| `UnitStateChanged` | ❌ | — (too frequent; fires every cycle) |
|
|
|
|
---
|
|
|
|
## Task 5: Add WsMessage::UnitRuntimeChanged — ✅ DONE
|
|
|
|
**Files:** `src/websocket.rs`
|
|
|
|
```rust
|
|
UnitRuntimeChanged(crate::control::runtime::UnitRuntime),
|
|
```
|
|
|
|
---
|
|
|
|
## Task 6: Add service helpers — ✅ DONE
|
|
|
|
**Files:** `src/service/control.rs`
|
|
|
|
```rust
|
|
pub async fn get_all_enabled_units(pool: &PgPool) -> Result<Vec<ControlUnit>, sqlx::Error>
|
|
pub async fn get_equipment_by_unit_id(pool: &PgPool, unit_id: Uuid) -> Result<Vec<Equipment>, sqlx::Error>
|
|
```
|
|
|
|
---
|
|
|
|
## Task 7: Implement control/engine.rs — ✅ DONE (event-driven, 2026-03-26)
|
|
|
|
**Files:** `src/control/engine.rs`
|
|
|
|
See "Engine Architecture" section above for the full design.
|
|
|
|
**Critical rules for future modifications:**
|
|
- Never push `WsMessage::UnitRuntimeChanged` except at state transitions or fault/comm changes.
|
|
- `wait_phase` must use `sleep_until(deadline)` not `sleep(duration)` — the deadline is fixed when the phase starts so that fault-tick re-checks don't restart the timer.
|
|
- When handling `notify.notified()` inside `wait_phase`, always re-read runtime from store (the handler may have changed `auto_enabled`).
|
|
- Equipment maps are loaded once per task invocation; if equipment config changes, the supervisor will restart the task on its next scan (≤10s delay).
|
|
|
|
---
|
|
|
|
## Task 8: New API endpoints — ✅ DONE
|
|
|
|
**Files:** `src/handler/control.rs`, `src/main.rs`
|
|
|
|
| Method | Path | Handler |
|
|
|--------|------|---------|
|
|
| POST | `/api/unit/:id/start-auto` | `start_auto_unit` |
|
|
| POST | `/api/unit/:id/stop-auto` | `stop_auto_unit` |
|
|
| POST | `/api/unit/:id/ack-fault` | `ack_fault_unit` |
|
|
| POST | `/api/unit/batch-start-auto` | `batch_start_auto` |
|
|
| POST | `/api/unit/batch-stop-auto` | `batch_stop_auto` |
|
|
| GET | `/api/unit/:id/runtime` | `get_unit_runtime` |
|
|
|
|
**Notify contract:** every handler that modifies `auto_enabled` or `fault_locked` MUST call `state.control_runtime.notify_unit(unit_id).await` after upserting the runtime. This wakes the sleeping unit task immediately.
|
|
|
|
```rust
|
|
// Pattern to follow in every auto/fault handler:
|
|
state.control_runtime.upsert(runtime).await;
|
|
state.control_runtime.notify_unit(unit_id).await; // ← must not be omitted
|
|
let _ = state.event_manager.send(AppEvent::...);
|
|
```
|
|
|
|
---
|
|
|
|
## Task 9: Frontend runtime integration — ✅ DONE
|
|
|
|
**Files:** `web/js/state.js`, `web/js/units.js`, `web/js/ops.js`, `web/js/app.js`
|
|
|
|
**WS handler in app.js:**
|
|
```js
|
|
case "UnitRuntimeChanged":
|
|
state.runtimes.set(payload.data.unit_id, payload.data);
|
|
renderUnits(); // re-renders unit cards with new badge/buttons
|
|
renderOpsUnits();
|
|
break;
|
|
```
|
|
|
|
**Display rule:** Always use `runtime.display_acc_sec` for Acc display, never `runtime.accumulated_run_sec`.
|
|
|
|
```js
|
|
// ✅ Correct
|
|
`Acc ${Math.floor(runtime.display_acc_sec / 1000)}s`
|
|
|
|
// ❌ Wrong — shows mid-cycle jitter values
|
|
`Acc ${Math.floor(runtime.accumulated_run_sec / 1000)}s`
|
|
```
|
|
|
|
**Event list CSS:** `.event-card` must have `flex-shrink: 0` (in `web/styles.css`) to prevent card height compression and text overlap when the flex-column list grows.
|
|
|
|
---
|
|
|
|
## Task 10: Connect engine to AppState — ✅ DONE
|
|
|
|
**Files:** `src/main.rs`
|
|
|
|
```rust
|
|
let control_runtime = Arc::new(control::runtime::ControlRuntimeStore::new());
|
|
// ... build AppState ...
|
|
control::engine::start(state.clone(), control_runtime);
|
|
```
|