# throughput.mor.org — Data Sources & Calculation Reference

This document defines the authoritative data sources and formulas used by
`throughput.html` (the Morpheus network throughput / session stats page).
It exists so that anyone editing the page (an AI assistant or a human) or
sanity-checking the indexer knows exactly what is pulled from where and never
fabricates numbers.

Companion to `CALC_DATA_SOURCES.md`. Implementation lives at
`./.terragrunt/throughput-indexer/` (inside `.terragrunt/` so Terragrunt
copies it along with the Terraform source; mirrors `05-active_models`).

---

## 1. Inference Contract (LumerinDiamond) on Base mainnet

**Address:** `0x6aBE1d282f72B474E54527D93b979A4f64d3030a` (Base chainId 8453)

The contract is an EIP-2535 Diamond. We care about the `SessionRouter`,
`SessionStorage`, `BidStorage`, `StatsStorage`, and `ModelStorage` facets
(source: `Morpheus-Lumerin-Node/smart-contracts/contracts/diamond/`).

### Events scanned
Both events index `user`, `sessionId`, and `providerId` as topics, so
filtering on any of them is cheap.

| Event | Signature | Used for |
|---|---|---|
| `SessionOpened` | `(address indexed user, bytes32 indexed sessionId, address indexed providerId)` | Count of opens; input for stuck-session detection |
| `SessionClosed` | `(address indexed user, bytes32 indexed sessionId, address indexed providerId)` | Primary throughput source — triggers enrichment |

Event topic hashes:
- `SessionOpened` → `keccak256("SessionOpened(address,bytes32,address)")`
- `SessionClosed` → `keccak256("SessionClosed(address,bytes32,address)")`
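A minimal, stdlib-only sketch for reproducing those topic hashes. Python's `hashlib.sha3_256` implements FIPS 202 SHA3-256, whose padding differs from the original Keccak-256 that Ethereum uses, so it returns different digests and cannot be substituted. The code below is a compact port of the Keccak reference permutation; `keccak256` and `event_topic` are hypothetical helper names, not part of the indexer, and a vetted library (for example pycryptodome's `Crypto.Hash.keccak`) is the better choice in production.

```python
MASK64 = (1 << 64) - 1


def _rol64(a: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits."""
    n %= 64
    return ((a << n) | (a >> (64 - n))) & MASK64


def _keccak_f1600(lanes: list[list[int]]) -> list[list[int]]:
    """Keccak-f[1600] permutation on a 5x5 grid of 64-bit lanes, lanes[x][y]."""
    r = 1  # LFSR state that generates the iota round constants
    for _ in range(24):
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4] for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho + pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear mixing along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR this round's constant into lane (0, 0)
        for j in range(7):
            r = ((r << 1) ^ ((r >> 7) * 0x71)) % 256
            if r & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def _permute(state: bytearray) -> bytearray:
    """Apply Keccak-f[1600] to a 200-byte state (lanes are little-endian)."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lanes = _keccak_f1600(lanes)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data: bytes) -> bytes:
    """Ethereum-style Keccak-256: 136-byte rate, original 0x01 ... 0x80 padding."""
    rate = 136
    padded = bytearray(data) + b"\x01"
    padded += b"\x00" * (-len(padded) % rate)
    padded[-1] |= 0x80
    state = bytearray(200)
    for off in range(0, len(padded), rate):  # absorb one rate-sized block at a time
        for i in range(rate):
            state[i] ^= padded[off + i]
        state = _permute(state)
    return bytes(state[:32])  # 32 bytes fit in one squeeze


def event_topic(signature: str) -> str:
    """topics[0] for a canonical event signature such as 'SessionClosed(address,bytes32,address)'."""
    return "0x" + keccak256(signature.encode("ascii")).hex()
```

For example, `event_topic("SessionClosed(address,bytes32,address)")` gives the `topics[0]` value to put in an `eth_getLogs` filter.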

### Read calls used per closed session
- `getSession(bytes32) → (user, bidId, stake, closeoutReceipt, closeoutType, providerWithdrawnAmount, openedAt, endsAt, closedAt, isActive, isDirectPaymentFromUser)`
- `getBid(bytes32) → (provider, modelId, pricePerSecond, nonce, createdAt, deletedAt)`
- `getModel(bytes32) → (ipfsCID, fee, stake, owner, name, tags, createdAt, isDeleted)`

### Closeout receipt decoding
`session.closeoutReceipt` is `bytes` produced and ECDSA-signed by the provider.
The c-node encodes **seven** fields (`morrpcmessage/abi.go` `sessionReportAbi`):

```solidity
abi.decode(receipt, (bytes32, uint256, uint128, uint32, uint32, uint32, uint32))
// sessionId, chainId, timestamp, tpsScaled1000, ttftMs, inputTokens, outputTokens
```

`SessionRouter._extractProviderReceipt` reads only the first five fields on-chain,
but the full bytes are stored in `closeoutReceipt`, so the indexer decodes all seven
when present.

- `tpsScaled1000` — **mean** of per-request instantaneous TPS values
  (`outputTokens × 1000 / requestDuration` on the c-node), **not** total tokens ÷ session length.
- `inputTokens` / `outputTokens` — **session cumulative** counts from the c-node at close.
- **`tokens`** (rollup field) = `inputTokens + outputTokens` when the extended receipt decodes.
  Sessions with only the older 5-field receipt carry no on-chain token totals and are counted
  under `sessionsMissingTokens` in rollups; section 3 describes the fallback estimate used for
  their volume.

`ttftMs` is **time-to-first-token in milliseconds** (mean across requests in the session).
For seven-field receipts, `tpsScaled1000` is kept for latency analytics only; it is **not**
multiplied by session duration to derive volume.
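The decoding rules above can be sketched as follows, assuming `closeoutReceipt` holds exactly the `abi.encode` output of the tuple shown (no signature appended) and that the legacy five-field and extended seven-field forms can be told apart by length alone (160 vs 224 bytes). Every field is a static ABI type, so each occupies one 32-byte big-endian word. `CloseoutReceipt` and `decode_closeout_receipt` are hypothetical names, and reading `tpsScaled1000` as TPS × 1000 is an inference from the field name.

```python
from dataclasses import dataclass
from typing import Optional

WORD = 32  # every static ABI value occupies one 32-byte big-endian word


@dataclass(frozen=True)
class CloseoutReceipt:
    session_id: bytes
    chain_id: int
    timestamp: int
    tps_scaled_1000: int
    ttft_ms: int
    input_tokens: Optional[int]   # None for legacy five-field receipts
    output_tokens: Optional[int]  # None for legacy five-field receipts

    @property
    def tps(self) -> float:
        """Mean per-request TPS, assuming tpsScaled1000 means TPS x 1000."""
        return self.tps_scaled_1000 / 1000

    @property
    def tokens(self) -> Optional[int]:
        """Session token total, or None when the receipt predates the token fields."""
        if self.input_tokens is None or self.output_tokens is None:
            return None
        return self.input_tokens + self.output_tokens


def decode_closeout_receipt(receipt: bytes) -> CloseoutReceipt:
    """Decode the seven-field layout, or the legacy five-field prefix of it."""
    if len(receipt) not in (5 * WORD, 7 * WORD):
        raise ValueError(f"unexpected closeoutReceipt length: {len(receipt)} bytes")
    words = [receipt[i:i + WORD] for i in range(0, len(receipt), WORD)]
    ints = [int.from_bytes(w, "big") for w in words]
    extended = len(words) == 7
    return CloseoutReceipt(
        session_id=words[0],
        chain_id=ints[1],
        timestamp=ints[2],
        tps_scaled_1000=ints[3],
        ttft_ms=ints[4],
        input_tokens=ints[5] if extended else None,
        output_tokens=ints[6] if extended else None,
    )
```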

### `closeoutType`
`closeoutType == 1` means the close was marked as disputed inside
`_setStats` (signature mismatch or similar). Disputed closes do **not**
update on-chain running averages; we keep counting them in our own
rollups under `sessionsDisputed`, and we show a `disputedPctLast30d` KPI.

### Rolling averages already on-chain
`StatsStorage` maintains running averages per `(modelId, provider)` pair
and per `modelId` using Welford-style `LibSD.SD` structs. These are
**cumulative since launch** with no time bucketing and only count
undisputed closes. The indexer here does NOT use them — we compute our
own rollups so we get time-bucketed trends and can include disputed
closes separately.
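For anyone cross-checking our rollups against the on-chain figures, this is the textbook Welford update that "Welford-style" refers to. It is a floating-point sketch under that assumption only: the field layout and fixed-point arithmetic of `LibSD.SD` are not reproduced here, and `RunningStats` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance: one pass, O(1) memory, no stored samples."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean               # deviation from the old mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # old and new mean together keep this stable

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two observations exist."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    @property
    def stddev(self) -> float:
        return math.sqrt(self.variance)
```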

---

## 2. RPC endpoint

**Required:** a private Base mainnet RPC (Alchemy, QuickNode, Infura,
self-hosted). Set in `throughput-indexer/config.env`:

```
BASE_RPC_URL=https://base-mainnet.g.alchemy.com/v2/YOUR_KEY
```

The public `https://mainnet.base.org` will not survive a backfill
(aggressive rate limits, ~500 blocks per `eth_getLogs`).

Safety head lag: the indexer stops scanning at `head - 20` blocks
(~40 s on Base) to avoid reorg surprises. Base finality is fast and
reorgs of that depth are essentially zero in practice.
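A minimal sketch of how one run's scan window could be planned from the cursor, the 20-block safety lag, and a per-request `eth_getLogs` span. It assumes `last_block` in `state/cursor.json` is the last block already scanned; `plan_scan` and the 500-block chunk default are illustrative (the chunk matches the public-RPC limit mentioned above and may be raised for a private endpoint). `eth_getLogs` treats `fromBlock` and `toBlock` as inclusive, which is why consecutive ranges here do not overlap.

```python
SAFETY_LAG_BLOCKS = 20     # stop at head - 20 (~40 s at 2 s per block)
MAX_BLOCKS_PER_RUN = 4000  # Lambda default from section 7
LOGS_CHUNK_BLOCKS = 500    # conservative eth_getLogs span; tune per RPC provider


def plan_scan(last_block: int, head: int,
              max_blocks: int = MAX_BLOCKS_PER_RUN,
              lag: int = SAFETY_LAG_BLOCKS,
              chunk: int = LOGS_CHUNK_BLOCKS) -> list[tuple[int, int]]:
    """Return inclusive (fromBlock, toBlock) ranges for one run, oldest first."""
    start = last_block + 1
    end = min(head - lag, last_block + max_blocks)
    # An empty list means the cursor is already within the safety lag of head.
    return [(lo, min(lo + chunk - 1, end)) for lo in range(start, end + 1, chunk)]
```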

---

## 3. Token volume

**Per closed session (chart total `tokens`):**
```
tokens = inputTokens + outputTokens       when the seven-field receipt carries them
       = min(tps, TPS_CLAMP) × duration   otherwise (Dec 2025 – Jan 2026 closes only)
```

Closes from Feb 2026 onward carry cumulative token totals in the seven-field receipt.
Dec 2025 and Jan 2026 closes stored only mean TPS and TTFT (five-field receipt), so
rollups use the clamped TPS × duration fallback to keep those months from showing as blank.

Published rollups also expose the split:
- `tokensReceipt` — sum of on-chain input+output totals
- `tokensLegacyEst` — sum of five-field fallback estimates
- `sessionsLegacyEst` — closes counted via fallback
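The per-session rule and the published split can be sketched together. This is illustrative only: the `TPS_CLAMP` value is not stated in this document, so it is a required parameter here; `durationS` is assumed to be `closedAt - openedAt` in seconds; and the row keys and function names are hypothetical.

```python
from typing import Iterable, Optional


def session_tokens(input_tokens: Optional[int], output_tokens: Optional[int],
                   tps: float, duration_s: float, tps_clamp: float) -> tuple[float, bool]:
    """Return (tokens, is_legacy_estimate) for one closed session."""
    if input_tokens is not None and output_tokens is not None:
        return float(input_tokens + output_tokens), False  # seven-field receipt totals
    return min(tps, tps_clamp) * duration_s, True           # five-field fallback estimate


def token_split(sessions: Iterable[dict], tps_clamp: float) -> dict:
    """Sum chart tokens plus the receipt / legacy-estimate split over session rows."""
    out = {"tokens": 0.0, "tokensReceipt": 0.0, "tokensLegacyEst": 0.0, "sessionsLegacyEst": 0}
    for s in sessions:
        tokens, legacy = session_tokens(s.get("inputTokens"), s.get("outputTokens"),
                                        s["tps"], s["durationS"], tps_clamp)
        out["tokens"] += tokens
        if legacy:
            out["tokensLegacyEst"] += tokens
            out["sessionsLegacyEst"] += 1
        else:
            out["tokensReceipt"] += tokens
    return out
```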

---

## 4. Privacy model

Output files published to the public S3 bucket contain **aggregates only**:
- Counts of sessions, unique users, unique providers, unique models
- Summed tokens, session-seconds, staked MOR, paid-out MOR
- Average TPS and TTFT per bucket
- Per-model and per-provider totals (addresses are public on-chain data)

Output files do **not** contain:
- Per-user rows
- Per-session rows keyed by address
- Any wallet address outside the `provider` field (which is already
  a public, on-chain-registered identity)

Raw NDJSON per-session rows live **only** in the operator's
`throughput-indexer/local-raw/` directory, which is gitignored and
never synced to S3.

> **Note:** The API Gateway opens sessions on behalf of many users from
> a shared consumer-node wallet. In aggregate that wallet's activity
> will appear against the gateway's known address — that is intentional
> and not a privacy leak beyond what is already visible on-chain.
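One way to enforce the rules above is a pre-upload guard that walks each output document and reports any EVM address (0x plus exactly 40 hex digits) found outside the `provider` field. This is a hypothetical check, not part of `throughput-indexer/`. Allowed keys skip their entire subtree, so if a per-provider file keys its map by provider address, add the key that holds that map to `allowed_keys`.

```python
import re
from typing import Any

# An EVM address: 0x followed by exactly 40 hex digits (64-digit hashes do not match).
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def find_address_leaks(obj: Any, allowed_keys: frozenset = frozenset({"provider"}),
                       path: str = "$") -> list[str]:
    """Return JSON paths of address-like strings (values or keys) outside allowed keys."""
    leaks: list[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key in allowed_keys:
                continue  # provider identities are already public on-chain; skip this subtree
            if ADDRESS_RE.search(str(key)):
                leaks.append(f"{path}.<key {key}>")
            leaks.extend(find_address_leaks(value, allowed_keys, f"{path}.{key}"))
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            leaks.extend(find_address_leaks(item, allowed_keys, f"{path}[{i}]"))
    elif isinstance(obj, str) and ADDRESS_RE.search(obj):
        leaks.append(path)
    return leaks
```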

---

## 5. Output layout (S3)

Bucket: **`prd-tech-website`** (same bucket as `tech.mor.org`), prefix
**`api/throughput/`**. Served via the existing CloudFront distribution,
so the HTML page fetches from the same origin (no CORS needed).

The Lambda and any local CLI runs share this prefix — it is the single
source of truth for cursor, raw rows, and published rollups.

| Key | Content | Writer | Refresh |
|---|---|---|---|
| `state/cursor.json` | `{ "last_block": N, "last_run": ts, … }` (forward-only) | Lambda + local `incremental`/`backfill` | Every run |
| `raw/closes/YYYY-MM.ndjson.gz` | Enriched per-session rows, gzipped, partitioned by month of `closedAt` | Lambda + local backfill | Append-on-new-rows |
| `raw/opens/YYYY-MM.ndjson.gz` | `SessionOpened` rows, partitioned by block timestamp month | Same | Append-on-new-rows |
| `summary.json` | All-time / 24h / 7d / 30d KPI snapshot | Any `rollup` run | Every rollup |
| `rollup/hourly.json` | Last 14 d, hourly buckets | Rollup | Every rollup |
| `rollup/daily.json` | Last 365 d, daily buckets | Rollup | Every rollup |
| `rollup/weekly.json` | All time, weekly (ISO Monday) | Rollup | Every rollup |
| `rollup/monthly.json` | All time, monthly | Rollup | Every rollup |
| `rollup/by-model-daily.json` | Last 30 d per-model daily + totals | Rollup | Every rollup |
| `rollup/by-provider-daily.json` | Last 30 d per-provider daily + totals | Rollup | Every rollup |
| `stuck-sessions.json` | Aggregate count of opened-but-expired sessions + top providers + MOR locked | `rollup --stuck` / Lambda default | Every rollup |
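The bucket and partition keys in the table above could be derived from a timestamp as below. The sketch assumes all buckets are UTC; the daily label matches the sample bucket in the per-bucket shape further down, while the hourly and weekly label formats are guesses. Function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PREFIX = "api/throughput/"


def bucket_labels(ts: int) -> dict[str, str]:
    """Map a Unix timestamp to the hourly/daily/weekly/monthly bucket it falls in (UTC)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    monday = dt.date() - timedelta(days=dt.weekday())  # ISO weeks start on Monday
    return {
        "hourly": dt.strftime("%Y-%m-%dT%H:00Z"),
        "daily": dt.strftime("%Y-%m-%d"),
        "weekly": monday.isoformat(),
        "monthly": dt.strftime("%Y-%m"),
    }


def closes_partition_key(closed_at: int) -> str:
    """S3 key of the monthly raw partition that a close belongs to (by closedAt)."""
    month = datetime.fromtimestamp(closed_at, tz=timezone.utc).strftime("%Y-%m")
    return f"{PREFIX}raw/closes/{month}.ndjson.gz"
```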

Cache: files are written with `Cache-Control: public, max-age=60` and the
indexer can optionally invalidate `/api/throughput/*` on CloudFront after
upload (set `CLOUDFRONT_DISTRIBUTION_ID` in `config.env`).
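A sketch of the upload step using boto3. `put_object` accepts `ContentType` and `CacheControl` directly, and CloudFront's `create_invalidation` needs a unique `CallerReference` per request. The kwargs builders are pure so they can be checked without AWS credentials; `publish`, the helper names, and reading `CLOUDFRONT_DISTRIBUTION_ID` from the process environment (rather than parsing `config.env`) are assumptions.

```python
import json
import os
import time

BUCKET = "prd-tech-website"
PREFIX = "api/throughput/"
CACHE_CONTROL = "public, max-age=60"


def put_json_kwargs(key: str, payload: dict, bucket: str = BUCKET) -> dict:
    """Keyword arguments for boto3's S3 put_object for one published JSON file."""
    return {
        "Bucket": bucket,
        "Key": PREFIX + key,
        "Body": json.dumps(payload, separators=(",", ":")).encode("utf-8"),
        "ContentType": "application/json",
        "CacheControl": CACHE_CONTROL,
    }


def invalidation_kwargs(distribution_id: str) -> dict:
    """Keyword arguments for boto3's CloudFront create_invalidation."""
    return {
        "DistributionId": distribution_id,
        "InvalidationBatch": {
            "Paths": {"Quantity": 1, "Items": ["/api/throughput/*"]},
            "CallerReference": f"throughput-{time.time_ns()}",  # must be unique per request
        },
    }


def publish(files: dict[str, dict]) -> None:
    """Upload rollups, then invalidate CloudFront if CLOUDFRONT_DISTRIBUTION_ID is set."""
    import boto3  # imported here so the builders above can be tested without AWS

    s3 = boto3.client("s3")
    for key, payload in files.items():
        s3.put_object(**put_json_kwargs(key, payload))
    dist = os.environ.get("CLOUDFRONT_DISTRIBUTION_ID")
    if dist:
        boto3.client("cloudfront").create_invalidation(**invalidation_kwargs(dist))
```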

Per-bucket shape:
```json
{
  "bucket": "2026-04-21",
  "sessionsClosed": 412,
  "sessionsDisputed": 3,
  "sessionsClosedEarly": 77,
  "tokens": 9421234.5,
  "sessionsWithTokens": 409,
  "sessionsMissingTokens": 3,
  "sessionSeconds": 124530,
  "avgTps": 72.1,
  "avgTpsClamped": 71.9,
  "avgTtftMs": 412,
  "uniqueUsers": 183,
  "uniqueProviders": 5,
  "uniqueModels": 12,
  "morStake": 1234.5,
  "morPaid": 987.3
}
```
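A minimal sketch of how enriched per-session rows might collapse into one such bucket. Row keys (`stakeMor`, `paidMor`, and so on) and `aggregate_bucket` are hypothetical, `sessionSeconds` is assumed to be the sum of `closedAt - openedAt`, and `sessionsClosedEarly` and `avgTpsClamped` are left out because their exact definitions are not spelled out here. User and provider addresses contribute only set sizes, which is how the privacy model above is kept.

```python
from statistics import fmean


def aggregate_bucket(label: str, rows: list[dict]) -> dict:
    """Collapse enriched per-session rows (hypothetical keys) into one public bucket."""
    with_tokens = [r for r in rows if r.get("tokens") is not None]
    tps = [r["tps"] for r in rows if r.get("tps") is not None]
    ttft = [r["ttftMs"] for r in rows if r.get("ttftMs") is not None]
    return {
        "bucket": label,
        "sessionsClosed": len(rows),
        "sessionsDisputed": sum(1 for r in rows if r["closeoutType"] == 1),
        "tokens": sum(r["tokens"] for r in with_tokens),
        "sessionsWithTokens": len(with_tokens),
        "sessionsMissingTokens": len(rows) - len(with_tokens),
        "sessionSeconds": sum(r["closedAt"] - r["openedAt"] for r in rows),
        "avgTps": round(fmean(tps), 1) if tps else None,
        "avgTtftMs": round(fmean(ttft)) if ttft else None,
        # Only set sizes leave this function; the addresses themselves are dropped.
        "uniqueUsers": len({r["user"] for r in rows}),
        "uniqueProviders": len({r["provider"] for r in rows}),
        "uniqueModels": len({r["modelId"] for r in rows}),
        "morStake": sum(r["stakeMor"] for r in rows),
        "morPaid": sum(r["paidMor"] for r in rows),
    }
```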

---

## 6. Stuck sessions (expired, un-closed)

A session becomes "stuck" when:
1. `SessionOpened` fired,
2. `SessionClosed` never fired for the same `sessionId`,
3. On-chain `getSession(sid).closedAt == 0`, and
4. `getSession(sid).endsAt < now`.

The indexer's `rollup --stuck` pass performs steps 3 and 4 (cheap `eth_call`s)
against every observed open that isn't in `closes.ndjson`. Output is
aggregate-only: count, total MOR locked, and top-N providers ranked by
MOR locked.
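The four conditions map onto a set difference plus one `getSession` read per candidate. In this sketch `read_session` stands in for the `eth_call`, `SessionState` is a hypothetical subset of the `getSession` tuple, and "MOR locked" is taken to be the session's `stake`; output key names other than `stuckExpired` are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SessionState:
    """Subset of getSession(sessionId) needed for the stuck check (hypothetical shape)."""
    provider: str
    stake_mor: float
    ends_at: int
    closed_at: int


def find_stuck(opened: set[str], closed: set[str],
               read_session: Callable[[str], SessionState], now: int) -> list[SessionState]:
    """Steps 1-2 via set difference, steps 3-4 via one getSession read per candidate."""
    stuck = []
    for sid in sorted(opened - closed):
        s = read_session(sid)
        if s.closed_at == 0 and s.ends_at < now:
            stuck.append(s)
    return stuck


def summarize_stuck(stuck: list[SessionState], top_n: int = 5) -> dict:
    """Aggregate-only output: count, MOR locked, top providers ranked by MOR locked."""
    locked: defaultdict[str, float] = defaultdict(float)
    for s in stuck:
        locked[s.provider] += s.stake_mor
    top = sorted(locked.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {
        "stuckExpired": len(stuck),
        "morLocked": sum(locked.values()),
        "topProviders": [{"provider": p, "morLocked": m} for p, m in top],
    }
```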

A persistently non-zero `stuckExpired` number is the leading indicator
of funding / gas / node-offline problems (see `session.html` narrative).

---

## 7. Indexer operational model

Two runners share the exact same codebase and S3 state:

### AWS Lambda — steady-state upkeep

Provisioned by `environments/07-calc-mor-org/.terragrunt/04_indexer_lambda.tf`.
EventBridge invokes `lambda_handler.lambda_handler` every
`schedule_minutes` (default 15). Each run is bounded by
`MAX_BLOCKS_PER_RUN` (default 4000, about 2.2 h of Base blocks at 2 s each) so it
comfortably fits within the Lambda timeout. The private Base RPC URL is
stored in a dedicated Secrets Manager secret and fetched at cold start.
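Fetching the secret once per cold start typically means caching it at module scope, since Lambda reuses the execution environment (and its globals) across warm invocations. A minimal sketch with boto3's Secrets Manager `get_secret_value`; the `RPC_SECRET_ID` environment variable and `get_rpc_url` are hypothetical, and the default secret name is the one used in the commands below.

```python
import os
from typing import Any, Optional

DEFAULT_SECRET_ID = "prd-morpheus-throughput-indexer-rpc"
_rpc_url: Optional[str] = None  # module-level, so it survives across warm invocations


def get_rpc_url(client: Any = None) -> str:
    """Fetch the Base RPC URL from Secrets Manager once per execution environment."""
    global _rpc_url
    if _rpc_url is None:
        if client is None:
            import boto3  # deferred so tests can pass a stub client
            client = boto3.client("secretsmanager")
        secret_id = os.environ.get("RPC_SECRET_ID", DEFAULT_SECRET_ID)
        # The secret is stored as a plain string (the URL itself), not JSON.
        _rpc_url = client.get_secret_value(SecretId=secret_id)["SecretString"]
    return _rpc_url
```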

Operational commands:

```bash
# Seed the RPC secret once after first apply
aws secretsmanager put-secret-value \
  --secret-id prd-morpheus-throughput-indexer-rpc \
  --secret-string "https://base-mainnet.g.alchemy.com/v2/KEY" \
  --profile mor-org-prd --region us-east-2

# Force an immediate run (skip schedule). Each scheduled run also enriches up to
# ENRICH_CLOSES_PER_RUN stale rows in the latest two monthly partitions.
aws lambda invoke \
  --function-name prd-morpheus-throughput-indexer \
  --payload '{"force_rollup":true}' \
  --cli-binary-format raw-in-base64-out \
  --profile mor-org-prd --region us-east-2 /tmp/resp.json

# Tail logs
aws logs tail /aws/lambda/prd-morpheus-throughput-indexer \
  --follow --profile mor-org-prd --region us-east-2
```

### Operator laptop — historical backfill

```bash
# Smoke test (read-only, no S3 writes). REQUIRED before first backfill.
python spot_check.py --recent 10

# One-time historical backfill. Idempotent, resumable in segments.
python indexer.py backfill --start-block 9000000

# Or in segments across coffee breaks:
python indexer.py backfill --start-block  9000000 --end-block 12000000
python indexer.py backfill --start-block 12000001 --end-block 15000000
# …etc

# Recompute public rollups after any backfill
python indexer.py rollup --stuck

# One-time (or periodic) normalization of pre-May-2026 schema rows in the S3 closes partitions:
# adds receiptFormat and on-chain token totals, strips tokensEst/tps estimates.
python indexer.py enrich-closes --publish
# Smoke test first:
python indexer.py enrich-closes --partition 2026-05 --dry-run --limit 50
```

Backfill only advances the cursor if its `end-block` is beyond what the
Lambda has already seen, so historical fills don't regress steady-state.
The Lambda can safely continue running while a backfill progresses —
monthly partitions are de-duplicated by `sessionId` on write.
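The two guarantees in that paragraph, de-duplication by `sessionId` and a forward-only cursor, can be sketched with the standard library alone. `merge_partition` and `advance_cursor` are hypothetical names, and letting the later write win for a repeated `sessionId` is an assumed policy (it suits re-enriched rows).

```python
import gzip
import json
from typing import Optional


def merge_partition(existing_gz: Optional[bytes], new_rows: list[dict]) -> bytes:
    """Merge rows into a gzipped NDJSON partition, keeping one row per sessionId."""
    by_id: dict[str, dict] = {}
    if existing_gz:
        for line in gzip.decompress(existing_gz).decode("utf-8").splitlines():
            if line.strip():
                row = json.loads(line)
                by_id[row["sessionId"]] = row
    for row in new_rows:
        by_id[row["sessionId"]] = row  # later write wins (e.g. a re-enriched row)
    body = "".join(json.dumps(r, sort_keys=True) + "\n" for r in by_id.values())
    return gzip.compress(body.encode("utf-8"))


def advance_cursor(cursor: dict, end_block: int) -> dict:
    """Forward-only cursor: a backfill ending at or below last_block leaves it alone."""
    if end_block <= cursor.get("last_block", -1):
        return cursor
    return {**cursor, "last_block": end_block}
```

In this sketch, de-duplication makes replaying the same blocks harmless; it does not by itself serialize two writers that rewrite the same monthly object at the same moment.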

---

## 8. Sanity-check heuristics (in `spot_check.py`)

When inspecting sample sessions, these flags are surfaced for eyeballing:

| Condition | Meaning |
|---|---|
| `duration < 300 s` | The contract enforces a 5-minute minimum, so this shouldn't happen; points to a data or decoding bug |
| `tps > 500` | Almost certainly an outlier; the clamp will ignore it in rollups |
| `tps < 0.5` with `duration > 300 s` | Provider under-reported or the session did very little work |
| Closed session with no `closeoutReceipt` | Unexpected — flag for investigation |

Use `--recent 10` to quickly review ten real closes across providers
before committing to a full backfill.
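The heuristics in the table reduce to a few comparisons. This sketch mirrors the table rather than the actual `spot_check.py`; the function and constant names are hypothetical.

```python
from typing import Optional

MIN_SESSION_S = 300  # contract-enforced 5-minute minimum
TPS_OUTLIER = 500.0
TPS_LOW = 0.5


def spot_check_flags(duration_s: float, tps: Optional[float], has_receipt: bool) -> list[str]:
    """Return human-readable flags for one closed session (empty list = looks sane)."""
    flags = []
    if duration_s < MIN_SESSION_S:
        flags.append("duration < 300 s: data or decoding bug")
    if tps is not None and tps > TPS_OUTLIER:
        flags.append("tps > 500: outlier")
    if tps is not None and tps < TPS_LOW and duration_s > MIN_SESSION_S:
        flags.append("tps < 0.5 over > 300 s: under-reported or idle session")
    if not has_receipt:
        flags.append("closed without closeoutReceipt: investigate")
    return flags
```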

---

## 9. What paths are covered

| Usage path | How it lands on-chain | Covered by this indexer |
|---|---|---|
| Consumer node (Lumerin proxy-router) staking MOR directly | Opens a SessionRouter session signed by the user's wallet | Yes |
| OpenClaw / Morpheus Skill (MetaMask-driven) | Delegates to a wallet which opens the session | Yes |
| Node Neo (consumer-node UX wrapper) | Same as consumer node | Yes |
| API Gateway — crypto-staked or pooled users | Gateway's service wallet opens sessions on the contract (often aggregated across many end users) | Yes, aggregated under that wallet |
| API Gateway — credit/Stripe-billed end users | Same routed path: C-Node consumer wallet opens/closes on-chain sessions; credits/Stripe are **billing** only | **Yes** — same `SessionClosed` / receipt stream when sessions close normally. Do not treat as "off-chain-only users." |

---

*This file is maintained to match the `throughput.html` page and the code in
`throughput-indexer/`. When in doubt, defer to the smart-contract source at
`Morpheus-Lumerin-Node/smart-contracts/contracts/diamond/` and the indexer.*
