/v1/tenant/config — Per-tenant configuration
Cut 2.13 (ADR 0013) — three-level cascading configuration store. Lets each tenant tune UX/operational defaults (compaction, image-gen, etc.) without operator intervention, while keeping billing, security, and permissions under operator control via tokens.yaml.
Lookup hierarchy (highest wins)
| Level | Source | Edit workflow | Audience |
|---|---|---|---|
| L3 | per-tenant SQLite (tenant_config table) | PUT /v1/tenant/config (runtime) | tenant admin, self-service |
| L2 | per-tenant YAML (tokens.yaml::tenants[].defaults) | operator-edit + dgx-rust-api restart | stack operator |
| L1 | process default (ENV → hardcoded fallback) | code/env change + restart | engineer |
The cascade is computed per-key on every read. A GET /v1/tenant/config response shows the effective value AND the level it came from (source: "process" | "yaml" | "db").
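The per-key lookup can be sketched in a few lines of Python. This is a minimal illustration, not the service's implementation: the function name resolve, the Effective tuple, and the three plain dicts standing in for the tenant_config table, the YAML defaults block, and the process defaults are all assumptions made for the example.

```python
from typing import Any, Mapping, NamedTuple


class Effective(NamedTuple):
    value: Any
    source: str  # "db", "yaml", or "process"


def resolve(key: str,
            db: Mapping[str, Any],
            yaml_defaults: Mapping[str, Any],
            process_defaults: Mapping[str, Any]) -> Effective:
    """Return the effective value for one key and the level it came from."""
    # L3 (db) beats L2 (yaml); L1 (process) is the final fallback.
    for source, level in (("db", db), ("yaml", yaml_defaults)):
        if key in level:
            return Effective(level[key], source)
    return Effective(process_defaults[key], "process")


# Hypothetical sample data for one tenant.
process = {"compact_keep_last_n": 10, "image_gen_rate_per_hour": 20}
yaml_l2 = {"compact_keep_last_n": 20}
db_l3 = {"image_gen_rate_per_hour": 50}

print(resolve("compact_keep_last_n", db_l3, yaml_l2, process))
# Effective(value=20, source='yaml')
print(resolve("image_gen_rate_per_hour", db_l3, yaml_l2, process))
# Effective(value=50, source='db')
```

Because the lookup runs per key, one tenant can have some keys served from the DB and others from YAML or the process default in the same response.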
Security classification
Per ADR 0013, every setting belongs to exactly one of three groups. The group decides which levels exist for the setting and whether it can be PUT via the API.
Group A — code-fixed (no override)
Stack-wide allowlists. Changing requires a code-release.
- ALLOWED_MODELS, ALLOWED_EMBEDDING_MODELS
- MAX_BYTES_PER_BUCKET (memory)
Group B — yaml-only (L1 + L2)
Operator-curated. Tenant cannot self-tune via the API.
| Key | Reason |
|---|---|
| cost_markup_factor | Billing: tenant must not zero its own bill |
| models_allowlist | Security boundary: tenant must not self-grant premium models |
| models_blacklist | Mirror of allowlist; the two are mutually exclusive |
| audit_user_id_pseudonymize | Compliance: tenant must not opt out of GDPR (DSGVO) mode via self-service |
| Per-token secrets, extra_scopes, deny_scopes, constraints.*, rate_limit_bypass | Permissions / credentials, ADR 0003 |
| c1_require_user_id_binding | Obsolete since Cut 2.23c (ADR 0016). Strict header-only authority is now stack-wide. The setting is still accepted in YAML for backward compatibility but is a no-op. |
PUT on a Group-B key returns 400 tenant_config_key_readonly with the explicit "edit tokens.yaml + restart" remediation.
Group C — full cascade (L1 + L2 + L3)
UX/operational. Tenant can PUT via the API. Hard-caps prevent abuse.
| Key | Type | L3 cap | Notes |
|---|---|---|---|
| compact_strategy | auto \| manual \| off | none | session default for new sessions |
| compact_keep_last_n | int 0–200 | 200 | how many tail messages survive compaction |
| compact_observation_mask | bool | none | tells the summary model to drop tool noise |
| compact_summary_model | model alias | must be in tenant's effective allowlist | local default = llama-4-scout (data protection). Since Cut 2.46 this also controls the /c1 chat summary (title/summary), not only the /a1 session compaction. The env var SUMMARY_MODEL remains a global override. |
| image_gen_default_model | model alias | must be in tenant's effective allowlist | default when the tool is called without model: |
| image_default_ttl_hours | int ≥ 1 | ≤ effective image_max_ttl_hours | per-tenant default TTL on generated images |
| image_max_ttl_hours | int 1–720 | 720 (= 30 d) | hard cap on per-request override |
| image_gen_rate_per_hour | int 1–200 | 200 at API/DB level | operator can lift to 1000 via L2 yaml |
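The caps in the table translate into a per-key check along the following lines. This is a hedged sketch: validate_l3 and its parameters are hypothetical names, the allowlist and the effective image_max_ttl_hours are passed in by the caller, and the real service may structure its validation differently.

```python
from typing import Any, Collection

COMPACT_STRATEGIES = ("auto", "manual", "off")


def validate_l3(key: str, value: Any, *, allowlist: Collection[str],
                effective_max_ttl: int) -> None:
    """Raise ValueError if `value` is not acceptable as an L3 override."""

    def require_int(lo: int, hi: int) -> None:
        # bool is a subclass of int in Python, so reject it explicitly.
        ok = isinstance(value, int) and not isinstance(value, bool)
        if not ok or not lo <= value <= hi:
            raise ValueError(f"{key}: expected int in [{lo}, {hi}]")

    if key == "compact_strategy":
        if value not in COMPACT_STRATEGIES:
            raise ValueError(f"{key}: unknown enum value")
    elif key == "compact_keep_last_n":
        require_int(0, 200)
    elif key == "compact_observation_mask":
        if not isinstance(value, bool):
            raise ValueError(f"{key}: expected bool")
    elif key in ("compact_summary_model", "image_gen_default_model"):
        if not isinstance(value, str) or value not in allowlist:
            raise ValueError(f"{key}: model not in effective allowlist")
    elif key == "image_default_ttl_hours":
        require_int(1, effective_max_ttl)  # capped by the effective max TTL
    elif key == "image_max_ttl_hours":
        require_int(1, 720)
    elif key == "image_gen_rate_per_hour":
        require_int(1, 200)  # API/DB cap; the YAML level allows up to 1000
    else:
        raise ValueError(f"{key}: not a Group C key")


validate_l3("compact_keep_last_n", 25,
            allowlist={"llama-4-scout"}, effective_max_ttl=168)  # passes
```

Note that image_default_ttl_hours is checked against the effective image_max_ttl_hours, so the acceptable range for one key depends on the resolved value of another.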
Endpoints
All three endpoints are scope-gated:
- GET /v1/tenant/config requires tenant_config:read
- PUT /v1/tenant/config requires tenant_config:write (which implies :read)
- DELETE /v1/tenant/config/{key} requires tenant_config:write
The tenant_admin role grants tenant_config:write by default.
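The scope rules above, including the "write implies read" shortcut, could be expressed as follows. The names REQUIRED_SCOPE and is_allowed are illustrative assumptions; only the scope strings and the method-to-scope mapping come from this page.

```python
from typing import Collection

# Method -> scope required by the three /v1/tenant/config endpoints.
REQUIRED_SCOPE = {
    "GET": "tenant_config:read",
    "PUT": "tenant_config:write",
    "DELETE": "tenant_config:write",
}


def is_allowed(method: str, granted: Collection[str]) -> bool:
    """Return True if the granted scopes cover the given HTTP method."""
    required = REQUIRED_SCOPE[method]
    if required in granted:
        return True
    # tenant_config:write implies tenant_config:read.
    return (required == "tenant_config:read"
            and "tenant_config:write" in granted)


print(is_allowed("GET", {"tenant_config:write"}))  # True
print(is_allowed("PUT", {"tenant_config:read"}))   # False
```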
GET /v1/tenant/config
Returns every setting (Group B + C) with effective value, source, and readonly-flag.
TOKEN="$(grep '^RUST_API_BEARER=' /home/dietmar/dgx-llm/.env | cut -d= -f2)"
curl -s "$HOST/v1/tenant/config" -H "Authorization: Bearer $TOKEN" | jq
{
"tenant_id": "hC7EOMyDFo2BctV7ZQBjpe",
"effective": {
"compact_strategy": {"value": "auto", "source": "process", "readonly": false},
"compact_keep_last_n": {"value": 10, "source": "process", "readonly": false},
"compact_observation_mask": {"value": true, "source": "process", "readonly": false},
"compact_summary_model": {"value": "llama-4-scout", "source": "process", "readonly": false},
"image_gen_default_model": {"value": "nano-banana", "source": "process", "readonly": false},
"image_default_ttl_hours": {"value": 12, "source": "process", "readonly": false},
"image_max_ttl_hours": {"value": 168, "source": "process", "readonly": false},
"image_gen_rate_per_hour": {"value": 20, "source": "process", "readonly": false},
"cost_markup_factor": {"value": 1.5, "source": "process", "readonly": true},
"models_allowlist": {"value": null, "source": "process", "readonly": true},
"models_blacklist": {"value": null, "source": "process", "readonly": true}
},
"writable_keys": [
"compact_strategy", "compact_keep_last_n", "compact_observation_mask",
"compact_summary_model", "image_gen_default_model",
"image_default_ttl_hours", "image_max_ttl_hours", "image_gen_rate_per_hour"
],
"readonly_keys": ["cost_markup_factor", "models_allowlist", "models_blacklist"]
}
PUT /v1/tenant/config
Sets one or more L3 (DB) overrides atomically. Body is a flat JSON object — keys must be in writable_keys, values must pass per-key validation.
curl -s -X PUT "$HOST/v1/tenant/config" \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{
"compact_keep_last_n": 25,
"image_gen_rate_per_hour": 50,
"compact_summary_model": "llama-4-scout"
}' | jq
{ "applied": ["compact_keep_last_n", "compact_summary_model", "image_gen_rate_per_hour"] }
Failures are atomic per request: the first invalid key produces a 400 and no keys are applied.
| Code | Status | Cause |
|---|---|---|
| tenant_config_key_readonly | 400 | Group-B key; edit tokens.yaml + restart |
| tenant_config_invalid_value | 400 | wrong type, out of range, unknown enum, model not in tenant's effective allowlist |
| forbidden | 403 | caller lacks tenant_config:write scope |
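The all-or-nothing behaviour amounts to a validate-then-apply pattern: check every key first, and write only if all checks pass. The sketch below uses an in-memory dict as a stand-in for the tenant_config table; put_config, ConfigError, and VALIDATORS are hypothetical names. Treating an unknown key as tenant_config_invalid_value and sorting the applied list are assumptions, not behaviour confirmed by this page.

```python
from typing import Any, Callable, Dict, List, Mapping

READONLY_KEYS = frozenset(
    {"cost_markup_factor", "models_allowlist", "models_blacklist"})


class ConfigError(Exception):
    def __init__(self, code: str, status: int, detail: str) -> None:
        super().__init__(detail)
        self.code = code
        self.status = status


# Two example validators; a real table would cover every Group C key.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "compact_keep_last_n": lambda v: type(v) is int and 0 <= v <= 200,
    "image_gen_rate_per_hour": lambda v: type(v) is int and 1 <= v <= 200,
}


def put_config(store: Dict[str, Any], body: Mapping[str, Any],
               validators: Mapping[str, Callable[[Any], bool]]
               ) -> Dict[str, List[str]]:
    # Phase 1: validate every key before touching the store.
    for key, value in body.items():
        if key in READONLY_KEYS:
            raise ConfigError("tenant_config_key_readonly", 400,
                              f"{key}: edit tokens.yaml + restart")
        check = validators.get(key)
        if check is None or not check(value):  # unknown key: assumption
            raise ConfigError("tenant_config_invalid_value", 400,
                              f"{key}: rejected")
    # Phase 2: nothing failed, so apply all overrides together.
    store.update(body)
    return {"applied": sorted(body)}


store: Dict[str, Any] = {}
print(put_config(store, {"image_gen_rate_per_hour": 50,
                         "compact_keep_last_n": 25}, VALIDATORS))
# {'applied': ['compact_keep_last_n', 'image_gen_rate_per_hour']}
```

A database-backed version would typically wrap phase 2 in a single transaction so that a crash mid-write cannot leave a partial update.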
DELETE /v1/tenant/config/{key}
Clears one L3 override. Idempotent: if the key wasn't set, the call still returns 200 with removed: false. Group-B keys return 400 tenant_config_key_readonly.
curl -s -X DELETE "$HOST/v1/tenant/config/image_gen_rate_per_hour" \
-H "Authorization: Bearer $TOKEN" | jq
# → { "key": "image_gen_rate_per_hour", "removed": true }
The next GET will show this key with source: "yaml" (if defined) or source: "process".
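The idempotent delete can be illustrated with a dict standing in for the L3 table. The function name delete_override and the use of ValueError for the read-only case are assumptions made for this sketch.

```python
from typing import Any, Dict

READONLY_KEYS = frozenset(
    {"cost_markup_factor", "models_allowlist", "models_blacklist"})


def delete_override(store: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Remove one L3 override; report whether anything was removed."""
    if key in READONLY_KEYS:
        raise ValueError("tenant_config_key_readonly")
    removed = key in store
    store.pop(key, None)  # no error if the key was never set
    return {"key": key, "removed": removed}


l3 = {"image_gen_rate_per_hour": 50}
print(delete_override(l3, "image_gen_rate_per_hour"))
# {'key': 'image_gen_rate_per_hour', 'removed': True}
print(delete_override(l3, "image_gen_rate_per_hour"))
# {'key': 'image_gen_rate_per_hour', 'removed': False}
```

Once the override is gone, the cascade falls back to the YAML value if one is defined, otherwise to the process default.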
YAML format (operator-edit)
Each tenant in data/auth/tokens.yaml may carry an optional defaults: block; every field in it is individually optional.
tenants:
  - id: hC7EOMyDFo2BctV7ZQBjpe
    label: test
    description: internal test tenant
    defaults:
      # Group B (yaml-only)
      cost_markup_factor: 1.0          # billing
      models_allowlist:                # mutually exclusive with blacklist
        - claude-opus-4.7
        - llama-4-scout
      # models_blacklist:
      #   - gpt-image
      # Group C (yaml as L2, also DB-tunable at L3)
      compact_summary_model: llama-4-scout
      compact_keep_last_n: 20
      compact_strategy: auto
      image_gen_default_model: nano-banana
      image_default_ttl_hours: 24
      image_max_ttl_hours: 168
      image_gen_rate_per_hour: 100     # operator can go up to 1000 here
Validation errors (mutually-exclusive lists, out-of-range numerics, unknown enums, unknown model aliases) abort startup — fail-closed by ADR 0002.
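A fail-closed startup check might look like the sketch below. It operates on an already-parsed defaults mapping, so no YAML library is needed, and it covers only three of the rules as examples. The names check_defaults and load_or_abort are hypothetical, and the real loader may report errors differently.

```python
from typing import Any, Dict, List, Mapping


def check_defaults(defaults: Mapping[str, Any]) -> List[str]:
    """Collect validation errors for one tenant's defaults block."""
    errors: List[str] = []
    if (defaults.get("models_allowlist") is not None
            and defaults.get("models_blacklist") is not None):
        errors.append(
            "models_allowlist and models_blacklist are mutually exclusive")
    strategy = defaults.get("compact_strategy")
    if strategy is not None and strategy not in ("auto", "manual", "off"):
        errors.append(f"compact_strategy: unknown enum {strategy!r}")
    rate = defaults.get("image_gen_rate_per_hour")
    # At the YAML level the operator may go up to 1000.
    if rate is not None and not (type(rate) is int and 1 <= rate <= 1000):
        errors.append("image_gen_rate_per_hour: expected int in [1, 1000]")
    return errors


def load_or_abort(defaults: Mapping[str, Any]) -> Dict[str, Any]:
    """Fail closed: refuse to start if any check fails."""
    errors = check_defaults(defaults)
    if errors:
        raise SystemExit("tokens.yaml invalid: " + "; ".join(errors))
    return dict(defaults)


print(check_defaults({"models_allowlist": ["llama-4-scout"],
                      "models_blacklist": ["gpt-image"]}))
# ['models_allowlist and models_blacklist are mutually exclusive']
```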
Audit trail
Every PUT/DELETE writes a row to data/sqlite/tenants/<tenant_id>/memory.sqlite::audit_log with:
- actor_token_hash, actor_user_id
- action: tenant_config_set or tenant_config_unset
- target_key, metadata_json (JSON-encoded value)
GET is not audit-logged (would inflate storage on Cockpit-style high-frequency polls).
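For orientation, an audit row with the fields listed above could be assembled like this. The hash algorithm (SHA-256 here) and the metadata_json content for an unset action (JSON null here) are assumptions; this page does not specify either, and audit_row is a hypothetical helper name.

```python
import hashlib
import json
from typing import Any, Dict, Optional

ACTIONS = ("tenant_config_set", "tenant_config_unset")


def audit_row(token: str, user_id: Optional[str], action: str,
              key: str, value: Any = None) -> Dict[str, Any]:
    """Build one audit_log row as a plain dict."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return {
        # Assumption: store a SHA-256 digest, never the raw token.
        "actor_token_hash": hashlib.sha256(token.encode("utf-8")).hexdigest(),
        "actor_user_id": user_id,
        "action": action,
        "target_key": key,
        "metadata_json": json.dumps(value),  # JSON-encoded value
    }


row = audit_row("example-token", "user-1", "tenant_config_set",
                "compact_keep_last_n", 25)
print(row["action"], row["target_key"], row["metadata_json"])
# tenant_config_set compact_keep_last_n 25
```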
Integration points
The cascade is consulted everywhere a per-tenant default is needed:
- Compaction (Cut 2.12): compact_session uses compact_summary_model from the cascade
- Auto-compact gate: uses compact_keep_last_n and compact_observation_mask per session, which inherit from the cascade on session create
- image_gen tool (Cut 2.10): pulls image_gen_default_model, image_default_ttl_hours, image_max_ttl_hours, and image_gen_rate_per_hour from the cascade
- POST /a1/agents/<n>/sessions: body fields override the cascade per session; missing fields fall through
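The session-create rule (body fields win, missing fields fall through to the cascade) can be sketched as a simple merge. The assumption here is that the request body uses the same field names as the config keys; session_settings and SESSION_KEYS are illustrative names only.

```python
from typing import Any, Dict, Mapping

# Compaction settings a session inherits at creation time.
SESSION_KEYS = ("compact_strategy", "compact_keep_last_n",
                "compact_observation_mask")


def session_settings(body: Mapping[str, Any],
                     effective: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge per-session overrides from the body over cascade values."""
    return {k: body[k] if k in body else effective[k] for k in SESSION_KEYS}


cascade = {"compact_strategy": "auto", "compact_keep_last_n": 10,
           "compact_observation_mask": True}
print(session_settings({"compact_keep_last_n": 40}, cascade))
# {'compact_strategy': 'auto', 'compact_keep_last_n': 40, 'compact_observation_mask': True}
```

Testing for key presence rather than truthiness matters: a body that sets compact_observation_mask to false or compact_keep_last_n to 0 must still override the cascade.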
Migration notes
- cost_markup_factor was DB-only (tenant_config row) before Cut 2.13. The cascade still reads the legacy DB row first, so existing Cockpit-tenant config (1.0) keeps working. Operator should migrate to tokens.yaml::tenants[].defaults.cost_markup_factor when convenient; the legacy DB-fallback will be removed in a future cut.
- All other Cut 2.13 settings are new; no migration needed.
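The transitional lookup order for cost_markup_factor (legacy DB row, then YAML, then process default) can be written out as below. The function signature is hypothetical, and the process default is passed in by the caller instead of being hard-coded.

```python
from typing import Any, Mapping, Optional


def cost_markup_factor(legacy_db_value: Optional[float],
                       yaml_defaults: Mapping[str, Any],
                       process_default: float) -> float:
    """Resolve the markup factor during the migration period."""
    # The legacy tenant_config row is still read first.
    if legacy_db_value is not None:
        return legacy_db_value
    # After migration the operator-curated YAML value applies.
    if "cost_markup_factor" in yaml_defaults:
        return yaml_defaults["cost_markup_factor"]
    return process_default


print(cost_markup_factor(1.0, {"cost_markup_factor": 1.2}, 1.5))   # 1.0
print(cost_markup_factor(None, {"cost_markup_factor": 1.2}, 1.5))  # 1.2
print(cost_markup_factor(None, {}, 1.5))                           # 1.5
```

The first call shows why migration needs care: as long as the legacy row exists, a value placed in tokens.yaml has no effect.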
See also
- ADR 0013: Per-Tenant Configuration Hierarchy
- ADR 0010: Cost-Pipeline V2, origin of cost_markup_factor
- Authentication: token model, scopes, role bundles
- Agents (/a1): uses cascade for session compact-defaults
- Server-side tools: image_gen uses cascade; memory_describe_scope exposes scope-discovery