`/v1/tenant/config` — Per-tenant configuration

Introduced in Cut 2.13 (ADR 0013): a three-level cascading configuration store. It lets each tenant tune UX/operational defaults (compaction, image generation, etc.) without operator intervention, while billing, security, and permissions stay under operator control via tokens.yaml.

Lookup hierarchy (highest wins)

| Level | Source | Edit workflow | Audience |
|---|---|---|---|
| L3 | per-tenant SQLite (`tenant_config` table) | `PUT /v1/tenant/config` (runtime) | tenant admin, self-service |
| L2 | per-tenant YAML (`tokens.yaml::tenants[].defaults`) | operator-edit + `dgx-rust-api` restart | stack operator |
| L1 | process default (ENV → hardcoded fallback) | code/env change + restart | engineer |

The cascade is computed per-key on every read. A GET /v1/tenant/config response shows the effective value AND the level it came from (source: "process" | "yaml" | "db").
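
The per-key lookup described above can be sketched as follows. This is a minimal illustration, not the service's implementation: the three levels are modeled as plain dictionaries, and the function names `resolve` and `effective_config` are hypothetical. It also assumes that a key counts as defined at a level whenever it is present there, even with a null value.

```python
from typing import Any, Dict, Mapping, Tuple


def resolve(
    key: str,
    process: Mapping[str, Any],
    yaml_defaults: Mapping[str, Any],
    db_overrides: Mapping[str, Any],
) -> Tuple[Any, str]:
    """Return (value, source) for one key; the highest level that defines it wins."""
    # Walk the levels from L3 (db) down to L1 (process).
    for source, level in (
        ("db", db_overrides),
        ("yaml", yaml_defaults),
        ("process", process),
    ):
        if key in level:
            return level[key], source
    raise KeyError(key)


def effective_config(
    process: Mapping[str, Any],
    yaml_defaults: Mapping[str, Any],
    db_overrides: Mapping[str, Any],
) -> Dict[str, Dict[str, Any]]:
    """Resolve every key that has a process default (assumed to be all known keys)."""
    out: Dict[str, Dict[str, Any]] = {}
    for key in process:
        value, source = resolve(key, process, yaml_defaults, db_overrides)
        out[key] = {"value": value, "source": source}
    return out


# Hypothetical example data for one tenant.
process_defaults = {"compact_keep_last_n": 10, "image_gen_rate_per_hour": 20}
tenant_yaml = {"compact_keep_last_n": 20}
tenant_db = {"image_gen_rate_per_hour": 50}
config = effective_config(process_defaults, tenant_yaml, tenant_db)
```

Because nothing is cached in this sketch, a change at any level is visible on the next call, which mirrors the "computed on every read" behavior.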

Security classification

Per ADR 0013, every setting belongs to exactly one of three groups. The group decides which levels exist for the setting and whether it can be PUT via the API.

Group A — code-fixed (no override)

Stack-wide allowlists. Changing them requires a code release.

ALLOWED_MODELS, ALLOWED_EMBEDDING_MODELS
MAX_BYTES_PER_BUCKET (memory)

Group B — yaml-only (L1 + L2)

Operator-curated. Tenant cannot self-tune via the API.

| Key | Reason |
|---|---|
| `cost_markup_factor` | Billing: tenant must not zero its own bill |
| `models_allowlist` | Security boundary: tenant must not self-grant premium models |
| `models_blacklist` | Mirror of the allowlist; mutually exclusive with `models_allowlist` |
| `audit_user_id_pseudonymize` | Compliance: tenant must not opt out of GDPR (DSGVO) mode via self-service |
| Per-token `secrets`, `extra_scopes`, `deny_scopes`, `constraints.*`, `rate_limit_bypass` | Permissions / credentials, ADR 0003 |
| `c1_require_user_id_binding` | Obsolete since Cut 2.23c (ADR 0016). Strict header-only authority is now stack-wide. The setting is still accepted in YAML for backward compatibility but is a no-op. |

PUT on a Group-B key returns 400 tenant_config_key_readonly with the explicit "edit tokens.yaml + restart" remediation.

Group C — full cascade (L1 + L2 + L3)

UX/operational. Tenant can PUT via the API. Hard-caps prevent abuse.

| Key | Type | L3 cap | Notes |
|---|---|---|---|
| `compact_strategy` | `auto`\|`manual`\|`off` | none | session default for new sessions |
| `compact_keep_last_n` | int 0–200 | 200 | how many tail messages survive compaction |
| `compact_observation_mask` | bool | none | tells the summary model to drop tool noise |
| `compact_summary_model` | model alias | must be in tenant's effective allowlist | local default = `llama-4-scout` (data privacy). Cut 2.46: now ALSO controls the `/c1` chat summary (title/summary), no longer only the `/a1` session compaction. Env `SUMMARY_MODEL` remains the global override. |
| `image_gen_default_model` | model alias | must be in tenant's effective allowlist | default when the tool is called without `model:` |
| `image_default_ttl_hours` | int ≥ 1 | ≤ effective `image_max_ttl_hours` | per-tenant default TTL on generated images |
| `image_max_ttl_hours` | int 1–720 | 720 (= 30 d) | hard cap on per-request override |
| `image_gen_rate_per_hour` | int 1–200 | 200 at API/DB level | operator can lift to 1000 via L2 yaml |
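
To make the L3 caps concrete, here is a hedged sketch of per-key validation for Group C values. The function `validate_l3`, its parameters, and the reason strings are hypothetical; only the ranges and enum values come from the table above. It assumes that an allowlist of `None` means "no restriction".

```python
from typing import Any, Optional, Set

COMPACT_STRATEGIES = {"auto", "manual", "off"}
# Fixed integer ranges at L3, taken from the Group C table.
FIXED_RANGES = {
    "compact_keep_last_n": (0, 200),
    "image_max_ttl_hours": (1, 720),
    "image_gen_rate_per_hour": (1, 200),
}
MODEL_KEYS = {"compact_summary_model", "image_gen_default_model"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_l3(
    key: str,
    value: Any,
    effective_max_ttl: int,
    allowlist: Optional[Set[str]],
) -> Optional[str]:
    """Return None if the value is acceptable at L3, otherwise a short reason."""
    if key == "compact_strategy":
        ok = isinstance(value, str) and value in COMPACT_STRATEGIES
        return None if ok else "unknown enum"
    if key == "compact_observation_mask":
        return None if isinstance(value, bool) else "wrong type"
    if key in MODEL_KEYS:
        if not isinstance(value, str):
            return "wrong type"
        # Assumption: allowlist None means every model alias is permitted.
        if allowlist is not None and value not in allowlist:
            return "model not in allowlist"
        return None
    if key == "image_default_ttl_hours":
        # The upper bound is dynamic: the tenant's effective image_max_ttl_hours.
        bounds = (1, effective_max_ttl)
    elif key in FIXED_RANGES:
        bounds = FIXED_RANGES[key]
    else:
        return "unknown key"
    if not _is_int(value):
        return "wrong type"
    return None if bounds[0] <= value <= bounds[1] else "out of range"
```

Note that `image_default_ttl_hours` is the only key whose cap depends on another setting, so a validator needs the effective `image_max_ttl_hours` as input.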

Endpoints

All three endpoints are scope-gated:

GET /v1/tenant/config requires tenant_config:read
PUT /v1/tenant/config requires tenant_config:write (which implies :read)
DELETE /v1/tenant/config/{key} requires tenant_config:write

The tenant_admin role grants tenant_config:write by default.
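
The scope rules above, including "write implies read", could be checked along these lines. This is only a sketch: the tables `REQUIRED_SCOPE` and `IMPLIED_SCOPES` and the function names are hypothetical, and the real authorization layer may be structured differently.

```python
from typing import Dict, Iterable, Set

# Scope required per HTTP method on /v1/tenant/config, as listed above.
REQUIRED_SCOPE: Dict[str, str] = {
    "GET": "tenant_config:read",
    "PUT": "tenant_config:write",
    "DELETE": "tenant_config:write",
}

# tenant_config:write implies tenant_config:read.
IMPLIED_SCOPES: Dict[str, Set[str]] = {
    "tenant_config:write": {"tenant_config:read"},
}


def expand_scopes(granted: Iterable[str]) -> Set[str]:
    """Add every scope implied by the granted ones."""
    expanded = set(granted)
    for scope in list(expanded):
        expanded |= IMPLIED_SCOPES.get(scope, set())
    return expanded


def is_allowed(method: str, granted: Iterable[str]) -> bool:
    """True if the granted scopes cover the scope this method requires."""
    return REQUIRED_SCOPE[method] in expand_scopes(granted)
```

A token that only holds `tenant_config:read` can poll the configuration but would receive 403 `forbidden` on PUT and DELETE.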

`GET /v1/tenant/config`

Returns every setting (Group B + C) with its effective value, source, and readonly flag.

TOKEN="$(grep '^RUST_API_BEARER=' /home/dietmar/dgx-llm/.env | cut -d= -f2)"
curl -s "$HOST/v1/tenant/config" -H "Authorization: Bearer $TOKEN" | jq

{
  "tenant_id": "hC7EOMyDFo2BctV7ZQBjpe",
  "effective": {
    "compact_strategy":         {"value": "auto",          "source": "process", "readonly": false},
    "compact_keep_last_n":      {"value": 10,              "source": "process", "readonly": false},
    "compact_observation_mask": {"value": true,            "source": "process", "readonly": false},
    "compact_summary_model":    {"value": "llama-4-scout", "source": "process", "readonly": false},
    "image_gen_default_model":  {"value": "nano-banana",   "source": "process", "readonly": false},
    "image_default_ttl_hours":  {"value": 12,              "source": "process", "readonly": false},
    "image_max_ttl_hours":      {"value": 168,             "source": "process", "readonly": false},
    "image_gen_rate_per_hour":  {"value": 20,              "source": "process", "readonly": false},
    "cost_markup_factor":       {"value": 1.5,             "source": "process", "readonly": true},
    "models_allowlist":         {"value": null,            "source": "process", "readonly": true},
    "models_blacklist":         {"value": null,            "source": "process", "readonly": true}
  },
  "writable_keys": [
    "compact_strategy", "compact_keep_last_n", "compact_observation_mask",
    "compact_summary_model", "image_gen_default_model",
    "image_default_ttl_hours", "image_max_ttl_hours", "image_gen_rate_per_hour"
  ],
  "readonly_keys": ["cost_markup_factor", "models_allowlist", "models_blacklist"]
}
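
A client might post-process this response to see which settings deviate from the process defaults. The sketch below parses a trimmed response with hypothetical values (the `source` fields are changed from the sample above for illustration); the helper names `non_default_sources` and `tenant_overrides` are assumptions, not part of the API.

```python
import json
from typing import Any, Dict, List

# Trimmed, hypothetical response body in the shape shown above.
SAMPLE = """
{
  "tenant_id": "hC7EOMyDFo2BctV7ZQBjpe",
  "effective": {
    "compact_keep_last_n":     {"value": 25,  "source": "db",      "readonly": false},
    "image_gen_rate_per_hour": {"value": 100, "source": "yaml",    "readonly": false},
    "image_max_ttl_hours":     {"value": 168, "source": "process", "readonly": false},
    "cost_markup_factor":      {"value": 1.0, "source": "yaml",    "readonly": true}
  },
  "writable_keys": ["compact_keep_last_n", "image_gen_rate_per_hour", "image_max_ttl_hours"],
  "readonly_keys": ["cost_markup_factor"]
}
"""


def non_default_sources(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each key that is not served from the process default to its source."""
    return {
        key: entry["source"]
        for key, entry in response["effective"].items()
        if entry["source"] != "process"
    }


def tenant_overrides(response: Dict[str, Any]) -> List[str]:
    """Keys currently set at L3, i.e. the ones a DELETE would clear."""
    return sorted(
        key
        for key, entry in response["effective"].items()
        if entry["source"] == "db"
    )


response = json.loads(SAMPLE)
```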

`PUT /v1/tenant/config`

Sets one or more L3 (DB) overrides atomically. Body is a flat JSON object — keys must be in writable_keys, values must pass per-key validation.

curl -s -X PUT "$HOST/v1/tenant/config" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "compact_keep_last_n": 25,
    "image_gen_rate_per_hour": 50,
    "compact_summary_model": "llama-4-scout"
  }' | jq

{ "applied": ["compact_keep_last_n", "compact_summary_model", "image_gen_rate_per_hour"] }

Requests are atomic: if any key is invalid, the request fails with 400 at the first invalid key and no keys are applied.

| Code | Status | Cause |
|---|---|---|
| `tenant_config_key_readonly` | 400 | Group-B key: edit `tokens.yaml` + restart |
| `tenant_config_invalid_value` | 400 | wrong type, out of range, unknown enum, model not in tenant's effective allowlist |
| `forbidden` | 403 | caller lacks `tenant_config:write` scope |
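
The all-or-nothing behavior can be sketched as "validate everything first, then write". In this illustration the L3 store is a plain dictionary, `ConfigError` and `put_config` are hypothetical names, and only two writable keys are modeled. Treating an unknown key as `tenant_config_invalid_value` is an assumption; the source does not say which code such a key produces.

```python
from typing import Any, Dict

READONLY_KEYS = {"cost_markup_factor", "models_allowlist", "models_blacklist"}
# Only two writable keys are modeled in this sketch.
INT_RANGES = {"compact_keep_last_n": (0, 200), "image_gen_rate_per_hour": (1, 200)}


class ConfigError(Exception):
    """Carries the error code and HTTP status from the table above."""

    def __init__(self, code: str, key: str, status: int = 400) -> None:
        super().__init__(f"{status} {code}: {key}")
        self.code, self.key, self.status = code, key, status


def check(key: str, value: Any) -> None:
    """Raise ConfigError if this key/value pair must be rejected."""
    if key in READONLY_KEYS:
        raise ConfigError("tenant_config_key_readonly", key)
    bounds = INT_RANGES.get(key)
    valid = (
        bounds is not None  # assumption: unknown keys count as invalid values
        and isinstance(value, int)
        and not isinstance(value, bool)
        and bounds[0] <= value <= bounds[1]
    )
    if not valid:
        raise ConfigError("tenant_config_invalid_value", key)


def put_config(db_overrides: Dict[str, Any], body: Dict[str, Any]) -> Dict[str, Any]:
    """Apply all keys or none: validation finishes before the first write."""
    for key, value in body.items():
        check(key, value)
    db_overrides.update(body)
    # Sorted to match the ordering seen in the sample response.
    return {"applied": sorted(body)}
```

With a real SQLite store the same effect would typically come from wrapping the writes in a single transaction; the two-phase loop here just makes the ordering explicit.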

`DELETE /v1/tenant/config/{key}`

Clears one L3 override. The call is idempotent: if the key had no override, the response is still 200, with removed: false. Group-B keys return 400 tenant_config_key_readonly.

curl -s -X DELETE "$HOST/v1/tenant/config/image_gen_rate_per_hour" \
  -H "Authorization: Bearer $TOKEN" | jq
# → { "key": "image_gen_rate_per_hour", "removed": true }

The next GET will show this key with source: "yaml" (if defined) or source: "process".
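
The idempotent delete and the resulting fall-through can be sketched as below. The L3 store is modeled as a dictionary, and `delete_override` and `source_after_delete` are hypothetical helpers; the Group-B rejection is left out for brevity.

```python
from typing import Any, Dict, Mapping


def delete_override(db_overrides: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Remove an L3 override if present; report whether anything was removed."""
    removed = key in db_overrides
    db_overrides.pop(key, None)  # no error when the key is absent
    return {"key": key, "removed": removed}


def source_after_delete(key: str, yaml_defaults: Mapping[str, Any]) -> str:
    """With no L3 override left, the value comes from yaml if defined, else process."""
    return "yaml" if key in yaml_defaults else "process"


# Hypothetical tenant state.
overrides = {"image_gen_rate_per_hour": 50}
first = delete_override(overrides, "image_gen_rate_per_hour")
second = delete_override(overrides, "image_gen_rate_per_hour")
```

Calling the function twice leaves the store in the same state as calling it once; only the `removed` flag differs between the two responses.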

YAML format (operator-edit)

Each tenant in data/auth/tokens.yaml may carry a defaults: block — optional, fields are individually optional.

tenants:
- id: hC7EOMyDFo2BctV7ZQBjpe
  label: test
  description: internal test tenant
  defaults:
    # Group B (yaml-only)
    cost_markup_factor: 1.0                 # billing
    models_allowlist:                       # mutually exclusive with blacklist
      - claude-opus-4.7
      - llama-4-scout
    # models_blacklist:
    #   - gpt-image
    # Group C (yaml as L2, also DB-tunable at L3)
    compact_summary_model: llama-4-scout
    compact_keep_last_n: 20
    compact_strategy: auto
    image_gen_default_model: nano-banana
    image_default_ttl_hours: 24
    image_max_ttl_hours: 168
    image_gen_rate_per_hour: 100            # operator can go up to 1000 here

Validation errors (mutually-exclusive lists, out-of-range numerics, unknown enums, unknown model aliases) abort startup — fail-closed by ADR 0002.
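
A fail-closed startup check for a `defaults:` block might look like the following sketch. It works on an already-parsed mapping (so no YAML library is needed), covers only a few of the checks named above, and uses hypothetical names (`validate_defaults`, `load_or_abort`, `known_models`). The lower bound of 1 for `image_gen_rate_per_hour` at L2 is an assumption.

```python
from typing import Any, Dict, List, Set


def validate_defaults(defaults: Dict[str, Any], known_models: Set[str]) -> List[str]:
    """Collect all validation errors for one tenant's defaults block."""
    errors: List[str] = []
    allow = defaults.get("models_allowlist")
    deny = defaults.get("models_blacklist")
    if allow is not None and deny is not None:
        errors.append("models_allowlist and models_blacklist are mutually exclusive")
    for alias in list(allow or []) + list(deny or []):
        if alias not in known_models:
            errors.append(f"unknown model alias: {alias}")
    strategy = defaults.get("compact_strategy")
    if strategy is not None and strategy not in ("auto", "manual", "off"):
        errors.append(f"unknown compact_strategy: {strategy}")
    rate = defaults.get("image_gen_rate_per_hour")
    if rate is not None:
        is_int = isinstance(rate, int) and not isinstance(rate, bool)
        # L2 allows up to 1000; the lower bound of 1 is assumed.
        if not (is_int and 1 <= rate <= 1000):
            errors.append(f"image_gen_rate_per_hour out of range: {rate}")
    return errors


def load_or_abort(defaults: Dict[str, Any], known_models: Set[str]) -> Dict[str, Any]:
    """Fail closed: any error stops the process instead of starting degraded."""
    errors = validate_defaults(defaults, known_models)
    if errors:
        raise SystemExit("invalid tenant defaults: " + "; ".join(errors))
    return defaults
```

Collecting every error before aborting lets an operator fix the whole file in one pass rather than restarting once per mistake.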

Audit trail

Every PUT/DELETE writes a row to data/sqlite/tenants/<tenant_id>/memory.sqlite::audit_log with:

actor_token_hash, actor_user_id
action: tenant_config_set or tenant_config_unset
target_key, metadata_json (JSON-encoded value)

GET is not audit-logged (would inflate storage on Cockpit-style high-frequency polls).
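
For illustration, writing such an audit row with Python's `sqlite3` module could look like this. The column names come from the list above; the table definition (the `id` and `ts` columns, the types) and the helper names are assumptions, as is storing JSON `null` for an unset action. The sketch uses an in-memory database instead of the real per-tenant file.

```python
import json
import sqlite3
from typing import Any, Optional

ACTIONS = ("tenant_config_set", "tenant_config_unset")


def open_audit_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create a hypothetical audit_log table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            id               INTEGER PRIMARY KEY,
            ts               TEXT DEFAULT CURRENT_TIMESTAMP,
            actor_token_hash TEXT NOT NULL,
            actor_user_id    TEXT,
            action           TEXT NOT NULL,
            target_key       TEXT NOT NULL,
            metadata_json    TEXT
        )
        """
    )
    return conn


def write_audit(
    conn: sqlite3.Connection,
    actor_token_hash: str,
    actor_user_id: Optional[str],
    action: str,
    target_key: str,
    value: Any = None,
) -> None:
    """Insert one audit row; the value is stored JSON-encoded."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    conn.execute(
        "INSERT INTO audit_log "
        "(actor_token_hash, actor_user_id, action, target_key, metadata_json) "
        "VALUES (?, ?, ?, ?, ?)",
        (actor_token_hash, actor_user_id, action, target_key, json.dumps(value)),
    )
    conn.commit()
```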

Integration points

The cascade is consulted everywhere a per-tenant default is needed:

Compaction (Cut 2.12) — compact_session uses compact_summary_model from cascade
Auto-compact gate — uses compact_keep_last_n and compact_observation_mask per-session, which inherit from cascade on session-create
image_gen tool (Cut 2.10) — pulls image_gen_default_model, image_default_ttl_hours, image_max_ttl_hours, and image_gen_rate_per_hour from cascade
POST /a1/agents/<n>/sessions — body fields override cascade per-session; missing fields fall through
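
The session-create rule (body fields win, missing fields fall through to the cascade) can be sketched as a simple merge. The set of per-session fields and the assumption that body field names match the config keys are illustrative guesses, not confirmed by the source; `session_settings` is a hypothetical helper.

```python
from typing import Any, Dict, Mapping

# Assumed per-session fields; names are taken from the Group C keys.
SESSION_FIELDS = (
    "compact_strategy",
    "compact_keep_last_n",
    "compact_observation_mask",
)


def session_settings(
    body: Mapping[str, Any],
    cascade_values: Mapping[str, Any],
) -> Dict[str, Any]:
    """Use the request body where a field is given, else the tenant's effective value."""
    return {
        field: body[field] if field in body else cascade_values[field]
        for field in SESSION_FIELDS
    }


# Hypothetical effective values for the tenant at session-create time.
tenant_effective = {
    "compact_strategy": "auto",
    "compact_keep_last_n": 20,
    "compact_observation_mask": True,
}
settings = session_settings({"compact_keep_last_n": 5}, tenant_effective)
```

Since the values are copied at session-create, a later PUT on the tenant config would not change sessions that already exist under this model.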

Migration notes

cost_markup_factor was DB-only (tenant_config row) before Cut 2.13. The cascade still reads the legacy DB row first, so existing Cockpit-tenant config (1.0) keeps working. Operator should migrate to tokens.yaml::tenants[].defaults.cost_markup_factor when convenient; the legacy DB-fallback will be removed in a future cut.
All other Cut 2.13 settings are new — no migration needed.
