DGX LLM Chat Gateway

/v1/tenant/config — Per-tenant configuration

Cut 2.13 (ADR 0013) — three-level cascading configuration store. Lets each tenant tune UX/operational defaults (compaction, image-gen, etc.) without operator intervention, while keeping billing, security, and permissions under operator control via tokens.yaml.

Lookup hierarchy (highest wins)

| Level | Source | Edit workflow | Audience |
|---|---|---|---|
| L3 | per-tenant SQLite (tenant_config table) | PUT /v1/tenant/config (runtime) | tenant admin, self-service |
| L2 | per-tenant YAML (tokens.yaml::tenants[].defaults) | operator-edit + dgx-rust-api restart | stack operator |
| L1 | process default (ENV → hardcoded fallback) | code/env change + restart | engineer |

The cascade is computed per-key on every read. A GET /v1/tenant/config response shows the effective value AND the level it came from (source: "process" | "yaml" | "db").
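
The per-key lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the gateway's code: the function name `resolve` and the three dict arguments are hypothetical stand-ins for the SQLite table (L3), the tokens.yaml defaults block (L2), and the process defaults (L1).

```python
from typing import Any


def resolve(key: str,
            db: dict[str, Any],
            yaml_defaults: dict[str, Any],
            process: dict[str, Any]) -> dict[str, Any]:
    """Return the effective value for one key plus the level it came from."""
    # Highest level wins: L3 (db), then L2 (yaml), then L1 (process).
    for source, level in (("db", db), ("yaml", yaml_defaults), ("process", process)):
        if key in level:
            return {"value": level[key], "source": source}
    raise KeyError(f"unknown setting: {key}")


# Hypothetical sample data for the three levels.
process = {"compact_keep_last_n": 10, "image_gen_rate_per_hour": 20, "compact_strategy": "auto"}
yaml_defaults = {"compact_keep_last_n": 20}
db = {"image_gen_rate_per_hour": 50}

for name in ("compact_keep_last_n", "image_gen_rate_per_hour", "compact_strategy"):
    print(name, resolve(name, db, yaml_defaults, process))
```

Because the lookup runs per key, one tenant can have some keys served from the database, others from YAML, and the rest from process defaults at the same time.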

Security classification

Per ADR 0013, every setting belongs to exactly one of three groups. The group decides which levels exist for the setting and whether it can be PUT via the API.

Group A — code-fixed (no override)

Stack-wide allowlists. Changing them requires a code release.

Group B — yaml-only (L1 + L2)

Operator-curated. Tenant cannot self-tune via the API.

| Key | Reason |
|---|---|
| cost_markup_factor | Billing: tenant must not zero its own bill |
| models_allowlist | Security boundary: tenant must not self-grant premium models |
| models_blacklist | Mirror of allowlist; mutually exclusive |
| audit_user_id_pseudonymize | Compliance: tenant must not opt out of DSGVO (GDPR) mode via self-service |
| Per-token secrets, extra_scopes, deny_scopes, constraints.*, rate_limit_bypass | Permissions / credentials, ADR 0003 |
| c1_require_user_id_binding | Obsolete since Cut 2.23c (ADR 0016). Strict header-only authority is now stack-wide. Setting is accepted in YAML for backward-compat but is a no-op. |

PUT on a Group-B key returns 400 tenant_config_key_readonly with the explicit "edit tokens.yaml + restart" remediation.

Group C — full cascade (L1 + L2 + L3)

UX/operational. Tenant can PUT via the API. Hard-caps prevent abuse.

| Key | Type | L3 cap | Notes |
|---|---|---|---|
| compact_strategy | auto / manual / off | | session-default for new sessions |
| compact_keep_last_n | int 0–200 | 200 | how many tail messages survive compaction |
| compact_observation_mask | bool | | tells summary-model to drop tool-noise |
| compact_summary_model | model alias | must be in tenant's effective allowlist | local default = llama-4-scout (data privacy). Cut 2.46: now ALSO controls the /c1 chat summary (title/summary), no longer only the /a1 session compaction. Env SUMMARY_MODEL remains the global override. |
| image_gen_default_model | model alias | must be in tenant's effective allowlist | default when tool called without model: |
| image_default_ttl_hours | int ≥ 1 | ≤ effective image_max_ttl_hours | per-tenant default TTL on generated images |
| image_max_ttl_hours | int 1–720 | 720 (= 30 d) | hard cap on per-request override |
| image_gen_rate_per_hour | int 1–200 | 200 at API/DB level | operator can lift to 1 000 via L2 yaml |
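
To make the hard caps concrete, here is a hedged Python sketch of per-key validation at L3. The ranges and enum values are copied from the Group C table. Everything else is an assumption made for illustration: the function name `validate`, the reason strings, the `effective_max_ttl` argument, and the reading of a null allowlist as "no restriction".

```python
from typing import Any, Optional

# Ranges taken from the Group C table; the validator itself is illustrative.
INT_RANGES = {
    "compact_keep_last_n": (0, 200),
    "image_max_ttl_hours": (1, 720),
    "image_gen_rate_per_hour": (1, 200),  # API/DB cap; L2 yaml may go higher
}
ENUMS = {"compact_strategy": {"auto", "manual", "off"}}
MODEL_KEYS = {"compact_summary_model", "image_gen_default_model"}


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate(key: str, value: Any, allowlist: Optional[set[str]],
             effective_max_ttl: int) -> Optional[str]:
    """Return None if the value is acceptable at L3, else a short reason."""
    if key in INT_RANGES:
        low, high = INT_RANGES[key]
        if not _is_int(value):
            return "wrong type"
        return None if low <= value <= high else "out of range"
    if key == "image_default_ttl_hours":
        if not _is_int(value):
            return "wrong type"
        return None if 1 <= value <= effective_max_ttl else "out of range"
    if key in ENUMS:
        ok = isinstance(value, str) and value in ENUMS[key]
        return None if ok else "unknown enum"
    if key == "compact_observation_mask":
        return None if isinstance(value, bool) else "wrong type"
    if key in MODEL_KEYS:
        if not isinstance(value, str):
            return "wrong type"
        # Assumption: allowlist None means the tenant is unrestricted.
        if allowlist is not None and value not in allowlist:
            return "model not in allowlist"
        return None
    return "unknown key"


print(validate("image_gen_rate_per_hour", 1000, None, 168))
```

Note that image_default_ttl_hours is checked against the effective image_max_ttl_hours rather than a fixed number, so its cap moves when the maximum is changed.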

Endpoints

All three endpoints are scope-gated.

The tenant_admin role grants tenant_config:write by default.

GET /v1/tenant/config

Returns every setting (Group B + C) with effective value, source, and readonly-flag.

TOKEN="$(grep '^RUST_API_BEARER=' /home/dietmar/dgx-llm/.env | cut -d= -f2)"
curl -s "$HOST/v1/tenant/config" -H "Authorization: Bearer $TOKEN" | jq
{
  "tenant_id": "hC7EOMyDFo2BctV7ZQBjpe",
  "effective": {
    "compact_strategy":         {"value": "auto",          "source": "process", "readonly": false},
    "compact_keep_last_n":      {"value": 10,              "source": "process", "readonly": false},
    "compact_observation_mask": {"value": true,            "source": "process", "readonly": false},
    "compact_summary_model":    {"value": "llama-4-scout", "source": "process", "readonly": false},
    "image_gen_default_model":  {"value": "nano-banana",   "source": "process", "readonly": false},
    "image_default_ttl_hours":  {"value": 12,              "source": "process", "readonly": false},
    "image_max_ttl_hours":      {"value": 168,             "source": "process", "readonly": false},
    "image_gen_rate_per_hour":  {"value": 20,              "source": "process", "readonly": false},
    "cost_markup_factor":       {"value": 1.5,             "source": "process", "readonly": true},
    "models_allowlist":         {"value": null,            "source": "process", "readonly": true},
    "models_blacklist":         {"value": null,            "source": "process", "readonly": true}
  },
  "writable_keys": [
    "compact_strategy", "compact_keep_last_n", "compact_observation_mask",
    "compact_summary_model", "image_gen_default_model",
    "image_default_ttl_hours", "image_max_ttl_hours", "image_gen_rate_per_hour"
  ],
  "readonly_keys": ["cost_markup_factor", "models_allowlist", "models_blacklist"]
}
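
A client can use the source field to see which keys a tenant has actually overridden. The Python sketch below groups keys by source. It assumes only the response shape shown above; the helper name `keys_by_source` and the trimmed sample payload are hypothetical.

```python
from typing import Any


def keys_by_source(response: dict[str, Any]) -> dict[str, list[str]]:
    """Group setting names by the cascade level that supplied their value."""
    grouped: dict[str, list[str]] = {}
    for key, entry in response["effective"].items():
        grouped.setdefault(entry["source"], []).append(key)
    return grouped


# Trimmed, hypothetical payload in the same shape as the GET response.
sample = {
    "tenant_id": "hC7EOMyDFo2BctV7ZQBjpe",
    "effective": {
        "compact_keep_last_n": {"value": 25, "source": "db", "readonly": False},
        "compact_strategy": {"value": "auto", "source": "process", "readonly": False},
        "cost_markup_factor": {"value": 1.0, "source": "yaml", "readonly": True},
    },
}

print(keys_by_source(sample))
```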

PUT /v1/tenant/config

Sets one or more L3 (DB) overrides atomically. Body is a flat JSON object — keys must be in writable_keys, values must pass per-key validation.

curl -s -X PUT "$HOST/v1/tenant/config" \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "compact_keep_last_n": 25,
    "image_gen_rate_per_hour": 50,
    "compact_summary_model": "llama-4-scout"
  }' | jq
{ "applied": ["compact_keep_last_n", "compact_summary_model", "image_gen_rate_per_hour"] }

Failures are atomic per request: the first invalid key returns 400 and no keys are applied.

| Code | Status | Cause |
|---|---|---|
| tenant_config_key_readonly | 400 | Group-B key: edit tokens.yaml + restart |
| tenant_config_invalid_value | 400 | wrong type, out of range, unknown enum, model not in tenant's effective allowlist |
| forbidden | 403 | caller lacks tenant_config:write scope |
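
The all-or-nothing behaviour of PUT can be approximated with a validate-first, write-second pattern. The Python sketch below is one way to get that property and is not taken from the gateway: `put_config`, `ConfigError`, and the `check` callback are hypothetical names, and the in-memory dict stands in for the tenant_config table. Sorting the applied list simply mirrors the ordering seen in the example response.

```python
from typing import Any, Callable, Optional

READONLY_KEYS = {"cost_markup_factor", "models_allowlist", "models_blacklist"}


class ConfigError(Exception):
    """Carries the error code and the HTTP status a handler would return."""

    def __init__(self, code: str, key: str):
        super().__init__(f"{code}: {key}")
        self.code, self.key, self.status = code, key, 400


def put_config(overrides: dict[str, Any], body: dict[str, Any],
               check: Callable[[str, Any], Optional[str]]) -> dict[str, list[str]]:
    """Validate every key first; write only if all of them pass."""
    for key, value in body.items():
        if key in READONLY_KEYS:
            raise ConfigError("tenant_config_key_readonly", key)
        if check(key, value) is not None:
            raise ConfigError("tenant_config_invalid_value", key)
    overrides.update(body)  # reached only when no key was rejected
    return {"applied": sorted(body)}


def toy_check(key: str, value: Any) -> Optional[str]:
    # Toy rule for the demo: only compact_keep_last_n is range-checked.
    if key == "compact_keep_last_n" and not 0 <= value <= 200:
        return "out of range"
    return None


store: dict[str, Any] = {}
print(put_config(store, {"image_gen_rate_per_hour": 50, "compact_keep_last_n": 25}, toy_check))
```

A real handler would run the write step inside a single database transaction so that a crash between keys cannot leave a partial update.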

DELETE /v1/tenant/config/{key}

Clears one L3 override. Idempotent — removed: false on 200 if the key wasn't set. Group-B keys return 400 tenant_config_key_readonly.

curl -s -X DELETE "$HOST/v1/tenant/config/image_gen_rate_per_hour" \
  -H "Authorization: Bearer $TOKEN" | jq
# → { "key": "image_gen_rate_per_hour", "removed": true }

The next GET will show this key with source: "yaml" (if defined) or source: "process".
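
The idempotent DELETE semantics can be illustrated with a small Python sketch. It is a model of the documented behaviour only: `delete_override` is a hypothetical name, the dict stands in for the L3 table, and the ValueError stands in for the 400 response.

```python
from typing import Any

READONLY_KEYS = {"cost_markup_factor", "models_allowlist", "models_blacklist"}


def delete_override(overrides: dict[str, Any], key: str) -> dict[str, Any]:
    """Clear one L3 override; repeating the call for the same key is harmless."""
    if key in READONLY_KEYS:
        # The gateway answers 400 tenant_config_key_readonly in this case.
        raise ValueError("tenant_config_key_readonly")
    removed = key in overrides
    overrides.pop(key, None)
    return {"key": key, "removed": removed}


overrides = {"image_gen_rate_per_hour": 50}
first = delete_override(overrides, "image_gen_rate_per_hour")
second = delete_override(overrides, "image_gen_rate_per_hour")
print(first)
print(second)
```

Both calls succeed; only the removed flag differs, which lets a client retry safely after a timeout.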

YAML format (operator-edit)

Each tenant in data/auth/tokens.yaml may carry an optional defaults: block; every field inside it is individually optional.

tenants:
- id: hC7EOMyDFo2BctV7ZQBjpe
  label: test
  description: internal test tenant
  defaults:
    # Group B (yaml-only)
    cost_markup_factor: 1.0                 # billing
    models_allowlist:                       # mutually exclusive with blacklist
      - claude-opus-4.7
      - llama-4-scout
    # models_blacklist:
    #   - gpt-image
    # Group C (yaml as L2, also DB-tunable at L3)
    compact_summary_model: llama-4-scout
    compact_keep_last_n: 20
    compact_strategy: auto
    image_gen_default_model: nano-banana
    image_default_ttl_hours: 24
    image_max_ttl_hours: 168
    image_gen_rate_per_hour: 100            # operator can go up to 1000 here

Validation errors (mutually-exclusive lists, out-of-range numerics, unknown enums, unknown model aliases) abort startup — fail-closed by ADR 0002.
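
A fail-closed startup check along these lines could look like the following Python sketch. It works on an already-parsed defaults dict so that it needs no YAML library. The function names, the error messages, and the lower bound of 1 for the L2 rate limit are assumptions; the upper bound of 1000 and the enum values come from this page.

```python
from typing import Any


def validate_defaults(defaults: dict[str, Any], known_models: set[str]) -> list[str]:
    """Collect every problem in one tenant's defaults block."""
    errors: list[str] = []
    if defaults.get("models_allowlist") is not None and defaults.get("models_blacklist") is not None:
        errors.append("models_allowlist and models_blacklist are mutually exclusive")
    rate = defaults.get("image_gen_rate_per_hour")
    # Assumption: L2 accepts 1..1000 (the page only states the upper bound).
    if rate is not None and not (isinstance(rate, int) and 1 <= rate <= 1000):
        errors.append("image_gen_rate_per_hour out of range")
    strategy = defaults.get("compact_strategy")
    if strategy is not None and strategy not in ("auto", "manual", "off"):
        errors.append("compact_strategy: unknown enum")
    for key in ("compact_summary_model", "image_gen_default_model"):
        model = defaults.get(key)
        if model is not None and model not in known_models:
            errors.append(f"{key}: unknown model alias")
    return errors


def load_or_abort(defaults: dict[str, Any], known_models: set[str]) -> dict[str, Any]:
    """Fail closed: refuse to start rather than run with a bad config."""
    errors = validate_defaults(defaults, known_models)
    if errors:
        raise SystemExit("invalid tokens.yaml defaults: " + "; ".join(errors))
    return defaults


models = {"llama-4-scout", "nano-banana", "claude-opus-4.7"}
good = {"models_allowlist": ["llama-4-scout"], "compact_summary_model": "llama-4-scout",
        "image_gen_rate_per_hour": 100}
print(validate_defaults(good, models))
```

Collecting all errors before aborting lets the operator fix the whole file in one pass instead of restarting once per mistake.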

Audit trail

Every PUT/DELETE writes a row to data/sqlite/tenants/<tenant_id>/memory.sqlite::audit_log.

GET is not audit-logged, because Cockpit-style high-frequency polling would inflate storage.

Integration points

The cascade is consulted everywhere a per-tenant default is needed.

Migration notes

See also