Observability
Gestalt logs to stdout by default. It exposes /health and /ready for
orchestrators, a Prometheus-compatible /metrics endpoint, and optional OTLP
export for traces and metrics. The audit logging page covers
the audit-specific log schema and event coverage.
Health Endpoints
The /health endpoint returns 200 whenever the process is alive. The /ready
endpoint returns 503 until all configured providers and the datastore have
finished initialization, then switches to 200.
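The readiness behavior above can be exercised from a deploy script. The following is a minimal sketch, not part of Gestalt: the helper names `http_status` and `wait_until_ready` and the host and port in the usage note are hypothetical. It polls a probe until the probe returns 200 or the attempts run out.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; a 503 from /ready arrives here.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout.
        return 0


def wait_until_ready(probe: Callable[[], int], attempts: int = 30,
                     delay: float = 1.0) -> bool:
    """Poll a readiness probe until it returns 200 or attempts run out."""
    for attempt in range(attempts):
        if probe() == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A call such as `wait_until_ready(lambda: http_status("http://localhost:8080/ready"))` would block until the providers and datastore have initialized, assuming the server listens on that hypothetical address.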
Default Telemetry
If you omit the providers.telemetry block, Gestalt synthesizes providers.telemetry.default with the built-in stdout source.

providers:
  telemetry:
    default:
      source: stdout

That gives you structured logs without external collectors. The stdout
source also exposes a Prometheus-compatible /metrics endpoint. Traces remain
disabled unless you switch to providers.telemetry.<name>.source: otlp.
OTLP Export
Set providers.telemetry.<name>.source: otlp to export traces, metrics, and
logs over OpenTelemetry.
providers:
  telemetry:
    default:
      source: otlp
      config:
        endpoint: otel-collector:4317
        protocol: grpc
        serviceName: gestaltd
        traces:
          samplingRatio: 1.0
        metrics:
          interval: 60s
        logs:
          level: info

If you want traces and metrics in OTLP but need application logs to remain on stdout for the host platform to collect, override the log exporter.
providers:
  telemetry:
    default:
      source: otlp
      config:
        endpoint: otel-collector:4317
        protocol: grpc
        logs:
          exporter: stdout
          format: json
          level: info

Audit Routing
Audit logs inherit the main telemetry logger by default. If you need a
different route for compliance, analytics, or warehousing, configure
providers.audit separately.
providers:
  telemetry:
    default:
      source: stdout
      config:
        format: json
  audit:
    default:
      source: otlp
      config:
        endpoint: audit-collector:4317
        protocol: grpc
        headers:
          Authorization: Bearer ${AUDIT_OTLP_TOKEN}

That keeps application logs on stdout while exporting only audit records over
OTLP. You can also set providers.audit.<name>.source: stdout to keep audit
logs on stdout when telemetry is noop, or providers.audit.<name>.source: noop to disable audit
output explicitly. If you leave providers.audit.default.source: inherit,
collectors can still split the stream by filtering on log.type=audit.
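To illustrate the filtering idea, here is a small sketch of splitting a mixed log stream on log.type=audit. It assumes one JSON object per line with the marker as a top-level "log.type" key; that field layout is an assumption for illustration, and the audit logging page remains the reference for the real schema. The function name `split_audit` is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def split_audit(lines: Iterable[str]) -> Tuple[List[dict], List[dict]]:
    """Partition JSON log lines into (audit, application) records."""
    audit: List[dict] = []
    application: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumption: the audit marker is a top-level "log.type" key.
        if record.get("log.type") == "audit":
            audit.append(record)
        else:
            application.append(record)
    return audit, application
```

In practice a collector pipeline would apply the same predicate as a routing rule rather than in application code.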
Emitted Metrics
The built-in metric surface covers Prometheus exporter metadata, inbound HTTP
requests handled by gestaltd, broker operation execution, connection authentication
flows, platform authentication actions, agent lifecycle operations, authorization
provider calls, credential resolution, catalog resolution, database client operations,
and gRPC calls between gestaltd, provider processes, and host services.
Gestalt documents metric names using their OpenTelemetry instrument names. The
Prometheus scrape endpoint translates those names to Prometheus-style metric
families: dots become underscores, counters gain a _total suffix, units
become suffixes like _seconds, _milliseconds, or _bytes, and histograms
appear as _bucket, _sum, and _count series under the same family.
For example, gestaltd.operation.count is exposed as
gestaltd_operation_count_total, and gestaltd.operation.duration is exposed
as the gestaltd_operation_duration_seconds histogram family.
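The translation rules can be sketched as a small function. This is a simplified illustration of the rules as described here, not the exporter's actual code: the helper names are hypothetical, the unit table covers only the three suffixes mentioned above, and the unit codes ("s", "ms", "By") are assumed to be the ones the instruments declare.

```python
from typing import List

# Unit suffixes named in the text; a real exporter covers more units.
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes"}


def prometheus_family(name: str, kind: str, unit: str = "") -> str:
    """Translate an OpenTelemetry instrument name to a Prometheus family."""
    family = name.replace(".", "_")
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not family.endswith("_" + suffix):
        family += "_" + suffix
    if kind == "counter" and not family.endswith("_total"):
        family += "_total"
    return family


def prometheus_series(name: str, kind: str, unit: str = "") -> List[str]:
    """List the series a scrape would show for one instrument."""
    family = prometheus_family(name, kind, unit)
    if kind == "histogram":
        return [family + "_bucket", family + "_sum", family + "_count"]
    return [family]
```

For instance, `prometheus_family("gestaltd.operation.count", "counter")` yields `gestaltd_operation_count_total`, matching the example above.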
Gestalt follows OpenTelemetry semantic conventions for standard transport and
database metrics. HTTP server metrics follow the HTTP metric semantic
conventions, gRPC metrics follow the RPC metric semantic conventions, and
datastore metrics follow the database client metric semantic conventions.
Gestalt-specific dimensions on those standard metrics use the gestaltd.*
namespace so they do not collide with OpenTelemetry registry attributes.
Custom Gestalt metric families such as gestaltd.operation.*,
gestaltd.auth.*, and gestaltd.discovery.* keep their established
gestalt.* attributes.
Prometheus Exporter Metadata
| Prometheus family | Type | Meaning |
|---|---|---|
| target_info | Gauge | Resource attributes including service.name |
Broker Operation Metrics
These metrics are recorded around broker.invoke.
| Metric | Type | Meaning |
|---|---|---|
| gestaltd.operation.count | Counter | Operation invocations |
| gestaltd.operation.error_count | Counter | Failed operation invocations |
| gestaltd.operation.duration | Histogram | Operation invocation duration |
These metrics carry low-cardinality attributes: gestalt.provider (the
plugin key), gestalt.operation (the catalog operation id),
gestalt.transport (how the operation is implemented: plugin, rest, or
mcp-passthrough), and gestalt.connection_mode (the provider auth mode:
none or user). When the invocation came from a known surface, they also
carry gestalt.invocation_surface. Hosted HTTP binding invocations also carry
gestalt.http_binding, which is the configured binding name.
Operation metrics also carry gestalt.result_status and
gestalt.result_status_class. gestalt.result_status is the normalized
HTTP-like operation result status (200, 400, 502, and so on), and
gestalt.result_status_class is the matching class (1xx, 2xx, 3xx,
4xx, 5xx, or unknown). Provider-returned statuses use the provider
OperationResult status. Platform failures before provider execution use the
same status mapping as the HTTP invocation handlers. gestaltd.operation.error_count records
invocations with an invocation error or an operation result status of 400 or
higher, so provider-returned 4xx and 5xx results are counted as operation
failures.
When a request references an unknown provider or operation, gestaltd records
unknown for the missing metric attributes instead of using raw user input.
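The status-class, error-count, and unknown-label rules above can be summarized in a few lines. This is a sketch of the documented behavior rather than gestaltd's implementation: all three function names are hypothetical, and treating any status outside 100 to 599 as unknown is an assumption.

```python
from typing import Optional, Set


def result_status_class(status: Optional[int]) -> str:
    """Map a normalized result status to its class label."""
    # Assumption: anything missing or outside 100..599 is "unknown".
    if status is None or not 100 <= status <= 599:
        return "unknown"
    return f"{status // 100}xx"


def is_operation_error(invocation_error: bool, status: Optional[int]) -> bool:
    """Apply the error_count rule: invocation error, or status >= 400."""
    return invocation_error or (status is not None and status >= 400)


def safe_label(value: str, known: Set[str]) -> str:
    """Keep a label value only if it is a known name, else 'unknown'."""
    return value if value in known else "unknown"
```

Replacing unrecognized names with a fixed value keeps label cardinality bounded, since arbitrary request input never becomes a metric attribute.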
Connection Authentication Metrics
These metrics are recorded around plugin connection authentication flows including OAuth start and completion, manual connect, and token refresh.
| Metric | Type | Meaning |
|---|---|---|
| gestaltd.connection.auth.count | Counter | Connection authentication attempts |
| gestaltd.connection.auth.error_count | Counter | Failed connection authentication attempts |
| gestaltd.connection.auth.duration | Histogram | Connection authentication duration |
These metrics carry low-cardinality attributes: gestalt.provider (the
plugin key), gestalt.type (oauth or manual),
gestalt.action (start, complete, or refresh), and
gestalt.connection_mode (the provider authentication mode: none
or user).
When a request references an unknown plugin, gestaltd records
unknown for the provider and connection mode metric attributes instead of
using raw request input.
Platform Authentication Metrics
These metrics are recorded around the configured top-level authentication provider used to log into Gestalt and validate session tokens.
| Metric | Type | Meaning |
|---|---|---|
| gestaltd.auth.count | Counter | Platform authentication actions |
| gestaltd.auth.error_count | Counter | Failed platform authentication actions |
| gestaltd.auth.duration | Histogram | Platform authentication action duration |
These metrics carry low-cardinality attributes: gestalt.provider (the auth
provider name) and gestalt.action (begin_login, complete_login, or
validate_token).
Discovery Metrics
These metrics are recorded around credentialed HTTP catalog discovery, currently
GET /api/v1/integrations/{name}/operations when gestaltd resolves a
session catalog with stored or identity credentials.
| Metric | Type | Meaning |
|---|---|---|
| gestaltd.discovery.count | Counter | Credentialed discovery attempts |
| gestaltd.discovery.error_count | Counter | Failed credentialed discovery attempts |
| gestaltd.discovery.duration | Histogram | Credentialed discovery duration |
These metrics carry low-cardinality attributes: gestalt.provider (the plugin
key), gestalt.action (currently list_operations), and
gestalt.connection_mode (the provider auth mode: none or user).
gestaltd.discovery.error_count increments when the
credentialed session-catalog path fails, even if gestaltd falls back to a
static catalog and still serves a successful HTTP response.
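The fallback behavior means an error can be counted on a request that still succeeds. The sketch below models that with a plain `collections.Counter` standing in for the real instruments; the function name `list_operations` and its parameters are hypothetical and do not reflect gestaltd's internals.

```python
from collections import Counter
from typing import Callable, List


def list_operations(session_catalog: Callable[[], List[str]],
                    static_catalog: List[str],
                    metrics: Counter) -> List[str]:
    """Resolve operations, falling back to the static catalog on failure."""
    metrics["gestaltd.discovery.count"] += 1
    try:
        return session_catalog()
    except Exception:
        # The failure is counted even though the caller still gets a result.
        metrics["gestaltd.discovery.error_count"] += 1
        return static_catalog
```

A consequence for dashboards: a rising discovery error rate alongside healthy HTTP status codes points at the credentialed path, not at the API handler.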
Agent Metrics
Agent metrics are recorded at the gestaltd agent facade and at the configured
agent provider boundary. They use the same triplet shape as other custom
Gestalt metrics: <family>.count, <family>.error_count, and
<family>.duration.
| OpenTelemetry family | Prometheus families | Meaning |
|---|---|---|
| gestaltd.agent.operation.* | gestaltd_agent_operation_count_total, gestaltd_agent_operation_error_count_total, gestaltd_agent_operation_duration_seconds | Public agent manager operations such as create_session, create_turn, list_turn_events, and resolve_interaction |
| gestaltd.agent.provider.operation.* | gestaltd_agent_provider_operation_count_total, gestaltd_agent_provider_operation_error_count_total, gestaltd_agent_provider_operation_duration_seconds | Calls into the selected provider-owned agent system of record |
| gestaltd.agent.tool.resolve.* | gestaltd_agent_tool_resolve_count_total, gestaltd_agent_tool_resolve_error_count_total, gestaltd_agent_tool_resolve_duration_seconds | Tool reference resolution before a turn is sent to a provider |
These metric families use low-cardinality attributes:
- gestalt.agent.operation identifies the agent operation or provider method.
- gestalt.agent.provider identifies the configured agent provider on provider-bound metric families.
- gestalt.agent.tool.source identifies tool binding mode for tool resolution: native_search.
Agent HTTP requests also form trace trees when OTLP tracing is enabled. A
typical POST /api/v1/agent/sessions/{id}/turns trace contains
agent.operation, agent.tool.resolve, catalog.operation.resolve,
and agent.provider.operation spans under the inbound HTTP server span.
Authorization Provider Metrics
Authorization provider metrics cover both direct provider interface calls and
provider-backed subject-access evaluation performed by gestaltd.
| OpenTelemetry family | Prometheus families | Meaning |
|---|---|---|
| gestaltd.authorization.provider.operation.* | gestaltd_authorization_provider_operation_count_total, gestaltd_authorization_provider_operation_error_count_total, gestaltd_authorization_provider_operation_duration_seconds | Calls to the configured authorization provider interface |
| gestaltd.authorization.provider.evaluate.* | gestaltd_authorization_provider_evaluate_count_total, gestaltd_authorization_provider_evaluate_error_count_total, gestaltd_authorization_provider_evaluate_duration_seconds | Batched authorization evaluations used by provider-backed subject-access checks |
These metrics carry low-cardinality attributes:
- gestalt.authorization.provider identifies the configured authorization provider.
- gestalt.authorization.operation identifies the provider method for gestaltd.authorization.provider.operation.*.
- gestalt.authorization.scope identifies the resource type for provider-backed evaluation, such as integration or operation.
Credential And Catalog Resolution Metrics
Credential and catalog resolution metrics make the pre-invocation and agent tool paths visible before work reaches a plugin provider.
| OpenTelemetry family | Prometheus families | Meaning |
|---|---|---|
| gestaltd.credential.provider.operation.* | gestaltd_credential_provider_operation_count_total, gestaltd_credential_provider_operation_error_count_total, gestaltd_credential_provider_operation_duration_seconds | Calls to the configured external credential provider |
| gestaltd.catalog.operation.resolve.* | gestaltd_catalog_operation_resolve_count_total, gestaltd_catalog_operation_resolve_error_count_total, gestaltd_catalog_operation_resolve_duration_seconds | Static or session-catalog operation resolution |
These metrics carry low-cardinality attributes:
- gestalt.credential.provider identifies the configured credential provider.
- gestalt.credential.operation identifies the credential provider method.
- gestalt.provider identifies the plugin provider when the path has resolved one.
- gestalt.operation identifies the plugin operation for catalog operation resolution.
- gestalt.catalog.source is attached to catalog resolution spans and records whether the resolved operation came from static, session, or a pre-resolved context operation.
Database Client Metrics
These metrics are recorded around every ObjectStore and Index operation on the
system IndexedDB instance. They use the OpenTelemetry
db.client.operation.duration histogram instead of a Gestalt-specific metric
family. The instrumentation is applied once at the interface level in bootstrap,
so both core services (users and API tokens) and provider-hosted traffic are
covered.
| Metric | Type | Meaning |
|---|---|---|
| db.client.operation.duration | Histogram | IndexedDB operation duration |
These metrics carry low-cardinality attributes:
- db.system.name is gestaltd.indexeddb for the built-in IndexedDB implementation.
- db.namespace identifies the logical IndexedDB resource configured in gestaltd, such as the server’s primary IndexedDB binding.
- db.collection.name identifies the IndexedDB object store being accessed, such as users, external_credentials, or api_tokens.
- db.operation.name identifies the normalized ObjectStore or Index operation: get, get_key, put, add, delete, clear, get_all, get_all_keys, count, delete_range, open_cursor, open_key_cursor, index_get, index_get_key, index_get_all, index_get_all_keys, index_count, index_delete, index_open_cursor, or index_open_key_cursor.
- gestaltd.provider.name identifies the provider namespace for provider-hosted IndexedDB traffic. Core services omit this attribute.
- gestaltd.indexeddb.index.name identifies the index name for Index operations.
- error.type is attached only when an operation fails. Bounded values include canceled, deadline_exceeded, not_found, already_exists, keys_only, and internal.
Expected misses that return ErrNotFound are recorded as failed database
operations with error.type=not_found. Use
count:db.client.operation.duration{error.type:*}.as_count() or the matching
Prometheus histogram count series to build error counts.
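As a sketch of the Prometheus-side approach, the snippet below sums the histogram count series per error type from scrape text. It rests on two assumptions that should be checked against a real scrape: that the family is exposed as `db_client_operation_duration_seconds` and that the error.type attribute appears as the label `error_type`. The function name `db_error_counts` is hypothetical.

```python
import re
from typing import Dict

# One sample line: name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def db_error_counts(
    exposition: str,
    series: str = "db_client_operation_duration_seconds_count",
) -> Dict[str, float]:
    """Sum the histogram count series per error_type label value."""
    totals: Dict[str, float] = {}
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match.group("name") != series:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        error_type = labels.get("error_type")
        if error_type:  # successful operations carry no error.type
            value = float(match.group("value"))
            totals[error_type] = totals.get(error_type, 0.0) + value
    return totals
```

Because expected misses are recorded as not_found, a dashboard that tracks real failures may want to drop that key from the result before alerting.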
Metrics And Audit Boundaries
Use metrics for low-cardinality aggregates such as volume, latency, and error rate. Use audit logs for user-facing or security-sensitive actions where you need actor, target, and outcome details.
| Surface | Metrics | Audit Logs | Notes |
|---|---|---|---|
| Provider operation invocation | Yes: gestaltd.operation.* | Yes | Every guarded HTTP and MCP invocation is audited. |
| Platform login start and completion | Yes: gestaltd.auth.* | Yes | begin_login and complete_login are both operational and security events. |
| Platform token validation | Yes: gestaltd.auth.* with action=validate_token | Partially | Successful per-request validation is metrics-only; shared-middleware denials emit auth.authenticate. |
| Pre-invoker authorization denials | No dedicated semantic metric family | Yes | Denied subject access before guarded invocation dispatch is audited with the attempted operation or operations.list. |
| Connection auth start and completion | Yes: gestaltd.connection.auth.* | Yes | Covers OAuth start and completion plus manual connect completion. |
| Connection credential refresh | Yes: gestaltd.connection.auth.* with action=refresh | No | Refresh is system maintenance behavior, not a user-facing audit event. |
| Credentialed HTTP catalog discovery | Yes: gestaltd.discovery.* with action=list_operations | No | Operation listing stays metrics-only even when it resolves session catalogs. |
| Agent lifecycle and provider calls | Yes: gestaltd.agent.* | No | Audit the user-facing invocation or workflow that created the agent work, not every provider poll or state read. |
| Authorization provider calls | Yes: gestaltd.authorization.* | No | Provider interface calls are platform internals; explicit authorization-denied decisions are audited at the guarded action boundary. |
| Credential and catalog resolution | Yes: gestaltd.credential.* and gestaltd.catalog.* | No | These are pre-invocation plumbing paths; audit the higher-level action instead. |
| IndexedDB operations | Yes: db.client.operation.duration | No | Audit the higher-level user action instead of low-level storage calls. |
| API token inventory read | No dedicated semantic metric family | Yes | api_token.list is audited because it exposes stored API-token inventory. |
| API token lifecycle | No dedicated semantic metric family | Yes | api_token.create, api_token.revoke, and api_token.revoke_all are audited. |
| Logout, pending selection, disconnect | No dedicated semantic metric family | Yes | These are workflow and security events rather than aggregate telemetry series. |
Current Non-Goals
The built-in gestaltd metric surface emits broker operation result status via
gestalt.result_status and gestalt.result_status_class, but it does not emit
separate per-upstream-request HTTP status_code or status_class metrics for
HTTP catalog transports. It also does not emit deduplicated DAU, WAU, or MAU
analytics for users or plugins. Those need separate semantic decisions or an
analytics pipeline instead of more process-local counters.
HTTP Server Metrics
These come from the OpenTelemetry HTTP middleware wrapped around the server’s
main router. Because the whole router is instrumented once, these metrics cover
API traffic, health and readiness checks, the embedded admin UI, and scrapes of
/metrics itself.
| Metric | Type | Meaning |
|---|---|---|
| http.server.request.body.size | Histogram | Inbound request body size |
| http.server.response.body.size | Histogram | Response body size |
| http.server.request.duration | Histogram | End-to-end request duration |
These metrics use standard OpenTelemetry HTTP server semantic attributes such as
request method, response status, scheme, host, protocol, and route information
when available. Gestalt also attaches stable dimensions for request surfaces it
can resolve, including gestaltd.provider.name,
gestaltd.operation.name, gestaltd.operation.transport,
gestaltd.connection.mode, gestaltd.invocation.surface,
gestaltd.http.binding.name, and gestaltd.ui.name. For hosted HTTP bindings,
use gestaltd.http.binding.name on http.server.request.duration with
gestalt.http_binding on gestaltd.operation.* to correlate the accepted HTTP
delivery with the provider operation it dispatched.
http.route is the framework route template, not the concrete request path. For
example, a request to /api/v1/datadog/query_metrics is grouped under the route
template /api/v1/{integration}/{operation}. Use http.route for route-shape
latency and traffic graphs, and use gestaltd.provider.name plus
gestaltd.operation.name when you need the resolved integration and operation names.
Unknown provider or operation paths keep the templated route label but do not get
resolved provider or operation labels.
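To see why many concrete paths collapse into one http.route value, consider this small template matcher. It is an illustration only and not the router gestaltd uses; the function name `match_route` is hypothetical.

```python
import re
from typing import Dict, Optional


def match_route(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return captured segments if path fits the template, else None."""
    pattern = "".join(
        f"(?P<{part[1:-1]}>[^/]+)"
        if part.startswith("{") and part.endswith("}")
        else re.escape(part)
        # The capturing group keeps the {placeholders} in the split output.
        for part in re.split(r"(\{[^/{}]+\})", template)
    )
    match = re.fullmatch(pattern, path)
    return match.groupdict() if match else None
```

Every path that matches the template shares the same route label, while the captured segments correspond to what gestaltd.provider.name and gestaltd.operation.name carry once the names resolve.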
For Datadog dashboards, provider-operation status splits can be built from the broker operation metrics:
sum:gestaltd.operation.error_count{service:<service>,gestalt.result_status_class:4xx}
by {gestalt.provider,gestalt.operation,gestalt.result_status}.as_count()

sum:gestaltd.operation.error_count{service:<service>,gestalt.result_status_class:5xx}
by {gestalt.provider,gestalt.operation,gestalt.result_status}.as_count()

For HTTP-only views, use http.server.request.duration with
http.response.status_code. For cross-surface provider-operation views, prefer
gestaltd.operation.* with gestalt.result_status and
gestalt.result_status_class.
If the process is started with
OTEL_SEMCONV_STABILITY_OPT_IN=http/dup, the underlying HTTP instrumentation
also emits the legacy HTTP metric families alongside the current ones.
Provider gRPC Metrics
These come from OpenTelemetry gRPC stats handlers used for provider and host service traffic. They only apply to provider-backed components.
| Metric | Type | Meaning |
|---|---|---|
| rpc.client.call.duration | Histogram | Outbound provider RPC duration |
| rpc.server.call.duration | Histogram | Inbound provider or host-service RPC duration |
These metrics use standard OpenTelemetry gRPC semantic attributes such as
rpc.system.name=grpc, rpc.method, and gRPC status code labels such as
rpc.grpc.status_code. Gestalt also attaches gestaltd.rpc.role,
gestaltd.provider.name, and gestaltd.host_service.name when it can resolve
them.
The built-in gRPC stats handlers emit call duration metrics. They do not emit request body size, response body size, request messages per RPC, or response messages per RPC. Add custom interceptors and metric instruments if those dimensions become operationally important.
If the process is started with OTEL_SEMCONV_STABILITY_OPT_IN=rpc/dup, the
underlying gRPC instrumentation also emits legacy RPC metric families such as
rpc.client.duration alongside the current ones.
Trace Spans
When OTLP tracing is enabled, gestaltd emits trace spans for the main request
and provider execution boundaries.
| Layer | Span name | Key attributes |
|---|---|---|
| HTTP server | gestaltd: {method} {route} | Standard HTTP semantic attributes plus resolved gestaltd.* request-surface attributes when available |
| Broker operation | broker.invoke | gestalt.provider, gestalt.operation, gestalt.subject_id, gestalt.connection_mode |
| gRPC provider client | Per-RPC client spans | Standard RPC semantic attributes plus gestaltd.rpc.role=hosted_plugin_client and gestaltd.provider.name |
| gRPC provider server | Per-RPC server spans | Standard RPC semantic attributes plus gestaltd.rpc.role=provider_server and gestaltd.provider.name |
| gRPC host service server | Per-RPC server spans | Standard RPC semantic attributes plus gestaltd.rpc.role=host_service_server, gestaltd.provider.name, and gestaltd.host_service.name |
| Agent and catalog internals | agent.operation, agent.tool.resolve, catalog.operation.resolve, agent.run_metadata.write, agent.provider.operation | Low-cardinality gestalt.agent.*, gestalt.catalog.*, gestalt.provider, and gestalt.operation attributes depending on the span |
HTTP and gRPC spans use the same gestaltd.* dimensions as their corresponding
standard metrics. Broker, agent, credential, catalog, auth, and discovery spans
remain Gestalt custom spans and keep their established gestalt.* attributes.
Prometheus Scrape Endpoint
When telemetry metrics are enabled, Gestalt exposes a Prometheus scrape endpoint
at /metrics. With providers.telemetry.default.source: stdout, metrics are
kept local and served directly. With providers.telemetry.default.source: otlp, metrics are exported over
OTLP and also served at /metrics locally unless you disable the Prometheus
bridge. Setting providers.telemetry.default.source: noop disables Prometheus scraping entirely; /metrics
returns a clear unavailable response so the admin UI can surface that state.
When Gestalt serves /metrics on the public listener, it is authenticated with
the same session or bearer-token middleware as the rest of the HTTP API. When
server.management is configured, /metrics moves to the management listener
and is expected to be protected by network policy or an internal-only reverse
proxy instead. That split-listener shape is the recommended production
deployment. Keeping /metrics on the public listener is mainly for local
development and other trusted-network environments. The embedded admin UI
visualizes that same scrape surface instead of using a separate metrics API.
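The two listener shapes differ mainly in whether a scrape needs credentials. A minimal sketch of building the scrape request for either case follows; the function name `build_scrape_request` and the addresses in the usage note are hypothetical, and the token is whatever bearer token your deployment already issues for the HTTP API.

```python
import urllib.request


def build_scrape_request(base_url: str, token: str = "") -> urllib.request.Request:
    """Build a GET request for /metrics, adding a bearer token if given."""
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    if token:
        # Needed on the public listener; omitted for the management listener.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Passing the result to `urllib.request.urlopen` performs the scrape. For example, `build_scrape_request("http://localhost:8080", token)` would target a public listener, while a management listener on an internal address would be called without a token.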
The Prometheus endpoint and OTLP export hang off the same meter provider and include the HTTP middleware metrics, broker metrics, plugin gRPC client metrics, and exporter metadata described above.
The admin UI intentionally stays basic: it renders summary cards and lightweight
charts derived from /metrics, with in-browser time-window and refresh controls
scoped to the current page session. There is no persisted history across page
reloads and no cross-replica aggregation. It is designed for built-in operator
visibility, not as a replacement for a full observability backend.
What Gets Logged
Gestalt emits structured log entries for config loading and startup, provider
readiness, auth and connection flow failures, invocation failures, datastore
warnings, and audit records tagged with log.type=audit. When
OTLP is enabled, the server routes these logs through the OpenTelemetry log
bridge, and you can split audit records into a dedicated sink by filtering on
log.type=audit.