Observability

Gestalt logs to stdout by default. It exposes /health and /ready for orchestrators, a Prometheus-compatible /metrics endpoint, and optional OTLP export for traces and metrics. The audit logging page covers the audit-specific log schema and event coverage.

Health Endpoints

The /health endpoint returns 200 whenever the process is alive. The /ready endpoint returns 503 until all configured providers and the datastore have finished initialization, then switches to 200.
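The readiness behavior above can be consumed by a small polling loop. The following is a minimal sketch, not part of Gestalt: the function names, the default timeout and interval, and the convention of returning 0 for an unreachable server are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> int:
    """Return the HTTP status code for a GET on url, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # /ready answers 503 while providers and the datastore initialize.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout (hypothetical 0 marker).
        return 0


def wait_until_ready(
    probe: Callable[[], int],
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe until it returns 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe() == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A caller might use `wait_until_ready(lambda: http_probe("http://localhost:8080/ready"))`, where the host and port are placeholders for wherever gestaltd is listening.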

Default Telemetry

If you omit the providers.telemetry block, Gestalt synthesizes providers.telemetry.default with the built-in stdout source.

    providers:
      telemetry:
        default:
          source: stdout

That gives you structured logs without external collectors. The stdout source also exposes a Prometheus-compatible /metrics endpoint. Traces remain disabled unless you switch to providers.telemetry.<name>.source: otlp.

OTLP Export

Set providers.telemetry.<name>.source: otlp to export traces, metrics, and logs over OpenTelemetry.

    providers:
      telemetry:
        default:
          source: otlp
          config:
            endpoint: otel-collector:4317
            protocol: grpc
            serviceName: gestaltd
            traces:
              samplingRatio: 1.0
            metrics:
              interval: 60s
            logs:
              level: info

If you want traces and metrics in OTLP but need application logs to remain on stdout for the host platform to collect, override the log exporter.

    providers:
      telemetry:
        default:
          source: otlp
          config:
            endpoint: otel-collector:4317
            protocol: grpc
            logs:
              exporter: stdout
              format: json
              level: info

Audit Routing

Audit logs inherit the main telemetry logger by default. If you need a different route for compliance, analytics, or warehousing, configure providers.audit separately.

    providers:
      telemetry:
        default:
          source: stdout
          config:
            format: json
      audit:
        default:
          source: otlp
          config:
            endpoint: audit-collector:4317
            protocol: grpc
            headers:
              Authorization: Bearer ${AUDIT_OTLP_TOKEN}

That keeps application logs on stdout while exporting only audit records over OTLP. You can also set providers.audit.<name>.source: stdout to keep audit logs on stdout when telemetry is noop, or providers.audit.<name>.source: noop to disable audit output explicitly. If you leave providers.audit.default.source: inherit, collectors can still split the stream by filtering on log.type=audit.

Emitted Metrics

The built-in metric surface covers Prometheus exporter metadata, inbound HTTP requests handled by gestaltd, broker operation execution, connection authentication flows, platform authentication actions, agent lifecycle operations, authorization provider calls, credential resolution, catalog resolution, database client operations, and gRPC calls between gestaltd, provider processes, and host services.

Gestalt documents metric names using their OpenTelemetry instrument names. The Prometheus scrape endpoint translates those names to Prometheus-style metric families: dots become underscores, counters gain a _total suffix, units become suffixes like _seconds, _milliseconds, or _bytes, and histograms appear as _bucket, _sum, and _count series under the same family.

For example, gestaltd.operation.count is exposed as gestaltd_operation_count_total, and gestaltd.operation.duration is exposed as the gestaltd_operation_duration_seconds histogram family.
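The translation rules can be sketched as a small function. This is an illustration of the naming rules described above, not the exporter's actual code. The unit codes ("s", "ms", "By") and the instrument kinds passed in are assumptions about how the instruments are declared.

```python
from typing import List

# Assumed UCUM-style unit codes mapped to Prometheus unit suffixes.
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes"}


def prometheus_family(name: str, kind: str, unit: str = "") -> str:
    """Translate an OpenTelemetry instrument name to a Prometheus family name."""
    family = name.replace(".", "_")          # dots become underscores
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not family.endswith("_" + suffix):
        family += "_" + suffix               # units become suffixes
    if kind == "counter" and not family.endswith("_total"):
        family += "_total"                   # counters gain _total
    return family


def prometheus_series(name: str, kind: str, unit: str = "") -> List[str]:
    """List the series names a scrape would show for one instrument."""
    family = prometheus_family(name, kind, unit)
    if kind == "histogram":
        return [family + s for s in ("_bucket", "_sum", "_count")]
    return [family]
```

With these assumptions, `prometheus_family("gestaltd.operation.count", "counter")` yields `gestaltd_operation_count_total`, matching the example above.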

Gestalt follows OpenTelemetry semantic conventions for standard transport and database metrics. HTTP server metrics follow the HTTP metric semantic conventions, gRPC metrics follow the RPC metric semantic conventions, and datastore metrics follow the database client metric semantic conventions. Gestalt-specific dimensions on those standard metrics use the gestaltd.* namespace so they do not collide with OpenTelemetry registry attributes. Custom Gestalt metric families such as gestaltd.operation.*, gestaltd.auth.*, and gestaltd.discovery.* keep their established gestalt.* attributes.

Prometheus Exporter Metadata

| Prometheus family | Type | Meaning |
| --- | --- | --- |
| target_info | Gauge | Resource attributes including service.name |

Broker Operation Metrics

These metrics are recorded around broker.invoke.

| Metric | Type | Meaning |
| --- | --- | --- |
| gestaltd.operation.count | Counter | Operation invocations |
| gestaltd.operation.error_count | Counter | Failed operation invocations |
| gestaltd.operation.duration | Histogram | Operation invocation duration |

These metrics carry low-cardinality attributes: gestalt.provider (the plugin key), gestalt.operation (the catalog operation id), gestalt.transport (how the operation is implemented: plugin, rest, or mcp-passthrough), and gestalt.connection_mode (the provider auth mode: none or user). When the invocation came from a known surface, they also carry gestalt.invocation_surface. Hosted HTTP binding invocations also carry gestalt.http_binding, which is the configured binding name.

Operation metrics also carry gestalt.result_status and gestalt.result_status_class. gestalt.result_status is the normalized HTTP-like operation result status (200, 400, 502, and so on), and gestalt.result_status_class is the matching class (1xx, 2xx, 3xx, 4xx, 5xx, or unknown). Provider-returned statuses use the provider OperationResult status. Platform failures before provider execution use the same status mapping as the HTTP invocation handlers. gestaltd.operation.error_count records invocations with an invocation error or an operation result status of 400 or higher, so provider-returned 4xx and 5xx results are counted as operation failures.
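The status class and error rule can be expressed in a few lines. This is a hedged sketch of the behavior described above, with hypothetical function names. The text does not say exactly when the class is unknown, so the sketch assumes it applies to a missing status or one outside 100 to 599.

```python
from typing import Optional


def result_status_class(status: Optional[int]) -> str:
    """Map a normalized result status to 1xx..5xx, or unknown."""
    # Assumption: a missing or out-of-range status is reported as unknown.
    if status is None or not 100 <= status <= 599:
        return "unknown"
    return f"{status // 100}xx"


def counts_as_error(status: Optional[int], invocation_error: bool) -> bool:
    """True when gestaltd.operation.error_count would be incremented."""
    # An invocation error, or any result status of 400 or higher, is a failure.
    return invocation_error or (status is not None and status >= 400)
```

Under this rule a provider-returned 404 increments the error counter even though the invocation itself completed.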

When a request references an unknown provider or operation, gestaltd records unknown for the missing metric attributes instead of using raw user input.
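Substituting unknown keeps label cardinality bounded, because attacker-controlled or mistyped names never become label values. A minimal sketch of that idea follows. The `catalog` mapping and the function name are hypothetical stand-ins for whatever registry gestaltd consults.

```python
from typing import Dict, Mapping, Set


def operation_metric_labels(
    provider: str,
    operation: str,
    catalog: Mapping[str, Set[str]],
) -> Dict[str, str]:
    """Build metric attributes, replacing unrecognized names with 'unknown'."""
    known_provider = provider in catalog
    known_operation = known_provider and operation in catalog[provider]
    return {
        "gestalt.provider": provider if known_provider else "unknown",
        "gestalt.operation": operation if known_operation else "unknown",
    }
```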

Connection Authentication Metrics

These metrics are recorded around plugin connection authentication flows including OAuth start and completion, manual connect, and token refresh.

| Metric | Type | Meaning |
| --- | --- | --- |
| gestaltd.connection.auth.count | Counter | Connection authentication attempts |
| gestaltd.connection.auth.error_count | Counter | Failed connection authentication attempts |
| gestaltd.connection.auth.duration | Histogram | Connection authentication duration |

These metrics carry low-cardinality attributes: gestalt.provider (the plugin key), gestalt.type (oauth or manual), gestalt.action (start, complete, or refresh), and gestalt.connection_mode (the provider authentication mode: none or user).

When a request references an unknown plugin, gestaltd records unknown for the provider and connection mode metric attributes instead of using raw request input.

Platform Authentication Metrics

These metrics are recorded around the configured top-level authentication provider used to log into Gestalt and validate session tokens.

| Metric | Type | Meaning |
| --- | --- | --- |
| gestaltd.auth.count | Counter | Platform authentication actions |
| gestaltd.auth.error_count | Counter | Failed platform authentication actions |
| gestaltd.auth.duration | Histogram | Platform authentication action duration |

These metrics carry low-cardinality attributes: gestalt.provider (the auth provider name) and gestalt.action (begin_login, complete_login, or validate_token).

Discovery Metrics

These metrics are recorded around credentialed HTTP catalog discovery, currently GET /api/v1/integrations/{name}/operations when gestaltd resolves a session catalog with stored or identity credentials.

| Metric | Type | Meaning |
| --- | --- | --- |
| gestaltd.discovery.count | Counter | Credentialed discovery attempts |
| gestaltd.discovery.error_count | Counter | Failed credentialed discovery attempts |
| gestaltd.discovery.duration | Histogram | Credentialed discovery duration |

These metrics carry low-cardinality attributes: gestalt.provider (the plugin key), gestalt.action (currently list_operations), and gestalt.connection_mode (the provider auth mode: none or user). gestaltd.discovery.error_count increments when the credentialed session-catalog path fails, even if gestaltd falls back to a static catalog and still serves a successful HTTP response.

Agent Metrics

Agent metrics are recorded at the gestaltd agent facade and at the configured agent provider boundary. They use the same triplet shape as other custom Gestalt metrics: <family>.count, <family>.error_count, and <family>.duration.

| OpenTelemetry family | Prometheus families | Meaning |
| --- | --- | --- |
| gestaltd.agent.operation.* | gestaltd_agent_operation_count_total, gestaltd_agent_operation_error_count_total, gestaltd_agent_operation_duration_seconds | Public agent manager operations such as create_session, create_turn, list_turn_events, and resolve_interaction |
| gestaltd.agent.provider.operation.* | gestaltd_agent_provider_operation_count_total, gestaltd_agent_provider_operation_error_count_total, gestaltd_agent_provider_operation_duration_seconds | Calls into the selected provider-owned agent system of record |
| gestaltd.agent.tool.resolve.* | gestaltd_agent_tool_resolve_count_total, gestaltd_agent_tool_resolve_error_count_total, gestaltd_agent_tool_resolve_duration_seconds | Tool reference resolution before a turn is sent to a provider |

These metric families use low-cardinality attributes:

  • gestalt.agent.operation identifies the agent operation or provider method.
  • gestalt.agent.provider identifies the configured agent provider on provider-bound metric families.
  • gestalt.agent.tool.source identifies tool binding mode for tool resolution: native_search.

Agent HTTP requests also form trace trees when OTLP tracing is enabled. A typical POST /api/v1/agent/sessions/{id}/turns trace contains agent.operation, agent.tool.resolve, catalog.operation.resolve, and agent.provider.operation spans under the inbound HTTP server span.

Authorization Provider Metrics

Authorization provider metrics cover both direct provider interface calls and provider-backed subject-access evaluation performed by gestaltd.

| OpenTelemetry family | Prometheus families | Meaning |
| --- | --- | --- |
| gestaltd.authorization.provider.operation.* | gestaltd_authorization_provider_operation_count_total, gestaltd_authorization_provider_operation_error_count_total, gestaltd_authorization_provider_operation_duration_seconds | Calls to the configured authorization provider interface |
| gestaltd.authorization.provider.evaluate.* | gestaltd_authorization_provider_evaluate_count_total, gestaltd_authorization_provider_evaluate_error_count_total, gestaltd_authorization_provider_evaluate_duration_seconds | Batched authorization evaluations used by provider-backed subject-access checks |

These metrics carry low-cardinality attributes:

  • gestalt.authorization.provider identifies the configured authorization provider.
  • gestalt.authorization.operation identifies the provider method for gestaltd.authorization.provider.operation.*.
  • gestalt.authorization.scope identifies the resource type for provider-backed evaluation, such as integration or operation.

Credential And Catalog Resolution Metrics

Credential and catalog resolution metrics make the pre-invocation and agent tool paths visible before work reaches a plugin provider.

| OpenTelemetry family | Prometheus families | Meaning |
| --- | --- | --- |
| gestaltd.credential.provider.operation.* | gestaltd_credential_provider_operation_count_total, gestaltd_credential_provider_operation_error_count_total, gestaltd_credential_provider_operation_duration_seconds | Calls to the configured external credential provider |
| gestaltd.catalog.operation.resolve.* | gestaltd_catalog_operation_resolve_count_total, gestaltd_catalog_operation_resolve_error_count_total, gestaltd_catalog_operation_resolve_duration_seconds | Static or session-catalog operation resolution |

These metrics carry low-cardinality attributes:

  • gestalt.credential.provider identifies the configured credential provider.
  • gestalt.credential.operation identifies the credential provider method.
  • gestalt.provider identifies the plugin provider when the path has resolved one.
  • gestalt.operation identifies the plugin operation for catalog operation resolution.
  • gestalt.catalog.source is attached to catalog resolution spans and records whether the resolved operation came from static, session, or a pre-resolved context operation.

Database Client Metrics

These metrics are recorded around every ObjectStore and Index operation on the system IndexedDB instance. They use the OpenTelemetry db.client.operation.duration histogram instead of a Gestalt-specific metric family. The instrumentation is applied once at the interface level in bootstrap, so both core services (users and API tokens) and provider-hosted traffic are covered.

| Metric | Type | Meaning |
| --- | --- | --- |
| db.client.operation.duration | Histogram | IndexedDB operation duration |

These metrics carry low-cardinality attributes:

  • db.system.name is gestaltd.indexeddb for the built-in IndexedDB implementation.
  • db.namespace identifies the logical IndexedDB resource configured in gestaltd, such as the server’s primary IndexedDB binding.
  • db.collection.name identifies the IndexedDB object store being accessed, such as users, external_credentials, or api_tokens.
  • db.operation.name identifies the normalized ObjectStore or Index operation: get, get_key, put, add, delete, clear, get_all, get_all_keys, count, delete_range, open_cursor, open_key_cursor, index_get, index_get_key, index_get_all, index_get_all_keys, index_count, index_delete, index_open_cursor, or index_open_key_cursor.
  • gestaltd.provider.name identifies the provider namespace for provider-hosted IndexedDB traffic. Core services omit this attribute.
  • gestaltd.indexeddb.index.name identifies the index name for Index operations.
  • error.type is attached only when an operation fails. Bounded values include canceled, deadline_exceeded, not_found, already_exists, keys_only, and internal.

Expected misses that return ErrNotFound are recorded as failed database operations with error.type=not_found. Use count:db.client.operation.duration{error.type:*}.as_count() or the matching Prometheus histogram count series to build error counts.
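On the Prometheus side, the same error count can be derived from the histogram count series. The sketch below parses scrape text and sums counts per error type. It assumes the family is exposed as db_client_operation_duration_seconds_count and that the error.type attribute appears as the error_type label. Both names follow the usual dot-to-underscore translation but are not confirmed by the text above.

```python
import re
from typing import Dict

# Assumed Prometheus name for the db.client.operation.duration count series.
COUNT_SERIES = "db_client_operation_duration_seconds_count"

SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def failed_db_operations(exposition: str) -> Dict[str, float]:
    """Sum histogram counts per error_type from Prometheus exposition text."""
    totals: Dict[str, float] = {}
    for line in exposition.splitlines():
        if line.startswith("#"):
            continue  # skip HELP and TYPE comments
        match = SAMPLE.match(line)
        if not match or match["name"] != COUNT_SERIES:
            continue
        labels = dict(LABEL.findall(match["labels"] or ""))
        error_type = labels.get("error_type")
        if error_type:  # series without error_type are successful operations
            totals[error_type] = totals.get(error_type, 0.0) + float(match["value"])
    return totals
```

Because expected misses are recorded as not_found, a dashboard built on this count may want to chart not_found separately from internal or deadline_exceeded.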

Metrics And Audit Boundaries

Use metrics for low-cardinality aggregates such as volume, latency, and error rate. Use audit logs for user-facing or security-sensitive actions where you need actor, target, and outcome details.

| Surface | Metrics | Audit Logs | Notes |
| --- | --- | --- | --- |
| Provider operation invocation | Yes: gestaltd.operation.* | Yes | Every guarded HTTP and MCP invocation is audited. |
| Platform login start and completion | Yes: gestaltd.auth.* | Yes | begin_login and complete_login are both operational and security events. |
| Platform token validation | Yes: gestaltd.auth.* with action=validate_token | Partially | Successful per-request validation is metrics-only; shared-middleware denials emit auth.authenticate. |
| Pre-invoker authorization denials | No dedicated semantic metric family | Yes | Denied subject access before guarded invocation dispatch is audited with the attempted operation or operations.list. |
| Connection auth start and completion | Yes: gestaltd.connection.auth.* | Yes | Covers OAuth start and completion plus manual connect completion. |
| Connection credential refresh | Yes: gestaltd.connection.auth.* with action=refresh | No | Refresh is system maintenance behavior, not a user-facing audit event. |
| Credentialed HTTP catalog discovery | Yes: gestaltd.discovery.* with action=list_operations | No | Operation listing stays metrics-only even when it resolves session catalogs. |
| Agent lifecycle and provider calls | Yes: gestaltd.agent.* | No | Audit the user-facing invocation or workflow that created the agent work, not every provider poll or state read. |
| Authorization provider calls | Yes: gestaltd.authorization.* | No | Provider interface calls are platform internals; explicit authorization-denied decisions are audited at the guarded action boundary. |
| Credential and catalog resolution | Yes: gestaltd.credential.* and gestaltd.catalog.* | No | These are pre-invocation plumbing paths; audit the higher-level action instead. |
| IndexedDB operations | Yes: db.client.operation.duration | No | Audit the higher-level user action instead of low-level storage calls. |
| API token inventory read | No dedicated semantic metric family | Yes | api_token.list is audited because it exposes stored API-token inventory. |
| API token lifecycle | No dedicated semantic metric family | Yes | api_token.create, api_token.revoke, and api_token.revoke_all are audited. |
| Logout, pending selection, disconnect | No dedicated semantic metric family | Yes | These are workflow and security events rather than aggregate telemetry series. |

Current Non-Goals

The built-in gestaltd metric surface emits broker operation result status via gestalt.result_status and gestalt.result_status_class, but it does not emit separate per-upstream-request HTTP status_code or status_class metrics for HTTP catalog transports. It also does not emit deduplicated DAU, WAU, or MAU analytics for users or plugins. Those need separate semantic decisions or an analytics pipeline instead of more process-local counters.

HTTP Server Metrics

These come from the OpenTelemetry HTTP middleware wrapped around the server’s main router. Because the whole router is instrumented once, these metrics cover API traffic, health and readiness checks, the embedded admin UI, and scrapes of /metrics itself.

| Metric | Type | Meaning |
| --- | --- | --- |
| http.server.request.body.size | Histogram | Inbound request body size |
| http.server.response.body.size | Histogram | Response body size |
| http.server.request.duration | Histogram | End-to-end request duration |

These metrics use standard OpenTelemetry HTTP server semantic attributes such as request method, response status, scheme, host, protocol, and route information when available. Gestalt also attaches stable dimensions for request surfaces it can resolve, including gestaltd.provider.name, gestaltd.operation.name, gestaltd.operation.transport, gestaltd.connection.mode, gestaltd.invocation.surface, gestaltd.http.binding.name, and gestaltd.ui.name. For hosted HTTP bindings, use gestaltd.http.binding.name on http.server.request.duration together with gestalt.http_binding on gestaltd.operation.* to correlate the accepted HTTP delivery with the provider operation it dispatched.

http.route is the framework route template, not the concrete request path. For example, a request to /api/v1/datadog/query_metrics is grouped under the route template /api/v1/{integration}/{operation}. Use http.route for route-shape latency and traffic graphs, and use gestaltd.provider.name plus gestaltd.operation.name when you need the resolved integration and operation names. Unknown provider or operation paths keep the templated route label but do not get resolved provider or operation labels.
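The difference between the route template and the resolved labels can be illustrated with a small matcher. This is a sketch only: the `catalog` mapping and function names are hypothetical, and it assumes the resolved labels are attached only when both the provider and the operation are recognized.

```python
from typing import Dict, Mapping, Optional, Set

ROUTE_TEMPLATE = "/api/v1/{integration}/{operation}"


def match_route(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return the path parameters if path fits template, else None."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return None
    params: Dict[str, str] = {}
    for t, p in zip(t_parts, p_parts):
        if t.startswith("{") and t.endswith("}"):
            params[t[1:-1]] = p      # placeholder segment captures the value
        elif t != p:
            return None              # literal segment must match exactly
    return params


def http_attributes(path: str, catalog: Mapping[str, Set[str]]) -> Dict[str, str]:
    """Build the route and resolved-surface attributes for one request path."""
    params = match_route(ROUTE_TEMPLATE, path)
    if params is None:
        return {}
    attrs = {"http.route": ROUTE_TEMPLATE}  # always the template, never the path
    provider, operation = params["integration"], params["operation"]
    if operation in catalog.get(provider, set()):
        attrs["gestaltd.provider.name"] = provider
        attrs["gestaltd.operation.name"] = operation
    return attrs
```

Every request under the template shares one http.route value, so route-level graphs stay low-cardinality regardless of which integrations are called.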

For Datadog dashboards, provider-operation status splits can be built from the broker operation metrics:

sum:gestaltd.operation.error_count{service:<service>,gestalt.result_status_class:4xx} by {gestalt.provider,gestalt.operation,gestalt.result_status}.as_count()
sum:gestaltd.operation.error_count{service:<service>,gestalt.result_status_class:5xx} by {gestalt.provider,gestalt.operation,gestalt.result_status}.as_count()

For HTTP-only views, use http.server.request.duration with http.response.status_code. For cross-surface provider-operation views, prefer gestaltd.operation.* with gestalt.result_status and gestalt.result_status_class.

If the process is started with OTEL_SEMCONV_STABILITY_OPT_IN=http/dup, the underlying HTTP instrumentation also emits the legacy HTTP metric families alongside the current ones.

Provider gRPC Metrics

These come from the OpenTelemetry gRPC stats handlers used for provider and host service traffic. They apply only to provider-backed components.

| Metric | Type | Meaning |
| --- | --- | --- |
| rpc.client.call.duration | Histogram | Outbound provider RPC duration |
| rpc.server.call.duration | Histogram | Inbound provider or host-service RPC duration |

These metrics use standard OpenTelemetry gRPC semantic attributes such as rpc.system.name=grpc, rpc.method, and gRPC status code labels such as rpc.grpc.status_code. Gestalt also attaches gestaltd.rpc.role, gestaltd.provider.name, and gestaltd.host_service.name when it can resolve them.

The built-in gRPC stats handlers emit call duration metrics. They do not emit request body size, response body size, request messages per RPC, or response messages per RPC. Add custom interceptors and metric instruments if those dimensions become operationally important.

If the process is started with OTEL_SEMCONV_STABILITY_OPT_IN=rpc/dup, the underlying gRPC instrumentation also emits legacy RPC metric families such as rpc.client.duration alongside the current ones.

Trace Spans

When OTLP tracing is enabled, gestaltd emits trace spans for the main request and provider execution boundaries.

| Layer | Span name | Key attributes |
| --- | --- | --- |
| HTTP server | gestaltd: {method} {route} | Standard HTTP semantic attributes plus resolved gestaltd.* request-surface attributes when available |
| Broker operation | broker.invoke | gestalt.provider, gestalt.operation, gestalt.subject_id, gestalt.connection_mode |
| gRPC provider client | Per-RPC client spans | Standard RPC semantic attributes plus gestaltd.rpc.role=hosted_plugin_client and gestaltd.provider.name |
| gRPC provider server | Per-RPC server spans | Standard RPC semantic attributes plus gestaltd.rpc.role=provider_server and gestaltd.provider.name |
| gRPC host service server | Per-RPC server spans | Standard RPC semantic attributes plus gestaltd.rpc.role=host_service_server, gestaltd.provider.name, and gestaltd.host_service.name |
| Agent and catalog internals | agent.operation, agent.tool.resolve, catalog.operation.resolve, agent.run_metadata.write, agent.provider.operation | Low-cardinality gestalt.agent.*, gestalt.catalog.*, gestalt.provider, and gestalt.operation attributes depending on the span |

HTTP and gRPC spans use the same gestaltd.* dimensions as their corresponding standard metrics. Broker, agent, credential, catalog, auth, and discovery spans remain Gestalt custom spans and keep their established gestalt.* attributes.

Prometheus Scrape Endpoint

When telemetry metrics are enabled, Gestalt exposes a Prometheus scrape endpoint at /metrics. With providers.telemetry.default.source: stdout, metrics are kept local and served directly. With providers.telemetry.default.source: otlp, metrics are exported over OTLP and also served at /metrics locally unless you disable the Prometheus bridge. Setting providers.telemetry.default.source: noop disables Prometheus scraping entirely; /metrics returns a clear unavailable response so the admin UI can surface that state.

When Gestalt serves /metrics on the public listener, it is authenticated with the same session or bearer-token middleware as the rest of the HTTP API. When server.management is configured, /metrics moves to the management listener and is expected to be protected by network policy or an internal-only reverse proxy instead. That split-listener shape is the recommended production deployment. Keeping /metrics on the public listener is mainly for local development and other trusted-network environments. The embedded admin UI visualizes that same scrape surface instead of using a separate metrics API.
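A scrape against either listener can be sketched with the standard library. The example assumes the bearer token travels in a conventional `Authorization: Bearer <token>` header. The base URLs, the token value, and the function names are placeholders, not documented Gestalt settings.

```python
import urllib.request
from typing import Optional


def metrics_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for /metrics, with a bearer token if one is given."""
    headers = {"Accept": "text/plain"}
    if token:
        # Needed on the public listener; the management listener is assumed
        # to rely on network policy instead, so the token can be omitted.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(base_url.rstrip("/") + "/metrics", headers=headers)


def scrape(base_url: str, token: Optional[str] = None, timeout_s: float = 5.0) -> str:
    """Fetch the Prometheus exposition text from a running gestaltd."""
    with urllib.request.urlopen(metrics_request(base_url, token), timeout=timeout_s) as resp:
        return resp.read().decode("utf-8")
```

In a split-listener deployment the scraper would point `base_url` at the management listener and pass no token.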

The Prometheus endpoint and OTLP export hang off the same meter provider and include the HTTP middleware metrics, broker metrics, plugin gRPC client metrics, and exporter metadata described above.

The admin UI intentionally stays basic: it renders summary cards and lightweight charts derived from /metrics, with in-browser time-window and refresh controls scoped to the current page session. There is no persisted history across page reloads and no cross-replica aggregation. It is designed for built-in operator visibility, not as a replacement for a full observability backend.

What Gets Logged

Gestalt emits structured log entries for config loading and startup, provider readiness, auth and connection flow failures, invocation failures, datastore warnings, and audit records tagged with log.type=audit. When OTLP is enabled, the server routes these logs through the OpenTelemetry log bridge, and you can split audit records into a dedicated sink by filtering on log.type=audit.
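Splitting the stream on log.type=audit can be sketched for JSON-formatted stdout logs. The example assumes one JSON object per line with log.type as a top-level key. The exact log schema is covered on the audit logging page, so treat the field layout and the `message` wrapper here as illustrative.

```python
import json
from typing import Dict, Iterable, List, Tuple


def split_audit(lines: Iterable[str]) -> Tuple[List[Dict], List[Dict]]:
    """Separate audit records from application logs in a JSON-lines stream."""
    audit: List[Dict] = []
    app: List[Dict] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            record = None
        if not isinstance(record, dict):
            # Keep unparseable or non-object lines rather than dropping them.
            app.append({"message": line})
        elif record.get("log.type") == "audit":
            audit.append(record)
        else:
            app.append(record)
    return audit, app
```

A collector pipeline would apply the same filter declaratively; the function only shows the routing decision.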