Observability

Gestalt logs to stdout by default. It exposes /health and /ready for orchestrators, a Prometheus-compatible /metrics endpoint, and optional OTLP export for traces and metrics. The audit logging page covers the audit-specific log schema and event coverage.

Health Endpoints

The /health endpoint returns 200 whenever the process is alive. The /ready endpoint returns 503 until all configured providers and the datastore have finished initialization, then switches to 200.
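The readiness contract above can be consumed with a small polling loop. This is a minimal sketch, not part of Gestalt: `wait_until_ready` and its `probe` argument are hypothetical names, and `probe` is assumed to be any callable that returns the HTTP status code of GET /ready.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], int], attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll a readiness probe until it reports 200 or the attempts run out.

    `probe` is a hypothetical callable returning the status code of GET /ready,
    for example a thin wrapper around urllib.request.
    """
    for attempt in range(attempts):
        if probe() == 200:
            return True  # providers and the datastore finished initializing
        if attempt < attempts - 1:
            time.sleep(delay)  # 503 (or anything else): still starting, retry
    return False
```

Injecting the probe keeps the loop independent of any HTTP client, so a deploy script or a test can reuse it.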

Default Telemetry

If you omit the providers.telemetry block, Gestalt synthesizes providers.telemetry.default with the built-in stdout source.


providers:
  telemetry:
    default:
      source: stdout

That gives you structured logs without external collectors. The stdout source also exposes a Prometheus-compatible /metrics endpoint. Traces remain disabled unless you switch to providers.telemetry.<name>.source: otlp.

OTLP Export

Set providers.telemetry.<name>.source: otlp to export traces, metrics, and logs over OTLP.


providers:
  telemetry:
    default:
      source: otlp
      config:
        endpoint: otel-collector:4317
        protocol: grpc
        serviceName: gestaltd
        traces:
          samplingRatio: 1.0
        metrics:
          interval: 60s
        logs:
          level: info

If you want traces and metrics in OTLP but need application logs to remain on stdout for the host platform to collect, override the log exporter.


providers:
  telemetry:
    default:
      source: otlp
      config:
        endpoint: otel-collector:4317
        protocol: grpc
        logs:
          exporter: stdout
          format: json
          level: info

Audit Routing

Audit logs inherit the main telemetry logger by default. If you need a different route for compliance, analytics, or warehousing, configure providers.audit separately.


providers:
  telemetry:
    default:
      source: stdout
      config:
        format: json
 
  audit:
    default:
      source: otlp
      config:
        endpoint: audit-collector:4317
        protocol: grpc
        headers:
          Authorization: Bearer ${AUDIT_OTLP_TOKEN}

That keeps application logs on stdout while exporting only audit records over OTLP. You can also set providers.audit.<name>.source: stdout to keep audit logs on stdout when telemetry is noop, or providers.audit.<name>.source: noop to disable audit output explicitly. If you leave providers.audit.default.source: inherit, collectors can still split the stream by filtering on log.type=audit.
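When audit records share the main log stream, a downstream consumer can separate them on the log.type field. The sketch below assumes JSON-formatted log lines in which the attribute is serialized as a flat "log.type" key; that layout is an assumption, so check the audit logging page for the real schema. `split_audit` is a hypothetical helper name.

```python
import json
from typing import Iterable


def split_audit(lines: Iterable[str]) -> tuple[list[dict], list[dict]]:
    """Partition JSON log lines into (audit, application) records."""
    audit: list[dict] = []
    application: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Assumption: log.type is a flat top-level key in the JSON record.
        if record.get("log.type") == "audit":
            audit.append(record)
        else:
            application.append(record)
    return audit, application
```

A collector pipeline would normally do this with a filter rule, but the logic is the same: one equality check on log.type.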

Emitted Metrics

The built-in metric surface covers Prometheus exporter metadata, inbound HTTP requests handled by gestaltd, broker operation execution, connection authentication flows, platform authentication actions, agent lifecycle operations, authorization provider calls, credential resolution, catalog resolution, database client operations, and gRPC calls between gestaltd, provider processes, and host services.

Gestalt documents metric names using their OpenTelemetry instrument names. The Prometheus scrape endpoint translates those names to Prometheus-style metric families: dots become underscores, counters gain a _total suffix, units become suffixes like _seconds, _milliseconds, or _bytes, and histograms appear as _bucket, _sum, and _count series under the same family.

For example, gestaltd.operation.count is exposed as gestaltd_operation_count_total, and gestaltd.operation.duration is exposed as the gestaltd_operation_duration_seconds histogram family.
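The translation rules can be approximated in a few lines, which helps when deriving a Prometheus query from a documented OpenTelemetry name. This is a sketch of the rules as described above, not the exporter's actual code; the function names are hypothetical, and the unit codes on the left of `UNIT_SUFFIXES` are the usual OpenTelemetry spellings, assumed here.

```python
# Unit suffixes named in the text. The unit codes used as keys are an
# assumption of this sketch (the usual OpenTelemetry unit spellings).
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes"}


def prometheus_family(name: str, kind: str, unit: str = "") -> str:
    """Approximate the Prometheus family name for an OpenTelemetry instrument."""
    family = name.replace(".", "_")  # dots become underscores
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not family.endswith("_" + suffix):
        family += "_" + suffix  # units become suffixes
    if kind == "counter" and not family.endswith("_total"):
        family += "_total"  # counters gain a _total suffix
    return family


def histogram_series(family: str) -> list[str]:
    """Series names exposed under a histogram family."""
    return [family + "_bucket", family + "_sum", family + "_count"]
```

For instance, `prometheus_family("gestaltd.operation.duration", "histogram", "s")` yields the family name given in the example above.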

Gestalt follows OpenTelemetry semantic conventions for standard transport and database metrics. HTTP server metrics follow the HTTP metric semantic conventions, gRPC metrics follow the RPC metric semantic conventions, and datastore metrics follow the database client metric semantic conventions. Gestalt-specific dimensions on those standard metrics use the gestaltd.* namespace so they do not collide with OpenTelemetry registry attributes. Custom Gestalt metric families such as gestaltd.operation.*, gestaltd.auth.*, and gestaltd.discovery.* keep their established gestalt.* attributes.

Prometheus Exporter Metadata

Prometheus family	Type	Meaning
`target_info`	Gauge	Resource attributes including `service.name`

Broker Operation Metrics

These metrics are recorded around broker.invoke.

OpenTelemetry

Metric	Type	Meaning
`gestaltd.operation.count`	Counter	Operation invocations
`gestaltd.operation.error_count`	Counter	Failed operation invocations
`gestaltd.operation.duration`	Histogram	Operation invocation duration

Prometheus

Metric	Type	Meaning
`gestaltd_operation_count_total`	Counter	Operation invocations
`gestaltd_operation_error_count_total`	Counter	Failed operation invocations
`gestaltd_operation_duration_seconds`	Histogram	Operation invocation duration

These metrics carry low-cardinality attributes: gestalt.provider (the plugin key), gestalt.operation (the catalog operation id), gestalt.transport (how the operation is implemented: plugin, rest, or mcp-passthrough), and gestalt.connection_mode (the provider auth mode: none or user). When the invocation came from a known surface, they also carry gestalt.invocation_surface. Hosted HTTP binding invocations also carry gestalt.http_binding, which is the configured binding name.

Operation metrics also carry gestalt.result_status and gestalt.result_status_class. gestalt.result_status is the normalized HTTP-like operation result status (200, 400, 502, and so on), and gestalt.result_status_class is the matching class (1xx, 2xx, 3xx, 4xx, 5xx, or unknown). Provider-returned statuses use the provider OperationResult status. Platform failures before provider execution use the same status mapping as the HTTP invocation handlers. gestaltd.operation.error_count records invocations with an invocation error or an operation result status of 400 or higher, so provider-returned 4xx and 5xx results are counted as operation failures.
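The status-class and error-count rules can be written out as two small functions. This is a sketch of the documented behavior, not gestaltd's implementation; the function names are hypothetical, and treating a missing or out-of-range status as unknown is an assumption.

```python
from typing import Optional


def result_status_class(status: Optional[int]) -> str:
    """Map a normalized operation status to its class label."""
    # Assumption: anything outside 100-599, or a missing status, is "unknown".
    if status is None or not 100 <= status <= 599:
        return "unknown"
    return f"{status // 100}xx"


def counts_as_error(invocation_error: bool, status: Optional[int]) -> bool:
    """True when the invocation would be counted in gestaltd.operation.error_count."""
    return invocation_error or (status is not None and status >= 400)
```

Note that a provider-returned 404 counts as a failure under this rule even though the invocation itself completed.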

When a request references an unknown provider or operation, gestaltd records unknown for the missing metric attributes instead of using raw user input.
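The idea behind that substitution is that a metric label should only ever take values from a known, bounded set. A minimal illustration follows; `bounded_label` is a hypothetical name and is not taken from the gestaltd source.

```python
def bounded_label(value: str, known: set[str]) -> str:
    """Return `value` only if it is a registered name, else the literal "unknown"."""
    return value if value in known else "unknown"
```

Because arbitrary request input never reaches the label, a client probing random provider names cannot inflate the number of time series.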

Connection Authentication Metrics

These metrics are recorded around plugin connection authentication flows including OAuth start and completion, manual connect, and token refresh.

OpenTelemetry

Metric	Type	Meaning
`gestaltd.connection.auth.count`	Counter	Connection authentication attempts
`gestaltd.connection.auth.error_count`	Counter	Failed connection authentication attempts
`gestaltd.connection.auth.duration`	Histogram	Connection authentication duration

Prometheus

Metric	Type	Meaning
`gestaltd_connection_auth_count_total`	Counter	Connection authentication attempts
`gestaltd_connection_auth_error_count_total`	Counter	Failed connection authentication attempts
`gestaltd_connection_auth_duration_seconds`	Histogram	Connection authentication duration

These metrics carry low-cardinality attributes: gestalt.provider (the plugin key), gestalt.type (oauth or manual), gestalt.action (start, complete, or refresh), and gestalt.connection_mode (the provider authentication mode: none or user).

When a request references an unknown plugin, gestaltd records unknown for the provider and connection mode metric attributes instead of using raw request input.

Platform Authentication Metrics

These metrics are recorded around the configured top-level authentication provider used to log into Gestalt and validate session tokens.

OpenTelemetry

Metric	Type	Meaning
`gestaltd.auth.count`	Counter	Platform authentication actions
`gestaltd.auth.error_count`	Counter	Failed platform authentication actions
`gestaltd.auth.duration`	Histogram	Platform authentication action duration

Prometheus

Metric	Type	Meaning
`gestaltd_auth_count_total`	Counter	Platform authentication actions
`gestaltd_auth_error_count_total`	Counter	Failed platform authentication actions
`gestaltd_auth_duration_seconds`	Histogram	Platform authentication action duration

These metrics carry low-cardinality attributes: gestalt.provider (the auth provider name) and gestalt.action (begin_login, complete_login, or validate_token).

Discovery Metrics

These metrics are recorded around credentialed HTTP catalog discovery, currently GET /api/v1/integrations/{name}/operations when gestaltd resolves a session catalog with stored or identity credentials.

OpenTelemetry

Metric	Type	Meaning
`gestaltd.discovery.count`	Counter	Credentialed discovery attempts
`gestaltd.discovery.error_count`	Counter	Failed credentialed discovery attempts
`gestaltd.discovery.duration`	Histogram	Credentialed discovery duration

Prometheus

Metric	Type	Meaning
`gestaltd_discovery_count_total`	Counter	Credentialed discovery attempts
`gestaltd_discovery_error_count_total`	Counter	Failed credentialed discovery attempts
`gestaltd_discovery_duration_seconds`	Histogram	Credentialed discovery duration

These metrics carry low-cardinality attributes: gestalt.provider (the plugin key), gestalt.action (currently list_operations), and gestalt.connection_mode (the provider auth mode: none or user). gestaltd.discovery.error_count increments when the credentialed session-catalog path fails, even if gestaltd falls back to a static catalog and still serves a successful HTTP response.

Agent Metrics

Agent metrics are recorded at the gestaltd agent facade and at the configured agent provider boundary. They use the same triplet shape as other custom Gestalt metrics: <family>.count, <family>.error_count, and <family>.duration.

OpenTelemetry family	Prometheus families	Meaning
`gestaltd.agent.operation.*`	`gestaltd_agent_operation_count_total`, `gestaltd_agent_operation_error_count_total`, `gestaltd_agent_operation_duration_seconds`	Public agent manager operations such as `create_session`, `create_turn`, `list_turn_events`, and `resolve_interaction`
`gestaltd.agent.provider.operation.*`	`gestaltd_agent_provider_operation_count_total`, `gestaltd_agent_provider_operation_error_count_total`, `gestaltd_agent_provider_operation_duration_seconds`	Calls into the selected provider-owned agent system of record
`gestaltd.agent.tool.resolve.*`	`gestaltd_agent_tool_resolve_count_total`, `gestaltd_agent_tool_resolve_error_count_total`, `gestaltd_agent_tool_resolve_duration_seconds`	Tool reference resolution before a turn is sent to a provider

These metric families use low-cardinality attributes:

gestalt.agent.operation identifies the agent operation or provider method.
gestalt.agent.provider identifies the configured agent provider on provider-bound metric families.
gestalt.agent.tool.source identifies the tool binding mode used for tool resolution: native_search.

Agent HTTP requests also form trace trees when OTLP tracing is enabled. A typical POST /api/v1/agent/sessions/{id}/turns trace contains agent.operation, agent.tool.resolve, catalog.operation.resolve, and agent.provider.operation spans under the inbound HTTP server span.
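Because every custom family shares the count, error_count, and duration triplet, one helper can summarize any of them from scraped values. The sketch below assumes you have already read the four numbers from /metrics; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TripletSample:
    count_total: float        # <family>_count_total
    error_count_total: float  # <family>_error_count_total
    duration_sum: float       # <family>_duration_seconds_sum
    duration_count: float     # <family>_duration_seconds_count


def error_ratio(sample: TripletSample) -> float:
    """Fraction of calls that failed; 0.0 when nothing was recorded."""
    if not sample.count_total:
        return 0.0
    return sample.error_count_total / sample.count_total


def mean_duration_seconds(sample: TripletSample) -> float:
    """Average call duration derived from the histogram sum and count."""
    if not sample.duration_count:
        return 0.0
    return sample.duration_sum / sample.duration_count
```

These are lifetime averages over raw counter values; for dashboards you would normally compute the same ratios over rate-of-change windows in your metrics backend.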

Authorization Provider Metrics

Authorization provider metrics cover both direct provider interface calls and provider-backed subject-access evaluation performed by gestaltd.

OpenTelemetry family	Prometheus families	Meaning
`gestaltd.authorization.provider.operation.*`	`gestaltd_authorization_provider_operation_count_total`, `gestaltd_authorization_provider_operation_error_count_total`, `gestaltd_authorization_provider_operation_duration_seconds`	Calls to the configured authorization provider interface
`gestaltd.authorization.provider.evaluate.*`	`gestaltd_authorization_provider_evaluate_count_total`, `gestaltd_authorization_provider_evaluate_error_count_total`, `gestaltd_authorization_provider_evaluate_duration_seconds`	Batched authorization evaluations used by provider-backed subject-access checks

These metrics carry low-cardinality attributes:

gestalt.authorization.provider identifies the configured authorization provider.
gestalt.authorization.operation identifies the provider method for gestaltd.authorization.provider.operation.*.
gestalt.authorization.scope identifies the resource type for provider-backed evaluation, such as integration or operation.

Credential And Catalog Resolution Metrics

Credential and catalog resolution metrics make the pre-invocation and agent tool paths visible before work reaches a plugin provider.

OpenTelemetry family	Prometheus families	Meaning
`gestaltd.credential.provider.operation.*`	`gestaltd_credential_provider_operation_count_total`, `gestaltd_credential_provider_operation_error_count_total`, `gestaltd_credential_provider_operation_duration_seconds`	Calls to the configured external credential provider
`gestaltd.catalog.operation.resolve.*`	`gestaltd_catalog_operation_resolve_count_total`, `gestaltd_catalog_operation_resolve_error_count_total`, `gestaltd_catalog_operation_resolve_duration_seconds`	Static or session-catalog operation resolution

These metrics carry low-cardinality attributes:

gestalt.credential.provider identifies the configured credential provider.
gestalt.credential.operation identifies the credential provider method.
gestalt.provider identifies the plugin provider when the path has resolved one.
gestalt.operation identifies the plugin operation for catalog operation resolution.
gestalt.catalog.source is attached to catalog resolution spans and records whether the resolved operation came from static, session, or a pre-resolved context operation.

Database Client Metrics

These metrics are recorded around every ObjectStore and Index operation on the system IndexedDB instance. They use the OpenTelemetry db.client.operation.duration histogram instead of a Gestalt-specific metric family. The instrumentation is applied once at the interface level in bootstrap, so both core services (users and API tokens) and provider-hosted traffic are covered.

OpenTelemetry

Metric	Type	Meaning
`db.client.operation.duration`	Histogram	IndexedDB operation duration

Prometheus

Metric	Type	Meaning
`db_client_operation_duration_seconds`	Histogram	IndexedDB operation duration

These metrics carry low-cardinality attributes:

db.system.name is gestaltd.indexeddb for the built-in IndexedDB implementation.
db.namespace identifies the logical IndexedDB resource configured in gestaltd, such as the server’s primary IndexedDB binding.
db.collection.name identifies the IndexedDB object store being accessed, such as users, external_credentials, or api_tokens.
db.operation.name identifies the normalized ObjectStore or Index operation: get, get_key, put, add, delete, clear, get_all, get_all_keys, count, delete_range, open_cursor, open_key_cursor, index_get, index_get_key, index_get_all, index_get_all_keys, index_count, index_delete, index_open_cursor, or index_open_key_cursor.
gestaltd.provider.name identifies the provider namespace for provider-hosted IndexedDB traffic. Core services omit this attribute.
gestaltd.indexeddb.index.name identifies the index name for Index operations.
error.type is attached only when an operation fails. Bounded values include canceled, deadline_exceeded, not_found, already_exists, keys_only, and internal.

Expected misses that return ErrNotFound are recorded as failed database operations with error.type=not_found. To build error counts, use the Datadog query count:db.client.operation.duration{error.type:*}.as_count() or the matching Prometheus histogram count series.
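On the Prometheus side, the same error counts can be derived by summing the histogram _count series that carry an error label. The sketch below parses scrape text directly. It assumes the exporter renders error.type as the label error_type, and it uses a simplified line parser that does not handle label values containing a closing brace. `failed_operations` is a hypothetical name.

```python
import re

# Matches `name{labels} value`. Good enough for a sketch, not a full parser.
SAMPLE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def failed_operations(
    exposition: str,
    family: str = "db_client_operation_duration_seconds",
) -> dict[str, float]:
    """Sum the histogram _count series per error_type label value."""
    totals: dict[str, float] = {}
    for line in exposition.splitlines():
        match = SAMPLE.match(line)
        if not match or match.group("name") != family + "_count":
            continue  # comments, other families, _bucket and _sum series
        labels = dict(LABEL.findall(match.group("labels")))
        error_type = labels.get("error_type")  # absent on successful operations
        if error_type:
            totals[error_type] = totals.get(error_type, 0.0) + float(match.group("value"))
    return totals
```

Since expected misses show up under not_found, an alert on storage failures will usually want to exclude that key from the returned totals.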

Metrics And Audit Boundaries

Use metrics for low-cardinality aggregates such as volume, latency, and error rate. Use audit logs for user-facing or security-sensitive actions where you need actor, target, and outcome details.

Surface	Metrics	Audit Logs	Notes
Provider operation invocation	Yes: `gestaltd.operation.*`	Yes	Every guarded HTTP and MCP invocation is audited.
Platform login start and completion	Yes: `gestaltd.auth.*`	Yes	`begin_login` and `complete_login` are both operational and security events.
Platform token validation	Yes: `gestaltd.auth.*` with `action=validate_token`	Partially	Successful per-request validation is metrics-only; shared-middleware denials emit `auth.authenticate`.
Pre-invoker authorization denials	No dedicated semantic metric family	Yes	Denied subject access before guarded invocation dispatch is audited with the attempted operation or `operations.list`.
Connection auth start and completion	Yes: `gestaltd.connection.auth.*`	Yes	Covers OAuth start and completion plus manual connect completion.
Connection credential refresh	Yes: `gestaltd.connection.auth.*` with `action=refresh`	No	Refresh is system maintenance behavior, not a user-facing audit event.
Credentialed HTTP catalog discovery	Yes: `gestaltd.discovery.*` with `action=list_operations`	No	Operation listing stays metrics-only even when it resolves session catalogs.
Agent lifecycle and provider calls	Yes: `gestaltd.agent.*`	No	Audit the user-facing invocation or workflow that created the agent work, not every provider poll or state read.
Authorization provider calls	Yes: `gestaltd.authorization.*`	No	Provider interface calls are platform internals; explicit authorization-denied decisions are audited at the guarded action boundary.
Credential and catalog resolution	Yes: `gestaltd.credential.*` and `gestaltd.catalog.*`	No	These are pre-invocation plumbing paths; audit the higher-level action instead.
IndexedDB operations	Yes: `db.client.operation.duration`	No	Audit the higher-level user action instead of low-level storage calls.
API token inventory read	No dedicated semantic metric family	Yes	`api_token.list` is audited because it exposes stored API-token inventory.
API token lifecycle	No dedicated semantic metric family	Yes	`api_token.create`, `api_token.revoke`, and `api_token.revoke_all` are audited.
Logout, pending selection, disconnect	No dedicated semantic metric family	Yes	These are workflow and security events rather than aggregate telemetry series.

Current Non-Goals

The built-in gestaltd metric surface emits broker operation result status via gestalt.result_status and gestalt.result_status_class, but it does not emit separate per-upstream-request HTTP status_code or status_class metrics for HTTP catalog transports. It also does not emit deduplicated DAU, WAU, or MAU analytics for users or plugins. Those need separate semantic decisions or an analytics pipeline instead of more process-local counters.

HTTP Server Metrics

These come from the OpenTelemetry HTTP middleware wrapped around the server’s main router. Because the whole router is instrumented once, these metrics cover API traffic, health and readiness checks, the embedded admin UI, and scrapes of /metrics itself.

OpenTelemetry

Metric	Type	Meaning
`http.server.request.body.size`	Histogram	Inbound request body size
`http.server.response.body.size`	Histogram	Response body size
`http.server.request.duration`	Histogram	End-to-end request duration

Prometheus

Metric	Type	Meaning
`http_server_request_body_size_bytes`	Histogram	Inbound request body size
`http_server_response_body_size_bytes`	Histogram	Response body size
`http_server_request_duration_seconds`	Histogram	End-to-end request duration

These metrics use standard OpenTelemetry HTTP server semantic attributes such as request method, response status, scheme, host, protocol, and route information when available. Gestalt also attaches stable dimensions for request surfaces it can resolve, including gestaltd.provider.name, gestaltd.operation.name, gestaltd.operation.transport, gestaltd.connection.mode, gestaltd.invocation.surface, gestaltd.http.binding.name, and gestaltd.ui.name. For hosted HTTP bindings, pair gestaltd.http.binding.name on http.server.request.duration with gestalt.http_binding on gestaltd.operation.* to correlate the accepted HTTP delivery with the provider operation it dispatched.

http.route is the framework route template, not the concrete request path. For example, a request to /api/v1/datadog/query_metrics is grouped under the route template /api/v1/{integration}/{operation}. Use http.route for route-shape latency and traffic graphs, and use gestaltd.provider.name plus gestaltd.operation.name when you need the resolved integration and operation names. Unknown provider or operation paths keep the templated route label but do not get resolved provider or operation labels.
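To make the difference between a route template and a concrete path tangible, here is a small matcher that shows how many paths collapse into one http.route value. It only illustrates the grouping; it is not the router gestaltd uses, and `match_route` is a hypothetical name.

```python
from typing import Optional


def match_route(template: str, path: str) -> Optional[dict[str, str]]:
    """Return the captured segments if `path` fits `template`, else None."""
    template_parts = template.strip("/").split("/")
    path_parts = path.strip("/").split("/")
    if len(template_parts) != len(path_parts):
        return None
    captured: dict[str, str] = {}
    for expected, actual in zip(template_parts, path_parts):
        if expected.startswith("{") and expected.endswith("}"):
            captured[expected[1:-1]] = actual  # placeholder segment: capture it
        elif expected != actual:
            return None  # literal segment mismatch
    return captured
```

Every path that matches shares the same template label, while the captured values correspond to what gestaltd.provider.name and gestaltd.operation.name report once they are resolved.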

For Datadog dashboards, provider-operation status splits can be built from the broker operation metrics:


sum:gestaltd.operation.error_count{service:<service>,gestalt.result_status_class:4xx}
  by {gestalt.provider,gestalt.operation,gestalt.result_status}.as_count()


sum:gestaltd.operation.error_count{service:<service>,gestalt.result_status_class:5xx}
  by {gestalt.provider,gestalt.operation,gestalt.result_status}.as_count()

For HTTP-only views, use http.server.request.duration with http.response.status_code. For cross-surface provider-operation views, prefer gestaltd.operation.* with gestalt.result_status and gestalt.result_status_class.

If the process is started with OTEL_SEMCONV_STABILITY_OPT_IN=http/dup, the underlying HTTP instrumentation also emits the legacy HTTP metric families alongside the current ones.

Provider gRPC Metrics

These come from OpenTelemetry gRPC stats handlers used for provider and host service traffic. They only apply to provider-backed components.

OpenTelemetry

Metric	Type	Meaning
`rpc.client.call.duration`	Histogram	Outbound provider RPC duration
`rpc.server.call.duration`	Histogram	Inbound provider or host-service RPC duration

Prometheus

Metric	Type	Meaning
`rpc_client_call_duration_seconds`	Histogram	Outbound provider RPC duration
`rpc_server_call_duration_seconds`	Histogram	Inbound provider or host-service RPC duration

These metrics use standard OpenTelemetry gRPC semantic attributes such as rpc.system.name=grpc, rpc.method, and gRPC status code labels such as rpc.grpc.status_code. Gestalt also attaches gestaltd.rpc.role, gestaltd.provider.name, and gestaltd.host_service.name when it can resolve them.

The built-in gRPC stats handlers emit call duration metrics. They do not emit request body size, response body size, request messages per RPC, or response messages per RPC. Add custom interceptors and metric instruments if those dimensions become operationally important.

If the process is started with OTEL_SEMCONV_STABILITY_OPT_IN=rpc/dup, the underlying gRPC instrumentation also emits legacy RPC metric families such as rpc.client.duration alongside the current ones.

Trace Spans

When OTLP tracing is enabled, gestaltd emits trace spans for the main request and provider execution boundaries.

Layer	Span name	Key attributes
HTTP server	`gestaltd: {method} {route}`	Standard HTTP semantic attributes plus resolved `gestaltd.*` request-surface attributes when available
Broker operation	`broker.invoke`	`gestalt.provider`, `gestalt.operation`, `gestalt.subject_id`, `gestalt.connection_mode`
gRPC provider client	Per-RPC client spans	Standard RPC semantic attributes plus `gestaltd.rpc.role=hosted_plugin_client` and `gestaltd.provider.name`
gRPC provider server	Per-RPC server spans	Standard RPC semantic attributes plus `gestaltd.rpc.role=provider_server` and `gestaltd.provider.name`
gRPC host service server	Per-RPC server spans	Standard RPC semantic attributes plus `gestaltd.rpc.role=host_service_server`, `gestaltd.provider.name`, and `gestaltd.host_service.name`
Agent and catalog internals	`agent.operation`, `agent.tool.resolve`, `catalog.operation.resolve`, `agent.run_metadata.write`, `agent.provider.operation`	Low-cardinality `gestalt.agent.*`, `gestalt.catalog.*`, `gestalt.provider`, and `gestalt.operation` attributes depending on the span

HTTP and gRPC spans use the same gestaltd.* dimensions as their corresponding standard metrics. Broker, agent, credential, catalog, auth, and discovery spans remain Gestalt custom spans and keep their established gestalt.* attributes.

Prometheus Scrape Endpoint

When telemetry metrics are enabled, Gestalt exposes a Prometheus scrape endpoint at /metrics. With providers.telemetry.default.source: stdout, metrics are kept local and served directly. With providers.telemetry.default.source: otlp, metrics are exported over OTLP and also served at /metrics locally unless you disable the Prometheus bridge. Setting providers.telemetry.default.source: noop disables Prometheus scraping entirely; /metrics returns a clear unavailable response so the admin UI can surface that state.

When Gestalt serves /metrics on the public listener, it is authenticated with the same session or bearer-token middleware as the rest of the HTTP API. When server.management is configured, /metrics moves to the management listener and is expected to be protected by network policy or an internal-only reverse proxy instead. That split-listener shape is the recommended production deployment. Keeping /metrics on the public listener is mainly for local development and other trusted-network environments. The embedded admin UI visualizes that same scrape surface instead of using a separate metrics API.
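When /metrics stays on the public listener, a scraper has to present credentials like any other API client. The sketch below builds such a request with the standard library. The `Authorization: Bearer` header form and the base URL are assumptions, and `build_scrape_request` is a hypothetical helper.

```python
import urllib.request


def build_scrape_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET for the scrape endpoint."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/metrics",
        # Assumption: API tokens are accepted as a standard Bearer credential.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )
```

Passing the result to `urllib.request.urlopen` performs the scrape. On the management listener the header would typically be unnecessary, since access is expected to be restricted at the network layer.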

The Prometheus endpoint and OTLP export hang off the same meter provider and include the HTTP middleware metrics, broker metrics, plugin gRPC client metrics, and exporter metadata described above.

The admin UI intentionally stays basic: it renders summary cards and lightweight charts derived from /metrics, with in-browser time-window and refresh controls scoped to the current page session. There is no persisted history across page reloads and no cross-replica aggregation. It is designed for built-in operator visibility, not as a replacement for a full observability backend.

What Gets Logged

Gestalt emits structured log entries for config loading and startup, provider readiness, auth and connection flow failures, invocation failures, datastore warnings, and audit records tagged with log.type=audit. When OTLP is enabled, the server routes these logs through the OpenTelemetry log bridge, and you can split audit records into a dedicated sink by filtering on log.type=audit.