add OpenTelemetry metrics instrumentation#3110
add OpenTelemetry metrics instrumentation#3110PavelPashov wants to merge 50 commits intoredis:masterfrom
Conversation
a11117e to
5fb1791
Compare
1be4565 to
03db1a2
Compare
There was a problem hiding this comment.
❌ The following Jit checks failed to run:
- license-compliance-checker
- secret-detection-trufflehog
- software-component-analysis-js
- static-code-analysis-semgrep-pro
#jit_bypass_commit in this PR to bypass, Jit Admin privileges required.
More info in the Jit platform.
1106f8a to
242e98d
Compare
|
This is an exciting feature, I hope it makes it to main! |
❌ Security scan failedSecurity scan failed: Branch feat/add-opentelemetry-metrics does not exist in the remote repository 💡 Need to bypass this check? Comment |
364ff86 to
30b0e8c
Compare
❌ Security scan failedSecurity scan failed: Branch feat/add-opentelemetry-metrics does not exist in the remote repository 💡 Need to bypass this check? Comment |
1 similar comment
❌ Security scan failedSecurity scan failed: Branch feat/add-opentelemetry-metrics does not exist in the remote repository 💡 Need to bypass this check? Comment |
0e91b34 to
e2cc888
Compare
af9f0fb to
d0f42d3
Compare
| } from "./types"; | ||
| import { noopFunction } from "./utils"; | ||
|
|
||
| export class NoopCommandMetrics implements IOTelCommandMetrics { |
There was a problem hiding this comment.
Why do you need this object? Just to ensure that no commands will be actually send to a collector?
There was a problem hiding this comment.
Because we have multiple metric groups and don’t know what the user will opt into, I kept each group in its own class and defaulted them to noop. When the user initializes metrics, we swap in the real implementations based on config, so all checks happen once during OpenTelemetry.init(...). This keeps runtime code simple (no repeated conditional checks) and ensures disabled metrics have near-zero overhead
There was a problem hiding this comment.
So this is basically Null object pattern where in case if Otel is disabled empty methods are called?
packages/client/lib/RESP/types.ts
Outdated
| TRANSFORM_LEGACY_REPLY?: boolean; | ||
| transformReply: TransformReply | Record<RespVersions, TransformReply>; | ||
| unstableResp3?: boolean; | ||
| onSuccess?: (args: ReadonlyArray<RedisArgument>, reply: unknown, clientId: string) => void; |
There was a problem hiding this comment.
@nkaradzhov we might want to rename onSuccess to something more metrics/otel related
packages/client/lib/client/index.ts
Outdated
| if (command.onSuccess) { | ||
| command.onSuccess(parser.redisArgs, finalReply, this._self._clientId); |
There was a problem hiding this comment.
@nkaradzhov there might be a better way to record specific command related metrics that require the server response
| // Build the appropriate function based on options | ||
| if (options.hasIncludeCommands || options.hasExcludeCommands) { | ||
| // Version with filtering | ||
| this.createRecordOperationDuration = this.#createWithFiltering.bind(this); | ||
| } else { | ||
| this.createRecordOperationDuration = | ||
| this.#createWithoutFiltering.bind(this); | ||
| } | ||
| } |
There was a problem hiding this comment.
@nkaradzhov to make sure we don't do unnecessary checks after initiating the metrics singleton if the user hasn't provided any included or excluded commands
… in command and CSC metrics
bf97457 to
adbe066
Compare
adbe066 to
bc14732
Compare
| OTelMetrics.instance.connectionAdvancedMetrics.recordConnectionClosed( | ||
| CONNECTION_CLOSE_REASON.APPLICATION_CLOSE, | ||
| this.#clientId, | ||
| ); |
There was a problem hiding this comment.
Connection closed metric recorded without active connection
Low Severity
destroySocket unconditionally records a CONNECTION_CLOSE_REASON.APPLICATION_CLOSE metric even when no socket was ever created or connected (i.e., this.#socket is undefined). In contrast, #onSocketError properly guards with if (wasReady) before recording the close metric. This can inflate redis.client.connection.closed counts with spurious events when a client is destroyed before establishing a connection.


Description
TODO
Checklist
npm testpass with this change (including linting)?Note
Medium Risk
Touches core client execution paths (queueing, sockets, pubsub, cluster redirections, pooling, and client-side caching) to emit metrics, which could impact performance or lifecycle behavior if regressions slip in, though the instrumentation is gated behind explicit
OpenTelemetry.init()and metric-group toggles.Overview
Adds opt-in OpenTelemetry metrics instrumentation via a new exported
OpenTelemetry.init()singleton (throws on double init / missing@opentelemetry/api) backed by aClientRegistryused by observable gauges.Instruments core client flows to emit metrics for command durations (incl. MULTI/PIPELINE), connection lifecycle (create time, pending requests, wait time, close reasons, relaxed timeouts, handoffs), resiliency/errors (including maintenance notifications and cluster redirection retries), pub/sub in+out message counts, stream lag from
XREAD/XREADGROUP, and client-side caching hits/misses/evictions.Adds client identity propagation (
standalone/cluster/poolroles + parent IDs) so cluster nodes and pool members can attribute metrics correctly, updates lifecycle to register/unregister clients, and includes new docs + runnable example (docs/otel-metrics.md,examples/otel-metrics.js) plus OpenTelemetry dependency updates.Written by Cursor Bugbot for commit 32d63f2. This will update automatically on new commits. Configure here.