Setting up OpenTelemetry with Rust is a relatively simple process, but from my research, the resources around it are minimal. In this blog post, I'll share my experience instrumenting Rust with OpenTelemetry and explore some "gaps" in the OpenTelemetry documentation.
I have been writing Rust for just under a year. Jumping to it from primarily writing in JS and Golang was a no-brainer. The benefits of Rust make sense, but they come with the caveat of a steeper learning curve than other languages. I won't be delving into Rust and all its fantastic benefits, as this blog aims to advocate the use of OpenTelemetry in Rust applications, not to convince you to switch to Rust. (You should switch to Rust anyway, though.)
Most articles I've read (and there aren't many) about Rust and OpenTelemetry focus on quickly instrumenting a Rust application and pointing it at a SaaS observability solution. They generally showcase the product's features or how quickly you can send data to its systems, but they rarely delve into the crates, best practices, and why you should do it in the first place.
Meeting other Rust developers at meetups and conferences, I've narrowed the reasons for this low adoption down to a few possibilities.
Not knowing what OpenTelemetry is.
With Rust’s enhanced safety, speed and concurrency, do you need to instrument your code?
A few questioned whether adding the libraries to instrument their code was even worth it, with the majority responding, "Why should we?"
Lack of documentation to follow.
OpenTelemetry documentation around Rust could be better. The documentation around instrumentation and adoption of OpenTelemetry flourishes for more mature languages. If we head over to https://opentelemetry.io/docs/ and open the instrumentation tab, there is plenty of documentation around Go, Java, and JavaScript, and it even covers Erlang/Elixir. For Rust, instead, we're met with a fun page…
As you can see, it's fantastic. In typical Rust fashion: here are the crates, go figure it out yourself.
No automatic instrumentation? Observability providers often package libraries that instrument your code for you; for languages with managed runtimes, like JS, Java and Python, the process is as easy as requiring the library and you're good to go. Compiled languages like Rust require much more effort: you generally need to specify where your instrumentation should live and what to capture.
The TL;DR of OpenTelemetry
OpenTelemetry (OTEL) is an Observability framework designed to capture telemetry data from your applications, such as metrics, logs and traces. OTEL provides a vendor-agnostic set of APIs, SDKs, integrations and a collector service for your telemetry. OTEL returns ownership to the user: you control the telemetry you generate without being confined to a vendor's format or tool.
Let's look at key concepts of OpenTelemetry.
OpenTelemetry Signals
Traces:
Describe the whole path a request takes
Traces contain Spans
Spans are the building blocks of traces
Metrics:
Defined as a measurement of a service captured at runtime.
A metric could be the duration of a system call, CPU or memory usage.
Logs:
- Timestamped text record
Baggage:
The contextual information that's passed between spans
OpenTelemetry Baggage can propagate contextual information, such as a user ID, across multiple spans between multiple services. This is useful when the contextual information is only accessible from one service, but you want it attached to every span in a trace.
Take note that there are some caveats when using baggage. The context is stored within HTTP headers and would be exposed to anyone who can inspect your network packets.
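To make these signals a little more concrete, here's a minimal sketch of starting a span and attaching baggage with the opentelemetry crate. It assumes a 0.x release of the crate (module paths have moved between versions); the tracer name, span name, attribute keys and values are purely illustrative.

```rust
// A minimal sketch, assuming a 0.x release of the opentelemetry crate.
// The tracer name, span name and attribute values are placeholders.
use opentelemetry::{
    baggage::BaggageExt,
    global,
    trace::{Span, Tracer},
    Context, KeyValue,
};

fn handle_request() {
    // Grab a tracer from the globally installed provider.
    let tracer = global::tracer("checkout-service");

    // Attach baggage (e.g. a user ID) to the current context so it can be
    // propagated to downstream spans and services.
    let cx = Context::current_with_baggage(vec![KeyValue::new("user.id", "42")]);
    let _guard = cx.attach();

    // Spans are the building blocks of a trace.
    let mut span = tracer.start("process-order");
    span.set_attribute(KeyValue::new("order.items", 3_i64));
    // ... do the actual work here ...
    span.end();
}
```

Until a real tracer provider is installed globally, the spans above are no-ops, so it's safe to sprinkle instrumentation like this before you wire up an exporter.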
OpenTelemetry Instrumentation:
Semantic Conventions: Defines a standard naming convention for common telemetry types, such as application names, HTTP calls, etc.
APIs: Defines the telemetry to be generated: metrics, logs and traces.
SDK: Language-specific implementations of the OpenTelemetry APIs (see the sketch after this list).
- Some SDKs can provide automatic instrumentation for common libraries and frameworks, for example, NodeJS.
OpenTelemetry Protocol (OTLP): The protocol used to send data to Observability backends.
Cross Service Propagators: Propagators transfer data between different services and processes.
- Usually done through instrumentation libraries.
Resource Detectors: Represent the entity producing telemetry as resource attributes.
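To show how the SDK, OTLP and resource attributes fit together in Rust, here is a rough sketch of initialising a tracer pipeline. It assumes the opentelemetry and opentelemetry-otlp crates from the 0.x "pipeline" era together with a Tokio runtime (the builder API has shifted between releases); the endpoint and service name are placeholders.

```rust
// A sketch only: assumes opentelemetry (with the rt-tokio feature) and
// opentelemetry-otlp from the 0.x "pipeline" API era; paths differ in newer releases.
use opentelemetry::sdk::{trace, Resource};
use opentelemetry::{runtime, KeyValue};
use opentelemetry_otlp::WithExportConfig;

fn init_tracer() -> Result<trace::Tracer, opentelemetry::trace::TraceError> {
    opentelemetry_otlp::new_pipeline()
        .tracing()
        // OTLP over gRPC to a local collector (or Jaeger's OTLP port).
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        // Resource attributes follow the semantic conventions, e.g. service.name.
        .with_trace_config(trace::config().with_resource(Resource::new(vec![
            KeyValue::new("service.name", "my-rust-service"),
        ])))
        // Batch spans and export them on the Tokio runtime.
        .install_batch(runtime::Tokio)
}
```

Treat this as a map of where the SDK, the OTLP exporter and resource attributes sit rather than a finished setup; the full wiring comes in the next part of this series.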
OpenTelemetry Collector:
Receivers: Defines how the collector ingests data: push or pull
Processors: Defines how data is processed before it is exported
Exporters: Defines how you send data to your Observability backend: push or pull
If you want to explore more, look at OpenTelemetry Concepts Documentation.
OpenTelemetry Support for Rust, what's available?
You have two main sources of information for Rust + OpenTelemetry. As mentioned before, the OTEL documentation covers very little, so I primarily used the crates and resources available on GitHub.
There are many crates available for Rust + OpenTelemetry. I'll only explore a few of them in this blog.
- Jaeger now supports the OTLP protocol, so we can ship traces straight to a Jaeger instance.
tracing (not an OTEL crate)
- If you've been writing Rust for some time, you'll be quite familiar with the tracing library. If you're already using it, it's relatively straightforward to add OpenTelemetry: the tracing-opentelemetry crate provides a simple way to create a subscriber that connects spans from multiple systems into a trace and emits them to an OTEL-compatible tracing backend for visualisation.
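A sketch of that wiring might look like the following, assuming compatible versions of tracing, tracing-subscriber and tracing-opentelemetry, plus an OTEL tracer built elsewhere (for example by an OTLP pipeline like the one sketched earlier); the span name and field are illustrative.

```rust
// A minimal sketch, assuming compatible versions of tracing, tracing-subscriber,
// tracing-opentelemetry and the opentelemetry SDK.
use tracing::info_span;
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::Registry;

fn init_subscriber(tracer: opentelemetry::sdk::trace::Tracer) {
    // Bridge tracing spans into OpenTelemetry spans.
    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    // Compose a subscriber from the registry plus the OTEL layer.
    let subscriber = Registry::default().with(otel_layer);

    // Install it globally so every span and event flows through the OTEL layer.
    tracing::subscriber::set_global_default(subscriber)
        .expect("failed to set global tracing subscriber");

    // From here on, ordinary tracing spans are exported as OTEL spans.
    let span = info_span!("app_start", version = env!("CARGO_PKG_VERSION"));
    let _enter = span.enter();
}
```

The appeal of this approach is that existing tracing instrumentation keeps working unchanged; only the subscriber setup needs to know about OpenTelemetry.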
Check out OpenTelemetry Rust GitHub for more crates.
Now you know what OpenTelemetry is and what's available. Why should you use it?
The shortest answer: there is no “perfect” application.
With that said, I'm not talking about simple applications or your typical ‘hello world’ here.
External factors, like network latency, CPU, memory or third-party APIs, can affect your application's performance. Your Rust app may be perfect, performant and error-free, but you can't be 100% confident that external factors won't impact it. OpenTelemetry instrumentation allows you to trace your application's requests and add context and metadata to your spans, showing you the entire flow of what is occurring.
Benefits of OpenTelemetry:
Improved performance: Adding tracing to your applications will highlight bottlenecks or areas of code that can be optimised
Faster debugging: Viewing trace data in an observability backend can help you identify potential bugs, race conditions, and issues caused by upstream/downstream services.
Monitoring: Tracing is critical to keeping your applications stable in production, providing real-time insight into issues.
Deeper Visibility: Tracing can provide deep links across many microservices, highlighting connections between systems you didn’t know existed.
Standardised format to collect metrics, logs and traces
Ownership of where the instrumentation lies
Flexibility to include and exclude relevant metadata, traces, spans
Community support
Vendor-neutral
Summary
In this blog post, we explored the core concepts of OpenTelemetry and its benefits. Head to the next part of this blog series to see how we can implement them in Rust.
Feel free to like, share and comment to help increase the visibility of OpenTelemetry for Rust!