
Commit 963b321

mayastor-bors and tiagolobocastro committed
Merge #1816
1816: docs: add event-bus, switchover and integrations r=tiagolobocastro a=tiagolobocastro

Co-authored-by: Tiago Castro <tiagolobocastro@gmail.com>
2 parents 5c9745b + 02372f4 commit 963b321

15 files changed: +786 −31 lines changed

.github/workflows/unit-int.yml

+7-4
@@ -31,6 +31,9 @@ jobs:
       - name: Setup Test Pre-Requisites
         run: |
           sudo sysctl -w vm.nr_hugepages=3584
+          sudo debconf-communicate <<< "set man-db/auto-update false" || true
+          sudo dpkg-reconfigure man-db || true
+          ulimit -c unlimited
           sudo apt-get update
           sudo apt-get install linux-modules-extra-$(uname -r)
           for module in nvme_tcp nbd nvme_rdma; do
@@ -41,11 +44,11 @@ jobs:
         run: |
           echo "TEST_START_DATE=$(date +"%Y-%m-%d %H:%M:%S")" >> $GITHUB_ENV
           nix-shell --run "./scripts/cargo-test.sh"
+      - name: Check Coredumps
+        run: sudo ./scripts/check-coredumps.sh --since "${TEST_START_DATE}"
       - name: Cleanup
         if: always()
         run: nix-shell --run "./scripts/clean-cargo-tests.sh"
-      - name: Check Coredumps
-        run: sudo ./scripts/check-coredumps.sh --since "${TEST_START_DATE}"
       - name: Run JS Grpc Tests
         run: |
           echo "TEST_START_DATE=$(date +"%Y-%m-%d %H:%M:%S")" >> $GITHUB_ENV
@@ -72,11 +75,11 @@ jobs:
         with:
           name: ci-report-int
           path: ./ci-report/ci-report.tar.gz
+      - name: Check Coredumps
+        run: sudo ./scripts/check-coredumps.sh --since "${TEST_START_DATE}"
       - name: Cleanup
         if: always()
         run: nix-shell --run "./scripts/clean-cargo-tests.sh"
-      - name: Check Coredumps
-        run: sudo ./scripts/check-coredumps.sh --since "${TEST_START_DATE}"
       # debugging
       # - name: Setup tmate session
       #   if: ${{ failure() }}

README.md

+1
@@ -196,6 +196,7 @@ $ io-engine-client pool destroy tpool
 - [Testing](./doc/contributor.md#testing)
 - [Building](./doc/build-all.md)
 - [CSI Workflow](./doc/csi.md)
+- [Design Docs](./doc/design/)
 
 ## Features
 
doc/design/control-plane-behaviour.md

+1-1
@@ -80,7 +80,7 @@ decision accordingly. An in-memory registry is used to store such information.
 
 Because the registry is stored in memory, it is volatile - meaning all information is lost if the service is restarted.
 As a consequence critical information must be backed up to a highly available persistent store (for more detailed
-information see [persistent-store.md](./persistent-store.md)).
+information see [persistent-store.md](./control-plane.md#persistent-store-kvstore-for-configuration-data)).
 
 The types of data that need persisting broadly fall into 3 categories:
 
doc/design/control-plane.md

+125-24
@@ -214,6 +214,56 @@ sequenceDiagram
 
 <br>
 
+### Internal Communication
+
+<br>
+
+```mermaid
+graph LR;
+    subgraph Agents[" "]
+        HACluster["HA Cluster Agent"]
+        Core["Core Agent"]
+        REST
+    end
+
+    subgraph StorageNode["Storage Node"]
+        subgraph DataPlane["I/O Engine"]
+            Nexus
+            Replicas
+            Pools
+        end
+    end
+
+    subgraph AppNode["Application Node"]
+        HANode["HA Node Agent"]
+        CSINode["CSI Node Plugin"]
+    end
+
+    REST["Rest Server"] -->|gRPC| Core
+    Core <-->|gRPC| DataPlane
+    HACluster <-->|gRPC| HANode
+    HACluster -->|gRPC| Core
+    HANode -.->|UNIX Socket| CSINode
+    CSIController -->|HTTP/REST| REST
+```
+
+<br>
+
+As shown, there are many point-to-point (p2p) connections between the different services.
+
+The control-plane agents talk to each other via [gRPC] using well-defined APIs consisting of different service definitions.
+Since we're using [gRPC], the API is described in [protobuf], the default definition language for `gRPC`.
+
+Here's a table containing all protobuf definitions for mayastor:
+
+| API        | Protobuf Definitions                                                                             |
+|------------|--------------------------------------------------------------------------------------------------|
+| I/O Engine | <https://github.com/openebs/mayastor-dependencies/tree/HEAD/apis/io-engine/protobuf/v1>          |
+| Agents     | <https://github.com/openebs/mayastor-control-plane/tree/HEAD/control-plane/grpc/proto/v1>        |
+| Events     | <https://github.com/openebs/mayastor-dependencies/blob/HEAD/apis/events/protobuf/v1/event.proto> |
+
+<br>
+
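For a rough sense of what one of these [gRPC] calls looks like from Rust, below is a minimal client sketch using the `tonic` crate. The `volume_client` module, the `CreateVolumeRequest` message and the endpoint address are hypothetical stand-ins for the generated code behind the protobuf definitions linked in the table; they are not the actual mayastor API.

```rust
// Illustrative sketch only: module path, request fields and endpoint are
// hypothetical stand-ins for code generated from the protobuf definitions
// linked in the table above.
use tonic::transport::Channel;

async fn create_volume_sketch() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to the Core agent's gRPC endpoint (address is illustrative).
    let channel = Channel::from_static("http://core-agent:50051")
        .connect()
        .await?;
    let mut client = volume_client::VolumeClient::new(channel);

    // Issue a request described by the protobuf service definition.
    let request = tonic::Request::new(CreateVolumeRequest {
        uuid: "11111111-2222-3333-4444-555555555555".into(),
        size: 10 * 1024 * 1024 * 1024, // 10 GiB
        replicas: 2,
    });
    let volume = client.create_volume(request).await?.into_inner();
    println!("created volume: {volume:?}");
    Ok(())
}
```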
 ## Reconcilers
 
 Reconcilers implement the logic that drives the desired state to the actual state. In principle it's the same model as the operator framework provided by K8s, however as mentioned, it's tailored towards storage rather than stateless containers.
@@ -334,9 +384,10 @@ The value add is not the ANA feature itself, rather what you do with it.
 
 ## NATS & Fault management
 
-We used to use NATS as a message bus within mayastor as a whole, but as since switched for gRPC for p2p communications. \
-We will continue to use NATS for async notifications. Async in the sense that we send a message, but we do NOT wait for a reply. This mechanism does not
-do any form of "consensus," retries, and the likes. Information transported over NATS will typically be error telemetry that is used to diagnose problems. No work has started yet on this subject.
+We used to use [NATS] as a message bus within mayastor as a whole, but later switched to [gRPC] for p2p communications. \
+As of today, we use [NATS] for [events](./events.md) as async notifications via the [Publish/Subscribe Model](https://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern). Our current mechanism does not do any form of "consensus", retries, and the like. Information transported over NATS will typically be error telemetry that is used to diagnose problems.
+
+> NOTE: We only have 1 event subscriber at the moment, which is the call-home stats aggregator.
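To make the fire-and-forget publishing concrete, here is a minimal sketch using the `async-nats` crate. The NATS URL, subject and JSON payload are illustrative placeholders, not the actual mayastor event schema (the real format is in the event.proto linked further below).

```rust
// Illustrative sketch only: URL, subject and payload are made up for the example.
async fn publish_event_sketch() -> Result<(), async_nats::Error> {
    let client = async_nats::connect("nats://mayastor-nats:4222").await?;

    // Publish and move on: no reply is requested and no consensus/retry is done.
    client
        .publish(
            "events.volume".to_string(),
            r#"{"action":"create","source":"core-agent"}"#.into(),
        )
        .await?;

    // Make sure the message has been flushed to the server before returning.
    client.flush().await?;
    Ok(())
}
```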
 
 At a high level, error detectors are placed in code parts where it makes sense; for example, consider the following:
 
@@ -391,7 +442,7 @@ err.io.nvme.transport.* = {}
 err.io.nexus.* = {}
 ```
 
-Subscribes to these events will keep track of payloads and apply corrective actions. In its most simplistic form, it results in a model where one can
+Subscribers to these events will keep track of payloads and apply corrective actions. In its most simplistic form, it results in a model where one can
 define, per error class, an action that needs to be taken. This error handling can be applied to IO but also agents.
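As a sketch of that per-error-class model, a subscriber could map class prefixes to actions roughly as below; the action names and choices are hypothetical, not the actual mayastor implementation.

```rust
// Illustrative sketch only: maps the error classes shown above to hypothetical
// corrective actions a subscriber might take.
enum Action {
    RetryOrRetargetIo,
    FaultNexusChild,
    RecordTelemetryOnly,
}

fn action_for_class(class: &str) -> Action {
    match class {
        // Transport-level NVMe errors: the path may recover, so retry/retarget first.
        c if c.starts_with("err.io.nvme.transport.") => Action::RetryOrRetargetIo,
        // Nexus I/O errors: fault the offending child and let reconcilers rebuild it.
        c if c.starts_with("err.io.nexus.") => Action::FaultNexusChild,
        // Everything else: keep the payload as telemetry for diagnosis.
        _ => Action::RecordTelemetryOnly,
    }
}
```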
 
 The content of the event can vary, containing some general metadata fields, as well as event specific information.
@@ -411,21 +462,22 @@ message EventMessage {
 }
 ```
 
+Read more about eventing [here](./events.md).
 An up-to-date API of the event format can be fetched
-[here](https://github.com/openebs/mayastor-dependencies/blob/develop/apis/events/protobuf/v1/event.proto).
+[here](https://github.com/openebs/mayastor-dependencies/blob/HEAD/apis/events/protobuf/v1/event.proto).
 
-## Distributed Tracing
+## Tracing and Telemetry
 
 Tracing means different things at different levels. In this case, we are referring to component boundary tracing.
 
-Tracing is by default implemented using open telemetry and, by default, we have provided a subscriber for jaeger. From jaeger, the information can be
-forwarded to, Elastic Search, Cassandra, Kafka, or whatever. In order to achieve full tracing support, all the gRPC requests and replies should add
-HTTP headers such that we can easily tie them together in whatever tooling is used. This is standard practice but requires a significant amount of work.
-The key reason is to ensure that all requests and responses pass along the headers, from REST to the scheduling pipeline.
+Tracing is implemented using open telemetry and, by default, we have provided a subscriber for [Jaeger]. From [Jaeger], the information can be
+forwarded to Elasticsearch, Cassandra, Kafka, etc. In order to achieve full tracing support, all the [gRPC] requests and replies should add
+`HTTP` headers such that we can easily tie them together in whatever tooling is used. This is standard practice but requires a significant amount of work.
+The key reason is to ensure that all requests and responses pass along the headers, from `REST` to the scheduling pipeline.
 
-We also need to support several types of transport and serialization mechanisms. For example, HTTP/1.1 REST requests to HTTP/2 gRCP request to
+We also need to support several types of transport and serialization mechanisms. For example, `HTTP/1.1 REST` requests to `HTTP/2 gRPC` requests to
 a KV store operation to etcd. For this, we will use [Tower]. \
-[Tower] provides a not-so-easy to use an abstraction of Request to Response mapping.
+[Tower] provides a not-so-easy-to-use interface/abstraction of request-to-response mapping.
 
 ```rust
 pub trait Service<Request> {
@@ -447,9 +499,9 @@ The provided services can then be layered with additional functions that add the
 
 ```rust
 pub trait Layer<S> {
-    /// The service for which we want to insert a new layer
+    /// The service for which we want to insert a new layer.
     type Service;
-    /// the implementation of the layer itself
+    /// The implementation of the layer itself.
     fn layer(&self, inner: S) -> Self::Service;
 }
 ```
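To illustrate how these two traits compose, here is a minimal, generic Tower example of a hypothetical `LogLayer` that wraps any inner `Service` and logs each request before forwarding it; it is not code from the mayastor tree.

```rust
// Illustrative sketch only: a logging layer/service pair built on the Tower
// `Service` and `Layer` traits shown above.
use std::task::{Context, Poll};
use tower::{Layer, Service};

#[derive(Clone)]
struct LogLayer;

impl<S> Layer<S> for LogLayer {
    type Service = LogService<S>;
    fn layer(&self, inner: S) -> Self::Service {
        LogService { inner }
    }
}

#[derive(Clone)]
struct LogService<S> {
    inner: S,
}

impl<S, Request> Service<Request> for LogService<S>
where
    S: Service<Request>,
    Request: std::fmt::Debug,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    fn call(&mut self, request: Request) -> Self::Future {
        // Log the request before handing it to the inner service.
        println!("request: {request:?}");
        self.inner.call(request)
    }
}
```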
@@ -458,23 +510,72 @@ An example where a `REST` client sets the open tracing key/values on the request
 
 ```rust
 let layer = TraceLayer::new_for_http().make_span_with(|request: &Request<Body>| {
-        tracing::debug_span!(
-            "HTTP",
-            http.method = %request.method(),
-            http.url = %request.uri(),
-            http.status_code = tracing::field::Empty,
-            // otel is a mandatory key/value
-            otel.name = %format!("HTTP {}", request.method()),
-            otel.kind = %SpanKind::Client,
-            otel.status_code = tracing::field::Empty,
-        )
+    tracing::debug_span!(
+        "HTTP",
+        http.method = %request.method(),
+        http.url = %request.uri(),
+        http.status_code = tracing::field::Empty,
+        // otel is a mandatory key/value
+        otel.name = %format!("HTTP {}", request.method()),
+        otel.kind = %SpanKind::Client,
+        otel.status_code = tracing::field::Empty,
+    )
 })
 ```
 
+On the server side, we extract the trace id from the `HTTP` headers and inject it into the next call stack, which means it also eventually gets
+injected into the next transport hop. Specifically for `REST`, this means we inject it on the inter-service `gRPC`. Again, we use the same `Service`
+trait!
+
+```rust
+impl tower::Service<TonicClientRequest> for OpenTelClientService<Channel> {
+    ...
+
+    fn call(&mut self, mut request: TonicClientRequest) -> Self::Future {
+        let tracer = global::tracer("grpc-client");
+        let context = tracing::Span::current().context();
+        ...
+
+        let context = context.with_span(span);
+        global::get_text_map_propagator(|propagator| {
+            propagator.inject_context(&context, &mut HeaderInjector(request.headers_mut()))
+        });
+        trace_http_service_call(&mut self.service, request, context)
+    }
+}
+```
+
+How do these traces get sent to [Jaeger] (or any other telemetry sink)? We set up an OpenTelemetry exporter on **every** service which receives and/or sends the tracing IDs:
+
+```rust
+opentelemetry_otlp::new_pipeline()
+    .tracing()
+    .with_exporter(
+        opentelemetry_otlp::new_exporter().tonic().with_endpoint(endpoint)
+    )
+    .with_trace_config(
+        sdktrace::Config::default().with_resource(Resource::new(tracing_tags))
+    )
+    .install_batch(opentelemetry_sdk::runtime::TokioCurrentThread)
+    .expect("Should be able to initialise the exporter");
+```
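The pipeline above yields an OpenTelemetry `Tracer`; for the `tracing` spans shown earlier to reach that exporter, each service also registers the tracer as a subscriber layer. A minimal sketch of that wiring is below, assuming the `tracing-opentelemetry` and `tracing-subscriber` crates (the exact crate versions mayastor pins may differ).

```rust
// Illustrative sketch only: wiring the OTLP tracer into the `tracing` subscriber
// so that spans created with `tracing::debug_span!` et al. reach the exporter.
use tracing_subscriber::layer::SubscriberExt;
use tracing_subscriber::util::SubscriberInitExt;

fn init_tracing(tracer: opentelemetry_sdk::trace::Tracer) {
    // Bridge `tracing` spans into OpenTelemetry.
    let otel_layer = tracing_opentelemetry::layer().with_tracer(tracer);

    tracing_subscriber::registry()
        .with(tracing_subscriber::fmt::layer()) // plain log output
        .with(otel_layer) // spans exported over OTLP
        .init();
}
```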
+
+<br>
+
+Here's how a 2-replica volume creation can be traced via [Jaeger]:
+
+![Jaeger trace of a 2-replica volume creation](../img/jaeger.png)
+
+> _NOTE_: In this example the client did not use tracing/telemetry, which is why you can only see the trace from the rest-server onwards.
+
 [MOAC]: https://github.com/openebs/moac
 [K8s]: https://kubernetes.io/
 [CSI]: https://github.com/container-storage-interface/spec
 [Mayastor]: ./mayastor.md
 [CAS]: https://openebs.io/docs/2.12.x/concepts/cas
 [Tower]: https://docs.rs/tower/latest/tower/
 [etcd]: https://etcd.io/
+[Jaeger]: https://www.jaegertracing.io/
+[NATS]: https://nats.io/
+[gRPC]: https://grpc.io/
+[protobuf]: https://protobuf.dev/
