Commit e6fa9ab

Merge #1671
2 parents 3725a10 + 044de92 commit e6fa9ab

4 files changed: +287 -36 lines

README.md

+41-27
@@ -5,38 +5,46 @@
 [![Slack](https://img.shields.io/badge/JOIN-SLACK-blue)](https://kubernetes.slack.com/messages/openebs)
 [![built with nix](https://builtwithnix.org/badge.svg)](https://builtwithnix.org)

-Table of contents:
-==================
+## Table of contents
+
+---
+
 - [Quickly deploy it on K8s and get started](https://mayastor.gitbook.io)
-- [Deploying on microk8s](/doc/microk8s.md)
+- [Deploying on microk8s](/doc/microk8s.md)
 - [High-level overview](#overview)
-- [The Nexus CAS module](#Nexus)
-- [Local storage](#local-storage)
-- [Exporting a Nexus](#exporting-the-nexus)
+- [The Nexus CAS module](#Nexus)
+- [Local storage](#local-storage)
+- [Exporting a Nexus](#exporting-the-nexus)
 - [Building from source](/doc/build.md)
 - [Examples of the Nexus module](/doc/mcli.md)
 - [Frequently asked questions](/doc/FAQ.md)

 <p align="justify">
 <strong>Mayastor</strong> is a cloud-native declarative data plane written in <strong>Rust.</strong>
 Our goal is to abstract storage resources and their differences through the data plane such that users only need to
-supply the <strong>what</strong> and do not have to worry about the <strong>how</strong> so that individual teams stay in control.
+supply the <strong>what</strong> and do not have to worry about the <strong>how</strong>
+so that individual teams stay in control.

 We also try to be as unopinionated as possible. What this means is that we try to work with the existing storage systems
-you might already have and unify them as abstract resources instead of swapping them out whenever the resources are local
-or remote.
+you might already have and unify them as abstract resources instead of swapping them out whenever the resources are local
+or remote.
 <br>
 <br>
+
 </p>

 Some targeted use cases are:

-- Low latency workloads for converged and segregated storage by leveraging NVMe/NVMe over Fabrics (NVMe-oF)
-- Micro-VM based containers like [Firecracker microVMs](https://github.com/firecracker-microvm/firecracker) and [Kata Containers](https://github.com/kata-containers/kata-containers) by providing storage over vhost-user
-- Programmatic based storage access, i.e write to block devices from within your application instead of making system calls
-- Storage unification to lift barriers so that you can start deploying cloud native apps on your existing storage without painful data gravity barriers that prevent progress and innovation
+- Low latency workloads for converged and segregated storage by leveraging NVMe/NVMe over Fabrics (NVMe-oF)
+- Micro-VM based containers like [Firecracker microVMs](https://github.com/firecracker-microvm/firecracker) and
+[Kata Containers](https://github.com/kata-containers/kata-containers) by providing storage over vhost-user
+- Programmatic based storage access, i.e write to block devices from within your application instead of making system calls
+- Storage unification to lift barriers so that you can start deploying cloud native apps on your existing storage
+without painful data gravity barriers that prevent progress and innovation

-# User Documentation
+## User Documentation
+
+---

 The official user documentation for the Mayastor Project is published at: [OpenEBS Replicated Storage](https://openebs.io/docs/concepts/data-engines/replicated-storage)

@@ -46,15 +54,18 @@ At a high-level, Mayastor consists of two major components.

 ### **Control plane:**

-* A microservices patterned control plane, centered around a core agent which publicly exposes a RESTful API. This is extended by a dedicated operator responsible
-for managing the life cycle of "Mayastor Pools" (an abstraction for devices supplying the cluster with persistent backing storage) and a CSI compliant external provisioner (controller).
-Source code for the control plane components is located in its [own repository](https://github.com/openebs/mayastor-control-plane)
+- A microservices patterned control plane, centered around a core agent which publically exposes a RESTful API.
+This is extended by a dedicated operator responsible for managing the life cycle of "Disk Pools"
+(an abstraction for devices supplying the cluster with persistent backing storage) and a CSI compliant
+external provisioner (controller).
+Source code for the control plane components is located in its [own repository](https://github.com/openebs/mayastor-control-plane)

-* A _per_ node instance *mayastor-csi* plugin which implements the identity and node grpc services from CSI protocol.
+- A daemonset _mayastor-csi_ plugin which implements the identity and node grpc services from CSI protocol.

 ### **Data plane:**

-* Each node you wish to use for storage or storage services will have to run a Mayastor daemon set. Mayastor itself has three major components: the Nexus, a local storage component, and the mayastor-csi plugin.
+- Each node you wish to use for storage or storage services will have to run an IO Engine daemonset. Mayastor itself has
+two major components: the Nexus and a local storage component.

 ## Nexus

@@ -65,10 +76,10 @@ selected to run your k8s workload. We call these from the Nexus' point of view i
 The goal we envision the Nexus to provide here, as it sits between the storage systems and PVCs, is loose coupling.

 A practical example: Once you are up and running with persistent workloads in a container, you need to move your data because
-the storage system that stores your PVC goes EOL. You now can control how this impacts your team without getting into storage
-migration projects, which are always painful and complicated. In reality, the individual storage volumes per team/app are
-relatively small, but today it is not possible for individual teams to handle their own storage needs. The Nexus provides the
-abstraction over the resources such that the developer teams stay in control.
+the storage system that stores your PVC goes EOL. You now can control how this impacts your team without getting into storage
+migration projects, which are always painful and complicated. In reality, the individual storage volumes per team/app are
+relatively small, but today it is not possible for individual teams to handle their own storage needs.
+The Nexus provides the abstraction over the resources such that the developer teams stay in control.

 The reason we think this can work is because applications have changed, and the way they are built allows us to rethink
 they way we do things. Moreover, due to hardware [changes](https://searchstorage.techtarget.com/tip/NVMe-performance-challenges-expose-the-CPU-chokepoint)
@@ -79,6 +90,7 @@ a single device to a protocol standard protocol. These storage URIs are generate
 track of what resources belong to what Nexus instance and subsequently to what PVC.

 You can also directly use the nexus from within your application code. For example:
+
 </p>

 ```rust
@@ -125,7 +137,8 @@ buf.as_slice().into_iter().map(|b| assert_eq!(b, 0xff)).for_each(drop);
 We think this can help a lot of database projects as well, where they typically have all the smarts in their database engine
 and they want the most simple (but fast) storage device. For a more elaborate example see some of the tests in mayastor/tests.

-To communicate with the children, the Nexus uses industry standard protocols. The Nexus supports direct access to local storage and remote storage using NVMe-oF TCP. Another advantage of the implementation is that if you were to remove
+To communicate with the children, the Nexus uses industry standard protocols. The Nexus supports direct access to local
+storage and remote storage using NVMe-oF TCP. Another advantage of the implementation is that if you were to remove
 the Nexus from the data path, you would still be able to access your data as if Mayastor was not there.

 The Nexus itself does not store any data and in its most simplistic form the Nexus is a proxy towards real storage
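
The README text above describes the Nexus as storing no data of its own and acting as a proxy towards its children. As a rough illustration only, the toy model below assumes a mirrored layout in which every write is fanned out to all healthy children and reads are served by the first healthy one. The `Nexus` and `Child` types are hypothetical in-memory stand-ins, not Mayastor's API; real children are bdevs reached locally or over NVMe-oF TCP.

```rust
/// Hypothetical in-memory stand-in for a nexus child (a replica).
#[derive(Debug)]
struct Child {
    data: Vec<u8>,
    healthy: bool,
}

/// Toy nexus: it owns no data of its own and only forwards I/O to children.
struct Nexus {
    children: Vec<Child>,
}

impl Nexus {
    /// Creates `n_children` zero-filled children of `size` bytes each.
    fn new(n_children: usize, size: usize) -> Self {
        let children = (0..n_children)
            .map(|_| Child { data: vec![0; size], healthy: true })
            .collect();
        Nexus { children }
    }

    /// Assumed mirroring policy: copy the write to every healthy child.
    fn write(&mut self, offset: usize, buf: &[u8]) {
        for child in self.children.iter_mut().filter(|c| c.healthy) {
            child.data[offset..offset + buf.len()].copy_from_slice(buf);
        }
    }

    /// Serves a read from the first healthy child, if there is one.
    fn read(&self, offset: usize, len: usize) -> Option<Vec<u8>> {
        self.children
            .iter()
            .find(|c| c.healthy)
            .map(|c| c.data[offset..offset + len].to_vec())
    }
}

fn main() {
    let mut nexus = Nexus::new(2, 16);
    nexus.write(4, &[0xff; 4]);

    // Both children now hold identical bytes at offsets 4..8.
    assert!(nexus.children.iter().all(|c| c.data[4..8] == [0xffu8; 4]));

    // With the first child marked unhealthy, the second one serves the read.
    nexus.children[0].healthy = false;
    assert_eq!(nexus.read(4, 4), Some(vec![0xff; 4]));
}
```

In this model a child that is marked unhealthy silently misses later writes, which is the kind of divergence the rebuild path touched by this commit is meant to repair.
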
@@ -151,6 +164,7 @@ except for the fact that it is not local anymore.
 Similarly, if you do not want to use anything other than local storage, you can still use Mayastor to provide you with
 additional functionality that otherwise would require you setup kernel specific features like LVM for example.
 <br>
+
 </p>

 ## Exporting the Nexus
@@ -199,18 +213,18 @@ $ io-engine-client pool destroy tpool

 ## Links

-- [Our bindings to spdk in the spdk-sys crate](https://github.com/openebs/spdk-sys)
+- [Our bindings to spdk in the spdk-rs crate](https://github.com/openebs/spdk-rs)
 - [Our vhost-user implementation](https://github.com/openebs/vhost-user)

 ## License

 Mayastor is developed under Apache 2.0 license at the project level. Some components of the project are derived from
 other open source projects and are distributed under their respective licenses.

-```http://www.apache.org/licenses/LICENSE-2.0```
+`http://www.apache.org/licenses/LICENSE-2.0`

 ### Contributions

 Unless you explicitly state otherwise, any contribution intentionally submitted for
 inclusion in Mayastor by you, as defined in the Apache-2.0 license, licensed as above,
-without any additional terms or conditions.
+without any additional terms or conditions.

io-engine-tests/src/nexus.rs

+6-1
@@ -130,6 +130,11 @@ impl NexusBuilder {
         self.with_bdev(&bdev)
     }

+    pub fn with_replicas(self, replicas: &[ReplicaBuilder]) -> Self {
+        let cc = replicas.iter().map(|r| self.replica_uri(r)).collect();
+        self.with_children(cc)
+    }
+
     pub fn with_local_replica(self, r: &ReplicaBuilder) -> Self {
         if r.rpc() != self.rpc() {
             panic!("Replica is not local");
@@ -152,7 +157,7 @@ impl NexusBuilder {
         self
     }

-    fn replica_uri(&self, r: &ReplicaBuilder) -> String {
+    pub fn replica_uri(&self, r: &ReplicaBuilder) -> String {
         if r.rpc() == self.rpc() {
             r.bdev()
         } else {
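
The new `with_replicas` helper maps each `ReplicaBuilder` to a child URI through `replica_uri`, which this commit makes public, and then hands the list to `with_children`. The sketch below mirrors that shape with simplified, hypothetical types: `rpc`, `bdev`, and `share_uri` are plain string fields standing in for the real accessors, and the remote branch returns a placeholder `share_uri` because the diff truncates what the real `else` branch does.

```rust
/// Simplified, hypothetical stand-in for the test harness's `ReplicaBuilder`.
struct ReplicaBuilder {
    rpc: String,       // which io-engine instance hosts the replica
    bdev: String,      // how the replica is addressed on its own node
    share_uri: String, // hypothetical placeholder for the remote address
}

/// Simplified, hypothetical stand-in for `NexusBuilder`.
struct NexusBuilder {
    rpc: String,
    children: Vec<String>,
}

impl NexusBuilder {
    fn with_children(mut self, children: Vec<String>) -> Self {
        self.children = children;
        self
    }

    /// Same-node replicas are referenced by bdev; others by a shared URI.
    pub fn replica_uri(&self, r: &ReplicaBuilder) -> String {
        if r.rpc == self.rpc {
            r.bdev.clone()
        } else {
            r.share_uri.clone()
        }
    }

    /// Resolves every replica to a child URI, then delegates to `with_children`.
    pub fn with_replicas(self, replicas: &[ReplicaBuilder]) -> Self {
        let cc = replicas.iter().map(|r| self.replica_uri(r)).collect();
        self.with_children(cc)
    }
}

fn main() {
    // Illustrative values only.
    let local = ReplicaBuilder {
        rpc: "node-a".to_string(),
        bdev: "local-r1".to_string(),
        share_uri: "remote-r1".to_string(),
    };
    let remote = ReplicaBuilder {
        rpc: "node-b".to_string(),
        bdev: "local-r2".to_string(),
        share_uri: "remote-r2".to_string(),
    };
    let nexus = NexusBuilder { rpc: "node-a".to_string(), children: Vec::new() }
        .with_replicas(&[local, remote]);
    assert_eq!(nexus.children, ["local-r1", "remote-r2"]);
}
```

The closure in `with_replicas` only borrows `self` until `collect()` finishes, which is why `self` can still be moved into `with_children` afterwards.
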

io-engine/src/bdev/nexus/nexus_bdev_rebuild.rs

+18-8
@@ -87,15 +87,11 @@ impl<'n> Nexus<'n> {
         info!("{self:?}: start rebuild request for {child_uri}");

         // Find a healthy child to rebuild from.
-        let src_child_uri = match self
-            .children_iter()
-            .find(|c| c.is_healthy() && c.uri() != child_uri)
-        {
-            Some(child) => Ok(child.uri().to_owned()),
-            None => Err(Error::NoRebuildSource {
+        let Some(src_child_uri) = self.find_src_replica(child_uri) else {
+            return Err(Error::NoRebuildSource {
                 name: name.clone(),
-            }),
-        }?;
+            });
+        };

         let dst_child_uri = match self.lookup_child(child_uri) {
             Some(c) if c.is_opened_unsync() => {
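
This hunk swaps a `match` that built a `Result` and then applied `?` for a `let ... else` binding with an explicit early `return` (stable since Rust 1.65). The minimal sketch below assumes a simplified `Child` type and a hypothetical `find_src` lookup, and checks that both shapes yield the same result in the success and failure cases.

```rust
#[derive(Debug, PartialEq)]
enum Error {
    NoRebuildSource { name: String },
}

/// Hypothetical, simplified child: just a URI and a health flag.
struct Child {
    uri: String,
    healthy: bool,
}

/// Hypothetical lookup: first healthy child that is not the destination.
fn find_src(children: &[Child], dst: &str) -> Option<String> {
    children
        .iter()
        .find(|c| c.healthy && c.uri != dst)
        .map(|c| c.uri.clone())
}

/// Old shape: `match` builds a `Result`, then `?` propagates the error.
fn pick_old(children: &[Child], dst: &str, name: &str) -> Result<String, Error> {
    let src = match find_src(children, dst) {
        Some(uri) => Ok(uri),
        None => Err(Error::NoRebuildSource {
            name: name.to_owned(),
        }),
    }?;
    Ok(src)
}

/// New shape: `let ... else` binds on success and returns early otherwise.
fn pick_new(children: &[Child], dst: &str, name: &str) -> Result<String, Error> {
    let Some(src) = find_src(children, dst) else {
        return Err(Error::NoRebuildSource {
            name: name.to_owned(),
        });
    };
    Ok(src)
}

fn main() {
    let children = vec![
        Child { uri: "r1".to_string(), healthy: true },
        Child { uri: "r2".to_string(), healthy: false },
    ];
    // Rebuilding r2: r1 is a valid source, so both shapes return the same Ok.
    assert_eq!(pick_old(&children, "r2", "nexus-0"), pick_new(&children, "r2", "nexus-0"));
    // Rebuilding r1: no other healthy child exists, so both return the same Err.
    assert_eq!(pick_old(&children, "r1", "nexus-0"), pick_new(&children, "r1", "nexus-0"));
}
```
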
@@ -157,6 +153,20 @@ impl<'n> Nexus<'n> {
         })
     }

+    /// Finds the best suited source replica for the given destination.
+    fn find_src_replica(&self, dst_uri: &str) -> Option<String> {
+        let candidates: Vec<_> = self
+            .children_iter()
+            .filter(|c| c.is_healthy() && c.uri() != dst_uri)
+            .collect();
+
+        candidates
+            .iter()
+            .find(|c| c.is_local().unwrap_or(false))
+            .or_else(|| candidates.first())
+            .map(|c| c.uri().to_owned())
+    }
+
     /// TODO
     async fn create_rebuild_job(
         &self,
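
`find_src_replica` keeps every healthy child other than the destination, prefers one whose `is_local()` reports `true`, and otherwise falls back to the first candidate; a child whose locality cannot be determined counts as remote because of `unwrap_or(false)`. The sketch below reproduces that policy over a hypothetical `Child` struct, assuming `is_local()` returns `Option<bool>` (the diff only shows `.unwrap_or(false)`, so the real return type may differ, for example a `Result`).

```rust
/// Hypothetical, simplified child used only to illustrate the policy.
struct Child {
    uri: String,
    healthy: bool,
    /// `None` models a child whose locality could not be determined.
    local: Option<bool>,
}

impl Child {
    /// Assumed to return `Option<bool>`; the real return type may differ.
    fn is_local(&self) -> Option<bool> {
        self.local
    }
}

/// Prefers a healthy local child other than `dst_uri`, else the first
/// healthy child other than `dst_uri`, else `None`.
fn find_src_replica(children: &[Child], dst_uri: &str) -> Option<String> {
    let candidates: Vec<&Child> = children
        .iter()
        .filter(|c| c.healthy && c.uri != dst_uri)
        .collect();

    candidates
        .iter()
        .find(|c| c.is_local().unwrap_or(false))
        .or_else(|| candidates.first())
        .map(|c| c.uri.clone())
}

fn main() {
    let children = vec![
        Child { uri: "remote-1".to_string(), healthy: true, local: Some(false) },
        Child { uri: "local-2".to_string(), healthy: true, local: Some(true) },
        Child { uri: "remote-3".to_string(), healthy: false, local: None },
    ];
    // The local child wins even though a healthy remote child comes first.
    println!("{:?}", find_src_replica(&children, "remote-3")); // prints Some("local-2")
}
```

Preferring a local source presumably lets the rebuild read without crossing the network, while the fallback keeps a rebuild possible when no healthy local replica exists.
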
