
Commit 2d0c76d

docs: move older design docs into the git repo
Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
1 parent 473b640 commit 2d0c76d

10 files changed

+1412
-28
lines changed

doc/csi.md

+21-25
@@ -7,27 +7,25 @@ document.
77
Basic workflow starting from registration is as follows:
88

99
1. csi-node-driver-registrar retrieves information about csi plugin (mayastor) using csi identity service.
10-
1. csi-node-driver-registrar registers csi plugin with kubelet passing plugin's csi endpoint as parameter.
11-
1. kubelet uses csi identity and node services to retrieve information about the plugin (including plugin's ID string).
12-
1. kubelet creates a custom resource (CR) "csi node info" for the CSI plugin.
13-
1. kubelet issues requests to publish/unpublish and stage/unstage volume to the CSI plugin when mounting the volume.
10+
2. csi-node-driver-registrar registers csi plugin with kubelet passing plugin's csi endpoint as parameter.
11+
3. kubelet uses csi identity and node services to retrieve information about the plugin (including plugin's ID string).
12+
4. kubelet creates a custom resource (CR) "csi node info" for the CSI plugin.
13+
5. kubelet issues requests to publish/unpublish and stage/unstage volume to the CSI plugin when mounting the volume.
1414

1515
The registration of the storage nodes (i/o engines) with the control plane is handled
16-
by a gRPC service which is independent from the CSI plugin.
16+
by a gRPC service which is independent of the CSI plugin.
1717

1818
<br>
1919

2020
```mermaid
21-
graph LR;
22-
PublicApi["Public
23-
API"]
24-
CO["Container
25-
Orchestrator"]
21+
graph LR
22+
;
23+
PublicApi{"Public<br>API"}
24+
CO[["Container<br>Orchestrator"]]
2625
2726
subgraph "Mayastor Control-Plane"
2827
Rest["Rest"]
29-
InternalApi["Internal
30-
API"]
28+
InternalApi["Internal<br>API"]
3129
InternalServices["Agents"]
3230
end
3331
@@ -36,20 +34,18 @@ graph LR;
3634
end
3735
3836
subgraph "Mayastor CSI"
39-
Controller["Controller
40-
Plugin"]
41-
Node_1["Node
42-
Plugin"]
37+
Controller["Controller<br>Plugin"]
38+
Node_1["Node<br>Plugin"]
4339
end
4440
45-
%% Connections
46-
CO --> Node_1
47-
CO --> Controller
48-
Controller --> |REST/http| PublicApi
49-
PublicApi --> Rest
50-
Rest --> |gRPC| InternalApi
51-
InternalApi --> |gRPC| InternalServices
41+
%% Connections
42+
CO -.-> Node_1
43+
CO -.-> Controller
44+
Controller -->|REST/http| PublicApi
45+
PublicApi -.-> Rest
46+
Rest -->|gRPC| InternalApi
47+
InternalApi -.->|gRPC| InternalServices
5248
Node_1 <--> PublicApi
53-
Node_1 --> |NVMeOF| IO_Node_1
54-
IO_Node_1 <--> |gRPC| InternalServices
49+
Node_1 -.->|NVMe-oF| IO_Node_1
50+
IO_Node_1 <-->|gRPC| InternalServices
5551
```

doc/design/control-plane-behaviour.md

+171
@@ -0,0 +1,171 @@
1+
# Control Plane Behaviour
2+
3+
This document describes the types of behaviour that the control plane will exhibit under various situations. By
4+
providing a high-level view it is hoped that the reader will be able to more easily reason about the control plane. \
5+
<br>
6+
7+
## REST API Idempotency
8+
9+
Idempotency is a widely used term that is often misconstrued. The following definition is taken from
10+
the [Mozilla Glossary](https://developer.mozilla.org/en-US/docs/Glossary/Idempotent):
11+
12+
> An [HTTP](https://developer.mozilla.org/en-US/docs/Web/HTTP) method is **idempotent** if an identical request can be
13+
> made once or several times in a row with the same effect while leaving the server in the same state. In other words,
14+
> an idempotent method should not have any side-effects (except for keeping statistics). Implemented correctly, the `GET`,
15+
> `HEAD`, `PUT`, and `DELETE` methods are idempotent, but not the `POST` method.
16+
> All [safe](https://developer.mozilla.org/en-US/docs/Glossary/Safe) methods are also ***idempotent***.
17+
18+
OK, so making multiple identical requests should produce the same result ***without side effects***. Great, so does the
19+
return value for each request have to be the same? The article goes on to say:
20+
21+
> To be idempotent, only the actual back-end state of the server is considered, the status code returned by each request
22+
> may differ: the first call of a `DELETE` will likely return a `200`, while successive ones will likely return a `404`.
23+
24+
The control plane will behave exactly as described above. If, for example, multiple `create volume` calls are made for
25+
the same volume, the first will return success (`HTTP 200` code) while subsequent calls will return a failure status
26+
code (`HTTP 409` code) indicating that the resource already exists. \
27+
<br>
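
The behaviour described above can be illustrated with a minimal sketch. The `VolumeApi` class and its `create_volume` method are hypothetical stand-ins for the REST layer, not the real control plane API; the point is only that a repeated request leaves the back-end state unchanged while the status code differs.

```python
from http import HTTPStatus


class VolumeApi:
    """Toy stand-in for the REST layer (hypothetical, for illustration only)."""

    def __init__(self):
        self.volumes = {}  # back-end state: volume id -> size in bytes

    def create_volume(self, volume_id, size):
        if volume_id in self.volumes:
            # Repeated request: the state is left untouched, only the status differs.
            return HTTPStatus.CONFLICT
        self.volumes[volume_id] = size
        return HTTPStatus.OK


api = VolumeApi()
first = api.create_volume("vol-1", 1024)
state_after_first = dict(api.volumes)
second = api.create_volume("vol-1", 1024)
```

Here `first` is `200`, `second` is `409`, and the stored volumes are the same after both calls.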
28+
29+
## Handling Failures
30+
31+
There are various ways in which the control plane could fail to satisfy a `REST` request:
32+
33+
- Control plane dies in the middle of an operation.
34+
- Control plane fails to update the persistent store.
35+
- A gRPC request to Mayastor fails to complete successfully. \
36+
<br>
37+
38+
Regardless of the type of failure, the control plane has to decide what it should do:
39+
40+
1. Fail the operation back to the caller but leave any created resources alone.
41+
42+
2. Fail the operation back to the caller but destroy any created resources.
43+
44+
3. Act like Kubernetes and keep retrying in the hope that it will eventually succeed. \
45+
<br>
46+
47+
Approach 3 is discounted. If we never responded to the caller, it would eventually time out and probably retry itself.
48+
This would likely present even more issues/complexity in the control plane.
49+
50+
So the decision becomes: should we destroy resources that have already been created as part of the operation? \
51+
<br>
52+
53+
### Keep Created Resources
54+
55+
Preventing the control plane from having to unwind operations is convenient as it keeps the implementation simple. A
56+
separate asynchronous process could then periodically scan for unused resources and destroy them.
57+
58+
There is a potential issue with the approach described above. If an operation fails, it would be reasonable to assume
59+
that the user would retry it. Is it possible for this subsequent request to fail as a result of the existing unused
60+
resources lingering (i.e. because they have not yet been destroyed)? If so, this would hamper any retry logic
61+
implemented in the upper layers.
62+
63+
### Destroy Created Resources
64+
65+
This is the optimal approach. For any given operation, failure results in newly created resources being destroyed. The
66+
responsibility lies with the control plane to track which resources have been created and destroy them in the event
67+
of a failure.
68+
69+
However, what happens if destruction of a resource fails? It is possible for the control plane to retry the operation
70+
but at some point it will have to give up. In effect the control plane will do its best, but it cannot provide any
71+
guarantee. So does this mean that these resources are permanently leaked? Not necessarily. As in
72+
the [Keep Created Resources](#keep-created-resources) section, there could be a separate process which destroys unused
73+
resources. \
74+
<br>
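
A minimal sketch of this approach follows. The function name `run_with_rollback`, the `max_attempts` limit and the shape of the return value are assumptions made for illustration; the source does not prescribe them. Destruction is best effort: anything that still cannot be destroyed is reported back so that a separate clean-up process could deal with it later.

```python
def run_with_rollback(steps, destroy, max_attempts=3):
    """Run creation steps; on failure destroy whatever was created so far.

    `steps` is a list of callables that each return the created resource.
    Returns (ok, leaked), where `leaked` lists resources whose destruction
    kept failing and which are left for a separate clean-up process.
    """
    created = []
    try:
        for step in steps:
            created.append(step())
        return True, []
    except Exception:
        leaked = []
        # Unwind in reverse creation order.
        for resource in reversed(created):
            for _ in range(max_attempts):
                try:
                    destroy(resource)
                    break
                except Exception:
                    continue
            else:
                # Gave up after max_attempts: best effort only, no guarantee.
                leaked.append(resource)
        return False, leaked


destroyed = []


def destroy(resource):
    # Hypothetical failure: "replica-1" can never be destroyed.
    if resource == "replica-1":
        raise RuntimeError("node unreachable")
    destroyed.append(resource)


def failing_step():
    raise RuntimeError("nexus creation failed")


ok, leaked = run_with_rollback(
    [lambda: "replica-1", lambda: "replica-2", failing_step], destroy
)
```

In this run the third step fails, `replica-2` is destroyed, and `replica-1` is reported as leaked.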
75+
76+
## Use of the Persistent Store
77+
78+
For a control plane to be effective it must maintain information about the system it is interacting with and take
79+
decisions accordingly. An in-memory registry is used to store such information.
80+
81+
Because the registry is stored in memory, it is volatile - meaning all information is lost if the service is restarted.
82+
As a consequence, critical information must be backed up to a highly available persistent store (for more detailed
83+
information see [persistent-store.md](./persistent-store.md)).
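
As a rough sketch of the idea, the registry below keeps an in-memory view and writes every update through to a persistent store. The `Registry` class is hypothetical, a plain `dict` stands in for the highly available store, and persisting before updating memory is an assumption rather than something the design mandates.

```python
class Registry:
    """In-memory registry backed by a persistent store (illustrative only)."""

    def __init__(self, store):
        self._store = store        # stand-in for the persistent store
        self._cache = dict(store)  # volatile view, rebuilt on start-up

    def put(self, key, value):
        self._store[key] = value   # persist first (assumed ordering)
        self._cache[key] = value   # then update the in-memory view

    def get(self, key):
        return self._cache.get(key)


store = {}
registry = Registry(store)
registry.put("volume/vol-1", {"replicas": 3})

# Simulate a service restart: the old registry is gone, the store survives.
restarted = Registry(store)
recovered = restarted.get("volume/vol-1")
```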
84+
85+
The types of data that need persisting broadly fall into 3 categories:
86+
87+
1. Desired state
88+
89+
2. Actual state
90+
91+
3. Control plane specific information \
92+
<br>
93+
94+
### Desired State
95+
96+
This is the declarative specification of a resource provided by the user. As an example, the user may request a new
97+
volume with the following requirements:
98+
99+
- Replica count of 3
100+
101+
- Size
102+
103+
- Preferred nodes
104+
105+
- Number of nexuses
106+
107+
Once the user has provided these constraints, the expectation is that the control plane should create a resource that
108+
meets the specification. How the control plane achieves this is of no concern.
109+
110+
So what happens if the control plane is unable to meet these requirements? The operation is failed. This prevents any
111+
ambiguity. If an operation succeeds, the requirements have been met and the user has exactly what they asked for. If the
112+
operation fails, the requirements couldn’t be met. In this case the control plane should provide an appropriate means of
113+
diagnosing the issue, e.g. a log message.
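
The all-or-nothing behaviour can be sketched as follows. `VolumeSpec`, `SpecNotMet` and `place_replicas` are hypothetical names, and placing at most one replica per node is an assumption made to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeSpec:
    """Desired state of a volume as supplied by the user (illustrative)."""
    size_bytes: int
    replica_count: int
    preferred_nodes: tuple = ()
    nexus_count: int = 1


class SpecNotMet(Exception):
    """Raised when the requirements cannot be satisfied."""


def place_replicas(spec, free_bytes_by_node):
    """Pick one node per replica, preferred nodes first, or fail outright."""
    candidates = [
        node for node, free in free_bytes_by_node.items()
        if free >= spec.size_bytes
    ]
    # Stable sort: preferred nodes move to the front, others keep their order.
    candidates.sort(key=lambda node: node not in spec.preferred_nodes)
    if len(candidates) < spec.replica_count:
        # Never return a partial placement; the message aids diagnosis.
        raise SpecNotMet(
            f"need {spec.replica_count} nodes, only {len(candidates)} have capacity"
        )
    return candidates[: spec.replica_count]


nodes = {"node-a": 100, "node-b": 50, "node-c": 100, "node-d": 100}
spec = VolumeSpec(size_bytes=80, replica_count=3, preferred_nodes=("node-d",))
placement = place_replicas(spec, nodes)
```

With a `replica_count` of 4 the same call raises `SpecNotMet` instead of returning three nodes.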
114+
115+
What happens to resources created before the operation failed? This will be dependent on the chosen failure strategy
116+
outlined in [Handling Failures](#handling-failures).
117+
118+
### Actual State
119+
120+
This is the runtime state of the system as provided by Mayastor. Whenever this changes, the control plane must reconcile
121+
this state against the desired state to ensure that we are still meeting the user's requirements. If not, the control
122+
plane will take action to try to rectify this.
123+
124+
Whenever a user makes a request for state information, it will be this state that is returned (Note: If necessary an API
125+
may be provided which returns the desired state also). \
126+
<br>
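
A minimal sketch of such a reconciliation step is shown below. The `reconcile` function, the replica dictionaries and the action tuples are all hypothetical; the policy of removing unhealthy replicas and adding replacements is one possible choice, not the documented behaviour.

```python
def reconcile(desired_replicas, actual_replicas):
    """Compare actual state with desired state and return corrective actions."""
    healthy = [r for r in actual_replicas if r["healthy"]]
    # Drop replicas reported as unhealthy (assumed policy).
    actions = [("remove", r["name"]) for r in actual_replicas if not r["healthy"]]
    # Add replacements until the desired replica count is met again.
    missing = desired_replicas - len(healthy)
    actions += [("add", None)] * max(missing, 0)
    return actions


actual = [
    {"name": "replica-1", "healthy": True},
    {"name": "replica-2", "healthy": False},
    {"name": "replica-3", "healthy": True},
]
actions = reconcile(3, actual)
```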
127+
128+
### Control Plane Information
129+
130+
This information is required to aid the control plane across restarts. It will be used to store the state of a resource
131+
independent of the desired or actual state.
132+
133+
The following sequence will be followed when creating a resource:
134+
135+
1. Add resource specification to the store with a state of “creating”
136+
137+
2. Create the resource
138+
139+
3. Mark the state of the resource as “complete”
140+
141+
If the control plane then crashes mid-operation, on restart it can query the state of each resource. Any resource not in
142+
the “complete” state can then be destroyed as it will be a remnant of a failed operation. The expectation here will be
143+
that the user will reissue the operation if they wish to.
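
The creation sequence and the restart handling can be sketched as below. A `dict` stands in for the persistent store, and `create_resource` and `recover` are hypothetical helpers; removing the entry from the store once the remnant has been destroyed is an assumption.

```python
CREATING, COMPLETE = "creating", "complete"


def create_resource(store, name, create):
    store[name] = CREATING  # 1. add the specification with state "creating"
    create(name)            # 2. create the resource
    store[name] = COMPLETE  # 3. mark the resource as "complete"


def recover(store, destroy):
    """On restart, destroy anything that never reached the complete state."""
    for name, state in list(store.items()):
        if state != COMPLETE:
            destroy(name)
            del store[name]  # assumed: forget the remnant once destroyed


store = {}
create_resource(store, "pool-1", lambda name: None)


def crash(name):
    raise RuntimeError("control plane died mid-operation")


try:
    create_resource(store, "pool-2", crash)
except RuntimeError:
    pass  # simulated crash; "pool-2" is left in the "creating" state

destroyed = []
recover(store, destroyed.append)
```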
144+
145+
Likewise, deleting a resource will look like:
146+
147+
1. Mark the resource as “deleting” in the store
148+
149+
2. Delete the resource
150+
151+
3. Remove the resource from the store
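
The deletion steps above can be sketched in the same style. `delete_resource` is a hypothetical helper and a `dict` again stands in for the persistent store. If the control plane dies after step 1, the entry stays in the “deleting” state, so the interrupted deletion remains visible after a restart.

```python
def delete_resource(store, name, delete):
    store[name] = "deleting"  # 1. mark the resource as "deleting"
    delete(name)              # 2. delete the resource
    del store[name]           # 3. remove the resource from the store


store = {"replica-1": "complete", "replica-2": "complete"}
delete_resource(store, "replica-1", lambda name: None)


def crash(name):
    raise RuntimeError("control plane died mid-operation")


try:
    delete_resource(store, "replica-2", crash)
except RuntimeError:
    pass  # simulated crash between steps 1 and 3
```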
152+
153+
For complex operations like creating a volume, all resources that make up the volume will be marked as “creating”. Only
154+
when all resources have been successfully created will their corresponding states be changed to “complete”. This will
155+
look something like:
156+
157+
1. Add volume specification to the store with a state of “creating”
158+
159+
2. Add nexus specifications to the store with a state of “creating”
160+
161+
3. Add replica specifications to the store with a state of “creating”
162+
163+
4. Create replicas
164+
165+
5. Create nexus
166+
167+
6. Mark replica states as “complete”
168+
169+
7. Mark nexus states as “complete”
170+
171+
8. Mark volume state as “complete”
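
The eight steps above can be sketched as a single function. `create_volume` and its parameters are hypothetical, and a `dict` stands in for the persistent store. Because no state is changed to “complete” until every resource exists, a failure part-way through leaves all of the volume's resources marked as “creating”.

```python
def create_volume(store, volume, nexuses, replicas, create):
    # Steps 1-3: record every specification with a state of "creating".
    for name in [volume, *nexuses, *replicas]:
        store[name] = "creating"
    # Steps 4-5: create the replicas first, then the nexuses.
    for name in [*replicas, *nexuses]:
        create(name)
    # Steps 6-8: only now mark replicas, nexuses and the volume as complete.
    for name in [*replicas, *nexuses, volume]:
        store[name] = "complete"


store, created = {}, []
create_volume(store, "vol-1", ["nexus-1"], ["replica-1", "replica-2"], created.append)


def failing_create(name):
    # Hypothetical failure while creating the nexus.
    if name.startswith("nexus"):
        raise RuntimeError("nexus creation failed")


failed_store = {}
try:
    create_volume(failed_store, "vol-2", ["nexus-2"], ["replica-3"], failing_create)
except RuntimeError:
    pass  # every resource of "vol-2" is still marked as "creating"
```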
