Difference between revisions of "CCU:Cluster Updates 2020 10"
Revision as of 20:14, 24 September 2020
Early warning
The Kubernetes cluster will undergo a major hardware update in October.
TL;DR: A complete cluster reinstallation will be necessary due to major changes in the underlying network hardware. New persistent storage will be installed, and all persistent volumes will need to be deleted, because the drives will be integrated into the new system. Please start backing up everything now, and be prepared to delete all your pods and PVs on short notice.
Some details of using the cluster will change slightly after the reinstallation. Some of these changes can already be tested; please do so and help me find possible bugs before all changes go live. See below for more information about the major changes.
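As a rough illustration of the backup request above, the following sketch builds `kubectl cp` commands that copy mounted directories out of a running pod. The namespace, pod name, mount path and destination directory are hypothetical placeholders, and `kubectl cp` is only one possible approach (it needs `tar` inside the container); adapt it to however your data is actually stored.

```python
import shlex


def backup_commands(namespace, pod, mount_paths, dest_dir):
    """Build one `kubectl cp` command per mount path of a pod.

    All arguments are placeholders supplied by the caller; nothing here
    queries the cluster or runs any command.
    """
    commands = []
    for path in mount_paths:
        # Mirror the in-pod path below dest_dir, e.g. /data -> ./backup/data
        local = f"{dest_dir.rstrip('/')}/{path.strip('/')}"
        # kubectl cp addresses remote files as <namespace>/<pod>:<path>
        remote = f"{namespace}/{pod}:{path}"
        commands.append(["kubectl", "cp", remote, local])
    return commands


if __name__ == "__main__":
    # Hypothetical names; replace with your own namespace, pod and mounts.
    for cmd in backup_commands("my-namespace", "my-pod", ["/data"], "./backup"):
        print(shlex.join(cmd))
```

The script only prints the commands so they can be reviewed before anything is copied; check the local copies before deleting any pods or PVs.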
Preview: changes in persistent volumes
Preview: changes in namespace scopes
Preview: changes in permissions and resource use
Preview: changes behind the scenes, new nodes
- New storage cluster (Ceph, 3 monitor and 3 OSD nodes, 230 TB NVMe in addition to what we have)
- New GPU server: NVIDIA DGX A100 (with 8x A100, 40 GB/GPU, NVLink)
- New GPU server: Supermicro with 4x A100, 40 GB/GPU, NVLink
- New backbone storage network: EDR InfiniBand (100 Gb/s)
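To put the figures in this list into perspective, here is a small back-of-the-envelope calculation. It assumes the nominal EDR InfiniBand rate of 100 gigabits per second per link and decimal units (1 TB = 1000 GB); real throughput will be lower because of protocol and storage overhead.

```python
# GPU counts and memory per GPU, taken from the list above
GPUS_PER_SERVER = {"DGX A100": 8, "Supermicro": 4}
GB_PER_GPU = 40

total_gpus = sum(GPUS_PER_SERVER.values())
total_gpu_mem_gb = total_gpus * GB_PER_GPU

# Nominal EDR link rate in gigabits per second, converted to gigabytes
EDR_GBIT_PER_S = 100
edr_gbyte_per_s = EDR_GBIT_PER_S / 8


def transfer_seconds(size_tb, gbyte_per_s=edr_gbyte_per_s):
    """Idealized time to move size_tb terabytes over one link at line rate."""
    return size_tb * 1000 / gbyte_per_s


if __name__ == "__main__":
    print(f"New GPUs: {total_gpus}, total GPU memory: {total_gpu_mem_gb} GB")
    print(f"EDR link: {edr_gbyte_per_s} GB/s")
    print(f"1 TB at line rate: {transfer_seconds(1):.0f} s")
```

Under these assumptions the two new servers add 12 GPUs with 480 GB of GPU memory in total, and a single link moves at most 12.5 GB per second.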