CCU:Cluster Updates 2020 10

Latest revision as of 10:36, 25 September 2020

Early warning

The Kubernetes cluster will undergo a major hardware update in October.

TL;DR: a complete cluster reinstallation will be necessary due to major changes in the underlying network hardware. New persistent storage will be installed, and all persistent volumes will need to be deleted, as the drives will be integrated into the new system. Please start to back up everything and be prepared to terminate all your pods and PVs on short notice.
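As a rough sketch of such a backup (the namespace, pod, claim, and path names below are placeholders, not actual cluster objects), data on a PV can be copied out through a pod that mounts it, after which the pod and its claim can be removed:

```shell
# Copy the PV contents out of a running pod that mounts it
# (replace the namespace, pod name, and path with your own).
kubectl cp my-namespace/my-pod:/data ./pv-backup

# Once everything is safely backed up, remove the pod and its claim.
kubectl delete pod my-pod -n my-namespace
kubectl delete pvc my-pvc -n my-namespace
```

Note that kubectl cp requires tar inside the container; for large datasets it may be more practical to copy directly to external storage from within the pod.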

Some details of using the cluster will change slightly after the reinstallation. Some of these changes can already be tested; please do so and help me find possible bugs before everything goes live. See below for more information about the major changes.

Preview: changes in persistent volumes

All local NVMe and SSD drives will be integrated into the Ceph storage cluster. The nodes will no longer provide local PVs, and there will be only one cluster-wide global storage class. You will finally be able to mount PVs read/write on different pods across different nodes. Special read-only storage for shared datasets will be provided as before, but backed by Ceph instead of an NFS export.
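As an illustration of what a claim against this single global class might look like (the storage class name csi-cephfs and the requested size are assumptions; check the real name with kubectl get storageclass once the new system is live):

```yaml
# Hypothetical PVC against the cluster-wide storage class.
# ReadWriteMany is what allows mounting the same volume read/write
# from pods on different nodes.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-scratch
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: csi-cephfs  # assumed name, verify on the live cluster
  resources:
    requests:
      storage: 50Gi
```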

To get rid of most of the data on your PVs right away, move all static datasets to the system-wide storage, as described at CCU:Global dataset storage. This framework will persist through the cluster reinstall, and no data will be lost.

Preview: changes in namespace scopes

Currently, each group (exc-cb, trr161, etc.) has its own namespace. To allow more fine-grained control over resource quotas, it is necessary to move to user-specific namespaces. This also provides more privacy and safety, and the possibility to install, e.g., passwords as secrets in the private namespace.

In the future, the group namespaces will be deleted, and everyone will have to work in their private namespace "your-username" instead. In fact, I have already installed these namespaces on the current live system for testing. Please go ahead and check that everything works just as before, but in your own namespace instead of "exc-cb" or whichever group namespace you are using.
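While testing, it can help to make the personal namespace the default for the current kubectl context, so that -n your-username does not have to be added to every command (replace your-username with your actual account name):

```shell
# Make the personal namespace the default for the current context.
kubectl config set-context --current --namespace=your-username

# Workloads should now show up as before, but in the personal namespace.
kubectl get pods
```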

Note: access to specific compute nodes will still be based on your membership in certain groups; see below.

Preview: changes in permissions and resource use

To prevent users from hogging too many resources at once, we will probably have to enforce user-specific resource quotas based on group membership. Access to the more and less powerful nodes will in the future also be enforced based on group membership. Exceptions can always be negotiated, for example in the final phase before a paper submission. There will likely also be a maximum allowed pod lifetime.
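Per-namespace quotas of this kind are typically expressed with a standard Kubernetes ResourceQuota object; the numbers below are purely illustrative and are not the quotas that will actually be enforced:

```yaml
# Illustrative quota for one user namespace (all values are examples only).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: user-quota
  namespace: your-username
spec:
  hard:
    requests.cpu: "16"
    requests.memory: 64Gi
    requests.nvidia.com/gpu: "2"
    pods: "10"
```

A maximum pod lifetime, if introduced, could for example be enforced by setting activeDeadlineSeconds on pods, after which Kubernetes terminates them automatically.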

Preview: changes behind the scenes, new nodes

  • New storage cluster (Ceph: 3 monitor and 3 OSD nodes, 230 TB NVMe in addition to what we already have)
  • New GPU server: NVIDIA DGX A100 (with 8x A100, 40 GB/GPU, NVLink)
  • New GPU server: Supermicro with 4x A100, 40 GB/GPU, NVLink
  • New backbone storage network: HDR InfiniBand (200 Gbit/s)
  • New dedicated Ethernet network for Kubernetes
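Once the new GPU servers are in place, a pod would request one of the A100s through the usual NVIDIA device-plugin resource, for example (pod name and container image are only examples):

```yaml
# Example pod requesting a single GPU via the NVIDIA device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:11.0-base  # example image only
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```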