CCU:New GPU Cluster

From Collective Computational Unit
Revision as of 14:49, 13 January 2021 by Bastian.goldluecke (talk | contribs) (Running the first test container on the new cluster)

Overview

In January, the old GPU cluster will gradually be dismantled and integrated into a new Kubernetes cluster. The reason is a massive hardware upgrade of the backbone infrastructure:

  • New Ceph-based storage cluster with currently 210 TB of NVMe storage to supply all compute nodes with data.
  • New network backbone: HDR InfiniBand (200 Gb/s).
  • Triple-redundant servers to supply basic services and serve API requests, so that downtime should be minimized.
  • As a cherry on top, another GPU server with 4x A100.

Since we reinstall everything from scratch, the usage of the cluster will also change slightly, both for easier access to storage (getting rid of the somewhat cumbersome need to allocate persistent volumes) and for improved security (separate user namespaces).

We first provide a comprehensive list of changes in how to use the cluster, then give a detailed manual for how to move over your data and pods.

Pod configuration on the new cluster

User namespace, pod security and quotas

Each user works in their own namespace now, which is auto-generated when your login is created. The naming convention is "user-firstname-lastname", i.e. you replace all '.'s in your cluster username with '-'. Example: if your username is "test.account", your namespace will be "user-test-account".

Thus, you should set your default namespace in the kubeconfig accordingly, and you may have to update your pod configurations. For security reasons, containers are forced to run with your own user id and a group id of "10000". These ids are also used when files and directories are created, and they determine your permissions on the file system. The pod security policy which is active for your namespace will automatically fill in this data. Note that the security policy for pods is very restrictive for now, so that we can detect all problematic cases. In particular, you cannot switch to root inside containers anymore. Please inform me if security policies disrupt your usual workflow so that we can work something out.
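
The naming rule and the default-namespace setting described above can be sketched as two small shell helpers. The function names "ccu_namespace" and "ccu_use_namespace" are hypothetical and only serve this illustration; the second one assumes that kubectl is installed and that your current kubeconfig context already points at the new cluster.

```shell
# Derive the namespace from a cluster username:
# prefix "user-" and replace every '.' with '-'.
ccu_namespace() {
    printf 'user-%s\n' "$1" | tr '.' '-'
}

# Make the derived namespace the default for the current kubeconfig context.
# Assumes kubectl is available and the context targets the new cluster.
ccu_use_namespace() {
    kubectl config set-context --current --namespace="$(ccu_namespace "$1")"
}
```

For the example account from above, "ccu_namespace test.account" prints "user-test-account", and "ccu_use_namespace test.account" would store that namespace in the active context.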

Finally, there is now a mechanism in place to set resource quotas for individual users. The preset is quite generous at the moment since we have plenty of resources, but if you believe your account is too limited, please contact me.

Persistent volume management (or lack thereof)

The Ceph storage cluster provides a file system which is mounted on every node in the cluster. Pods are allowed to mount a subset of this file system as a host path, see the example pod below. The following directories can be mounted:

  • /abyss/home: this is your personal home directory which you can use any way you like.
  • /abyss/shared: a shared directory where every user has read/write access. It's a standard Unix filesystem, and everyone has an individual user id but is (for now) in the same user group. You can set the permissions for files and directories you create accordingly to restrict or allow access. To avoid total anarchy in this filesystem, please choose sensible names and organize your files in subdirectories. For example, put personal files which you want to make accessible to everyone in "/abyss/shared/users/<your-namespace>". I will monitor how it works out and whether we need more rules here.
  • /abyss/datasets: directory for static datasets, mounted read-only. These are large general-interest datasets for which we only want to store one copy on the filesystem (no separate imagenets for everyone, please). So whenever you have a well-known public dataset in your shared directory which you think would be useful to have in the static tree, please contact me and I will move it to the read-only region.
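
As a minimal sketch of the suggested layout, the following helper creates a personal directory below the shared tree and sets its permissions. The function name "ccu_make_shared_dir" and the default mode 755 (readable by everyone, writable only by you) are assumptions made for this example, not cluster policy.

```shell
# Create a personal directory below the shared tree and print its path.
# $1: root of the shared tree (inside a pod: /abyss/shared)
# $2: your namespace, e.g. user-test-account
# $3: permission mode for the directory (optional, default 755)
ccu_make_shared_dir() {
    dir="$1/users/$2"
    mkdir -p "$dir" && chmod "${3:-755}" "$dir" && printf '%s\n' "$dir"
}
```

Inside a pod, "ccu_make_shared_dir /abyss/shared user-test-account" would create "/abyss/shared/users/user-test-account". Passing 775 as a third argument would additionally give the common user group write access.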

Copy data from the old cluster into the new filesystem

The shared file system can be mounted as an NFS volume on the node "Vecna" on the old cluster, so you can create a pod on Vecna which mounts both the new filesystem and your PVs from the old cluster. Please use the following pod configuration as a template and add additional mounts for the PVs you want to copy over:


Afterwards, run a shell in the container and copy your stuff over to /abyss/shared/users/<your-namespace>. Make sure to set a group ownership id of 10000 with rw permissions for the group (rwx for directories) so you have read/write access on the new cluster. See commands below for an example of how to do it.
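
One possible way to copy a tree and set the group ownership and permissions in a single step is sketched below. The function name "ccu_copy_tree" is hypothetical, and the source and destination paths in the usage line are placeholders you have to adapt to your own mounts.

```shell
# Copy a directory tree and make the copy usable on the new cluster.
# $1: source directory (for example a mounted PV)
# $2: destination (for example /abyss/shared/users/<your-namespace>)
# $3: group id to assign (optional, default 10000)
ccu_copy_tree() {
    mkdir -p "$2" || return 1
    # -a copies recursively and preserves modes and timestamps;
    # the trailing "/." copies the contents of $1 rather than $1 itself.
    cp -a "$1/." "$2/" || return 1
    chgrp -R "${3:-10000}" "$2" || return 1
    # g+rwX: rw for the group on files, rwx on directories
    # (and on files that were already executable).
    chmod -R g+rwX "$2"
}
```

A call could look like "ccu_copy_tree /mnt/my-old-pv /abyss/shared/users/user-test-account", where "/mnt/my-old-pv" stands for wherever you mounted your old PV.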

Getting started on the new cluster

Log in to the new cluster and update your kubeconfig

The frontend for the cluster and login services is located here:

https://ccu-k8s.inf.uni-konstanz.de/

Please follow the instructions there to obtain credentials and cluster data for your kubeconfig.

Running the first test container on the new cluster

After logging in and adjusting your kubeconfig to the new cluster and user namespace, you should be able to start your first pod. The following example pod mounts the Ceph filesystems into an Ubuntu container image. Adjust the host path of the home directory so that it ends in your own namespace.

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-test-pod
spec:
  containers:
  - name: ubuntu
    image: ubuntu:20.04
    command: ["sleep", "1d"]
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 1
        memory: 1Gi
    volumeMounts:
      - mountPath: /abyss/home
        name: cephfs-home
        readOnly: false
      - mountPath: /abyss/shared
        name: cephfs-shared
        readOnly: false
      - mountPath: /abyss/datasets
        name: cephfs-datasets
        readOnly: true
  volumes:
    - name: cephfs-home
      hostPath:
        path: "/cephfs/abyss/home/user-bastian-goldluecke"
        type: Directory
    - name: cephfs-shared
      hostPath:
        path: "/cephfs/abyss/shared"
        type: Directory
    - name: cephfs-datasets
      hostPath:
        path: "/cephfs/abyss/datasets"
        type: Directory



Save this into a file "test-pod.yaml", start the pod, and verify that it has been created correctly and that the filesystems have been mounted successfully, for example with the commands below. The "id" command also shows the numeric user and group ids which are used for filesystem permissions.

> kubectl apply -f test-pod.yaml
> kubectl get pods
> kubectl describe pod ubuntu-test-pod
> kubectl exec -it ubuntu-test-pod -- /bin/bash
$ ls /abyss/shared/<the directory you created for your data>
$ id
uid=10000 gid=10000 groups=10000
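
The mount check can also be scripted. The following sketch, meant to be run in the shell inside the container, reports for each of the three directories whether it exists and whether it is writable for the current user. The function name "ccu_check_mounts" is hypothetical; with the example pod above you would expect "home" and "shared" to be writable and "datasets" not to be.

```shell
# Report for each Ceph mount whether it exists and is writable.
# $1: mount root (optional, default /abyss)
# Returns non-zero if at least one directory is missing.
ccu_check_mounts() {
    root="${1:-/abyss}"
    missing=0
    for name in home shared datasets; do
        if [ ! -d "$root/$name" ]; then
            echo "$name: missing"
            missing=1
        elif [ -w "$root/$name" ]; then
            echo "$name: writable"
        else
            echo "$name: not writable"
        fi
    done
    return "$missing"
}
```

Calling "ccu_check_mounts" without arguments checks the directories below /abyss.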

Moving your workloads to the new cluster