GPU compute can be partitioned in roughly three ways: Multi-Instance GPU (MIG), time-sharing, and the Multi-Process Service (MPS).
 
 
First, let's look at how MPS is set up on GKE (Google Kubernetes Engine).

From Google's docs: Kubernetes clusters have a set of management nodes called the control plane, which run system components such as the Kubernetes API server. In GKE, Google manages the control plane and system components for you. In Autopilot mode, the recommended way to run GKE, Google also manages your worker nodes. Google automatically upgrades component versions for improved stability and security, ensures high availability, and ensures the integrity of data stored in the cluster's persistent storage.
 
Google Cloud CLI placeholders (used in the gcloud command sketched after this list):
  • CLUSTER_NAME: the name of your new cluster.
  • COMPUTE_REGION: the Compute Engine region for your new cluster. For zonal clusters, specify --zone=COMPUTE_ZONE instead. The GPU type that you use must be available in the selected zone.
  • CLUSTER_VERSION: the GKE version for the cluster control plane and nodes. Use GKE version 1.27.7-gke.1088000 or later. Alternatively, specify a release channel with that GKE version by using the --release-channel=RELEASE_CHANNEL flag.
  • GPU_QUANTITY: the number of physical GPUs to attach to each node in the default node pool.
  • CLIENTS_PER_GPU: the maximum number of containers that can share each physical GPU.
  • DRIVER_VERSION: the NVIDIA driver version to install. Can be one of the following:
    • default: install the default driver version for your GKE version.
    • latest: install the latest available driver version for your GKE version. Available only for nodes that use Container-Optimized OS.
    • disabled: skip automatic driver installation. You must manually install a driver after you create the node pool. If you omit gpu-driver-version, this is the default option.
 
MPS is enabled by creating a node pool (or cluster) with the MPS GPU-sharing strategy.
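A sketch of the create command, following the GKE GPU-sharing docs; GPU_TYPE and MACHINE_TYPE are extra placeholders not defined above (e.g. nvidia-tesla-t4 on n1-standard-4):

```bash
gcloud container clusters create CLUSTER_NAME \
    --region=COMPUTE_REGION \
    --cluster-version=CLUSTER_VERSION \
    --machine-type=MACHINE_TYPE \
    --accelerator=type=GPU_TYPE,count=GPU_QUANTITY,gpu-sharing-strategy=mps,max-shared-clients-per-gpu=CLIENTS_PER_GPU,gpu-driver-version=DRIVER_VERSION
```

The same --accelerator value can be used with gcloud container node-pools create to add an MPS node pool to an existing cluster.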
 
Note that CLIENTS_PER_GPU is the maximum number of containers per GPU (capped at 48), and GPU_QUANTITY is the number of physical GPUs attached to each node in the node pool.
 
Once creation finishes, inspect the result with kubectl describe nodes NODE_NAME. (A node pool is a group of worker nodes sharing the same configuration; it is neither a single node nor part of the control plane.)
 
The output contains Capacity and Allocatable sections. The two nvidia.com/gpu counts are normally equal, both being CLIENTS_PER_GPU × GPU_QUANTITY.
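For example (hypothetical values, other resource lines elided), with GPU_QUANTITY=1 and CLIENTS_PER_GPU=3 the relevant lines would look like:

```
Capacity:
  nvidia.com/gpu:  3
  ...
Allocatable:
  nvidia.com/gpu:  3
  ...
```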
To check health, filter the GPU device plugin logs for MPS-related lines: kubectl logs -l k8s-app=nvidia-gpu-device-plugin -n kube-system --tail=100 | grep MPS
 
Next, the actual deployment, via a manifest such as gpu-mps.yaml.
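A sketch of the manifest, following the nbody benchmark example in the GKE MPS docs (the Job name, image, and completions/parallelism values are assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nbody-sample                  # assumed name
spec:
  completions: 5
  parallelism: 5
  template:
    spec:
      hostIPC: true                   # required for MPS, see below
      nodeSelector:
        cloud.google.com/gke-gpu-sharing-strategy: mps
      containers:
      - name: nbody-sample
        image: nvidia/samples:nbody   # assumed CUDA nbody sample image
        args: ["-benchmark", "-i=5000"]
        resources:
          limits:
            nvidia.com/gpu: 1         # one shared GPU slot per Pod
      restartPolicy: Never
  backoffLimit: 1
```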
 
 
hostIPC: true is required: it lets the Pods talk to the MPS control daemon running on the node. The -i=5000 argument tells the nbody benchmark to run 5,000 iterations.
Then apply it with kubectl apply -f gpu-mps.yaml, watch the Pods with kubectl get pods, and clean up afterwards with kubectl delete job --all.
 
The important CUDA MPS environment variables can all be tuned per container in the manifest:

CUDA_MPS_ACTIVE_THREAD_PERCENTAGE → defaults to 100 / MaxSharedClientsPerGPU. So each Pod's initial compute share (threads/SMs) is sized by the maximum number of Pods that may share the GPU, not by how many are actually running.

CUDA_MPS_PINNED_DEVICE_MEM_LIMIT → defaults to total GPU memory / MaxSharedClientsPerGPU; GKE fills in the total memory automatically, so no manual input is needed unless you want to override it.
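To override them, add env entries to the container spec. A sketch with arbitrary example values (the 0=4G device-index syntax for the memory limit follows the CUDA MPS documentation):

```yaml
        env:
        - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
          value: "20"      # cap this container at ~20% of the SMs
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: "0=4G"    # limit device 0 to 4 GiB of pinned device memory
```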
 
If we deploy this image, the first thing it does is call cudaGetDeviceCount(&deviceCount).
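A minimal sketch of that first step (a plain device-query program, not the image's actual source); under MPS each container still sees the shared GPU as an ordinary CUDA device:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Count the CUDA devices visible to this process. In a container that
    // requested nvidia.com/gpu: 1, this reports one device even though the
    // physical GPU is shared through MPS.
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("visible CUDA devices: %d\n", deviceCount);

    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, %d SMs\n", i, prop.name, prop.multiProcessorCount);
    }
    return 0;
}
```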
 
 
Next, let's look at how to do this without Google Cloud.
 
 
 
 
 
 
 
 
 
 