Splitting GPU compute can be done in roughly three ways: Multi-Instance GPU (MIG), time-sharing, and Multi-Process Service (MPS).

First, let's look at how to set up MPS on GKE (Google Kubernetes Engine).
Kubernetes clusters have a set of management nodes called the control plane, which run system components such as the Kubernetes API server. In GKE, Google manages the control plane and system components for you. In Autopilot mode, which is the recommended way to run GKE, Google also manages your worker nodes. Google automatically upgrades component versions for improved stability and security, ensuring high availability, and ensuring integrity of data stored in the cluster's persistent storage.
Google Cloud CLI parameters:
- `CLUSTER_NAME`: the name of your new cluster.
- `COMPUTE_REGION`: the Compute Engine region for your new cluster. For zonal clusters, specify `--zone=COMPUTE_ZONE`. The GPU type that you use must be available in the selected zone.
- `CLUSTER_VERSION`: the GKE version for the cluster control plane and nodes. Use GKE version 1.27.7-gke.1088000 or later. Alternatively, specify a release channel with that GKE version by using the `--release-channel=RELEASE_CHANNEL` flag.
- `MACHINE_TYPE`: the Compute Engine machine type for your nodes.
  - For H200 GPUs, use the A3 Ultra machine type.
  - For H100 GPUs, use an A3 machine type other than Ultra (Mega, High, or Edge).
  - For A100 GPUs, use an A2 machine type.
  - For L4 GPUs, use a G2 machine type.
  - For all other GPUs, use an N1 machine type.
- `GPU_TYPE`: the GPU type, which must be an NVIDIA Tesla GPU platform such as `nvidia-tesla-v100`.
- `GPU_QUANTITY`: the number of physical GPUs to attach to each node in the default node pool.
- `CLIENTS_PER_GPU`: the maximum number of containers that can share each physical GPU.
- `DRIVER_VERSION`: the NVIDIA driver version to install. Can be one of the following:
  - `default`: install the default driver version for your GKE version.
  - `latest`: install the latest available driver version for your GKE version. Available only for nodes that use Container-Optimized OS.
  - `disabled`: skip automatic driver installation. You must manually install a driver after you create the node pool. If you omit `gpu-driver-version`, this is the default option.
MPS is enabled by creating a node pool (here, the cluster's default node pool) with the MPS sharing strategy; a sketch of the full command follows.
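Assembling the parameters above, the create command should look roughly like this (a sketch following the GKE MPS guide, so double-check flag spelling against the current docs; `gpu-sharing-strategy=mps` and `max-shared-clients-per-gpu` are the options that actually turn MPS on):

```sh
gcloud container clusters create CLUSTER_NAME \
    --region=COMPUTE_REGION \
    --cluster-version=CLUSTER_VERSION \
    --machine-type=MACHINE_TYPE \
    --accelerator type=GPU_TYPE,count=GPU_QUANTITY,max-shared-clients-per-gpu=CLIENTS_PER_GPU,gpu-sharing-strategy=mps,gpu-driver-version=DRIVER_VERSION
```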
Note that `CLIENTS_PER_GPU` is the maximum number of containers per GPU (at most 48), and `GPU_QUANTITY` is the number of physical GPUs attached to each node in the node pool.
Once creation finishes, inspect a node with `kubectl describe nodes NODE_NAME`. (Is a node pool a node, or part of the control plane?) The output includes `Capacity:` and `Allocatable:` counts for the GPU resource; the two are normally equal, at `CLIENTS_PER_GPU` × GPU count (`GPU_QUANTITY`).
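For example, with `max-shared-clients-per-gpu=3` and one physical GPU per node (illustrative values), the relevant fragment of the output would look like:

```
Capacity:
  ...
  nvidia.com/gpu: 3
Allocatable:
  ...
  nvidia.com/gpu: 3
```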
Health check:

```sh
kubectl logs -l k8s-app=nvidia-gpu-device-plugin -n kube-system --tail=100 | grep MPS
```
Next is the actual deployment. In `gpu-mps.yaml`, `hostIPC: true` enables Pods to talk to the MPS control daemon and is required. The `-i=5000` argument sets the benchmark to run 5000 iterations. A sketch of the manifest follows.
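For reference, a minimal sketch of what `gpu-mps.yaml` might contain, modeled on GKE's GPU-sharing examples — the `nvidia/samples:nbody` image and the `cloud.google.com/gke-gpu-sharing-strategy` nodeSelector label are assumptions here, so verify them against your cluster:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nbody-sample
spec:
  completions: 5
  parallelism: 5
  template:
    spec:
      hostIPC: true                        # required: lets the Pod talk to the MPS control daemon
      nodeSelector:
        cloud.google.com/gke-gpu-sharing-strategy: mps
      containers:
        - name: nbody-sample
          image: nvidia/samples:nbody      # assumed CUDA sample image; swap in your own workload
          args: ["-benchmark", "-i=5000"]  # run the nbody benchmark for 5000 iterations
          resources:
            limits:
              nvidia.com/gpu: 1            # one shared slot out of CLIENTS_PER_GPU on a GPU
      restartPolicy: Never
```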
Then run:

```sh
kubectl apply -f gpu-mps.yaml
kubectl get pods
kubectl delete job --all
```
The important CUDA environment variables can all be tuned in the manifest:
- `CUDA_MPS_ACTIVE_THREAD_PERCENTAGE` → defaults to 100 / MaxSharedClientsPerGPU, so each Pod's initial compute share (SMs/threads) is the GPU divided evenly by the maximum number of Pods that may share it.
- `CUDA_MPS_PINNED_DEVICE_MEM_LIMIT` → defaults to total device memory / MaxSharedClientsPerGPU (GKE fills in the total automatically; no manual input needed).

Both can be overridden per container, as sketched below.
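A sketch of such an override (the `20` and `0=2G` values are arbitrary examples; `CUDA_MPS_PINNED_DEVICE_MEM_LIMIT` takes `<device-index>=<limit>` entries):

```yaml
# inside the container spec of gpu-mps.yaml:
env:
  - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
    value: "20"        # cap this Pod at ~20% of the GPU's SMs
  - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
    value: "0=2G"      # cap pinned device memory on device 0 at 2 GiB
```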
If we deploy this image, it first calls `cudaGetDeviceCount(&deviceCount)` to discover how many GPUs are visible to the container.
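To check what a running container actually sees, one can exec into a Pod (the Pod name is a placeholder, and the `nvidia-smi` path assumes GKE's driver mount under `/usr/local/nvidia`):

```sh
# With MPS, client processes appear under the MPS server in nvidia-smi output
kubectl exec POD_NAME -- /usr/local/nvidia/bin/nvidia-smi
```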
Next, let's look at how to do this without Google Cloud.
Reference:
- Author: ran2323
- URL: https://www.blueif.me//article/19071a79-6e22-800e-bc3f-c2b979bf7924
- Copyright: all articles in this blog, unless otherwise stated, are licensed under BY-NC-SA. Please credit the source when reposting!