GPU compute can be partitioned in roughly three ways: Multi-Instance GPU (MIG), time-sharing, and the Multi-Process Service (MPS).
 
 
First, let's look at how MPS is set up on GKE (Google Kubernetes Engine).

From Google's docs: Kubernetes clusters have a set of management nodes called the control plane, which run system components such as the Kubernetes API server. In GKE, Google manages the control plane and system components for you. In Autopilot mode, the recommended way to run GKE, Google also manages your worker nodes. Google automatically upgrades component versions for improved stability and security, ensures high availability, and ensures the integrity of data stored in the cluster's persistent storage.
 
Google Cloud CLI placeholders (used in the gcloud command sketched after this list):
  • CLUSTER_NAME: the name of your new cluster.
  • COMPUTE_REGION: the Compute Engine region for your new cluster. For zonal clusters, specify --zone=COMPUTE_ZONE instead. The GPU type that you use must be available in the selected zone.
  • CLUSTER_VERSION: the GKE version for the cluster control plane and nodes. Use GKE version 1.27.7-gke.1088000 or later. Alternatively, specify a release channel with that GKE version by using the --release-channel=RELEASE_CHANNEL flag.
  • GPU_QUANTITY: the number of physical GPUs to attach to each node in the default node pool.
  • CLIENTS_PER_GPU: the maximum number of containers that can share each physical GPU.
  • DRIVER_VERSION: the NVIDIA driver version to install. Can be one of the following:
    • default: install the default driver version for your GKE version.
    • latest: install the latest available driver version for your GKE version. Available only for nodes that use Container-Optimized OS.
    • disabled: skip automatic driver installation. You must manually install a driver after you create the node pool. If you omit gpu-driver-version, this is the default option.
 
MPS is enabled by creating a node pool (or cluster) with the MPS GPU-sharing strategy.
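A sketch of the create command, following the GKE GPU-sharing docs; GPU_TYPE and MACHINE_TYPE are extra placeholders not defined above (e.g. nvidia-tesla-t4 on n1-standard-4):

```bash
gcloud container clusters create CLUSTER_NAME \
    --region=COMPUTE_REGION \
    --cluster-version=CLUSTER_VERSION \
    --machine-type=MACHINE_TYPE \
    --accelerator=type=GPU_TYPE,count=GPU_QUANTITY,gpu-sharing-strategy=mps,max-shared-clients-per-gpu=CLIENTS_PER_GPU,gpu-driver-version=DRIVER_VERSION
```

The same --accelerator value can be used with gcloud container node-pools create to add an MPS node pool to an existing cluster.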
 
Note that CLIENTS_PER_GPU is the maximum number of containers per GPU (capped at 48), and GPU_QUANTITY is the number of physical GPUs attached to each node in the node pool.
 
Once creation finishes, inspect the result with kubectl describe nodes NODE_NAME. (A node pool is a group of worker nodes sharing the same configuration; it is neither a single node nor part of the control plane.)
 
The output contains Capacity and Allocatable sections. The two nvidia.com/gpu counts are normally equal, both being CLIENTS_PER_GPU × GPU_QUANTITY.
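For example (hypothetical values, other resource lines elided), with GPU_QUANTITY=1 and CLIENTS_PER_GPU=3 the relevant lines would look like:

```
Capacity:
  nvidia.com/gpu:  3
  ...
Allocatable:
  nvidia.com/gpu:  3
  ...
```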
To check health, filter the GPU device plugin logs for MPS-related lines: kubectl logs -l k8s-app=nvidia-gpu-device-plugin -n kube-system --tail=100 | grep MPS
 
Next, the actual deployment, via a manifest such as gpu-mps.yaml.
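A sketch of the manifest, following the nbody benchmark example in the GKE MPS docs (the Job name, image, and completions/parallelism values are assumptions):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nbody-sample                  # assumed name
spec:
  completions: 5
  parallelism: 5
  template:
    spec:
      hostIPC: true                   # required for MPS, see below
      nodeSelector:
        cloud.google.com/gke-gpu-sharing-strategy: mps
      containers:
      - name: nbody-sample
        image: nvidia/samples:nbody   # assumed CUDA nbody sample image
        args: ["-benchmark", "-i=5000"]
        resources:
          limits:
            nvidia.com/gpu: 1         # one shared GPU slot per Pod
      restartPolicy: Never
  backoffLimit: 1
```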
 
 
hostIPC: true is required: it lets the Pods talk to the MPS control daemon running on the node. The -i=5000 argument tells the nbody benchmark to run 5,000 iterations.
Then apply it with kubectl apply -f gpu-mps.yaml, watch the Pods with kubectl get pods, and clean up afterwards with kubectl delete job --all.
 
The important CUDA MPS environment variables can all be tuned per container in the manifest:

CUDA_MPS_ACTIVE_THREAD_PERCENTAGE → defaults to 100 / MaxSharedClientsPerGPU. So each Pod's initial compute share (threads/SMs) is sized by the maximum number of Pods that may share the GPU, not by how many are actually running.

CUDA_MPS_PINNED_DEVICE_MEM_LIMIT → defaults to total GPU memory / MaxSharedClientsPerGPU; GKE fills in the total memory automatically, so no manual input is needed unless you want to override it.
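To override them, add env entries to the container spec. A sketch with arbitrary example values (the 0=4G device-index syntax for the memory limit follows the CUDA MPS documentation):

```yaml
        env:
        - name: CUDA_MPS_ACTIVE_THREAD_PERCENTAGE
          value: "20"      # cap this container at ~20% of the SMs
        - name: CUDA_MPS_PINNED_DEVICE_MEM_LIMIT
          value: "0=4G"    # limit device 0 to 4 GiB of pinned device memory
```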
 
If we deploy this image, the first thing it does is call cudaGetDeviceCount(&deviceCount).
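A minimal sketch of that first step (a plain device-query program, not the image's actual source); under MPS each container still sees the shared GPU as an ordinary CUDA device:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Count the CUDA devices visible to this process. In a container that
    // requested nvidia.com/gpu: 1, this reports one device even though the
    // physical GPU is shared through MPS.
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("visible CUDA devices: %d\n", deviceCount);

    for (int i = 0; i < deviceCount; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, %d SMs\n", i, prop.name, prop.multiProcessorCount);
    }
    return 0;
}
```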
 
 
Next, let's look at how to do this without Google Cloud.
 
 
 
 
 
 
 
 
 
 