I'm having trouble restricting a Pod's access to the GPUs available in my cluster.
Here is my .yaml:
apiVersion: v1
kind: Pod
metadata:
  name: train-gpu
spec:
  containers:
  - name: train-gpu
    image: index.docker.io/myprivaterepository/train:latest
    command: ["sleep"]
    args: ["100000"]
    resources:
      limits:
        nvidia.com/gpu: 1 # requesting 1 GPU
When I run the nvidia-smi command inside this container, all of the GPUs show up, not just the one I requested.
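For reference, this is roughly how I deploy the pod and check from inside it (the file name train-gpu.yaml is just what I call it locally):
kubectl apply -f train-gpu.yaml
kubectl exec -it train-gpu -- nvidia-smi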
Any advice would be greatly appreciated.
Some potentially useful information:
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-07T14:30:40Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Docker base image:
FROM nvidia/cuda:10.1-base-ubuntu18.04