Unable to schedule Kubernetes pods requesting nvidia.com/gpu

Date: 2018-04-15 05:30:55

Tags: kubernetes gpu nvidia

I have been able to get Kubernetes to recognize the GPUs on my node:

$ kubectl get node MY_NODE -o yaml
...
allocatable:
  cpu: "48"
  ephemeral-storage: "15098429006"
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263756344Ki
  nvidia.com/gpu: "8"
  pods: "110"
capacity:
  cpu: "48"
  ephemeral-storage: 16382844Ki
  hugepages-1Gi: "0"
  hugepages-2Mi: "0"
  memory: 263858744Ki
  nvidia.com/gpu: "8"
  pods: "110"
...

I then spin up a pod with the following resources:
Limits:
  cpu:             2
  memory:          2147483648
  nvidia.com/gpu:  1
Requests:
  cpu:             500m
  memory:          536870912
  nvidia.com/gpu:  1
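
For reference, a pod manifest that would produce the limits and requests shown above might look roughly like the sketch below. The pod name, container name, and image are hypothetical placeholders, not taken from my actual setup. Memory is written as 2Gi and 512Mi, which equal 2147483648 and 536870912 bytes. Note that for an extended resource such as nvidia.com/gpu, the request must equal the limit when both are given.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                          # hypothetical pod name
spec:
  containers:
  - name: gpu-test                        # hypothetical container name
    image: example.com/cuda-app:latest    # placeholder image (assumption)
    resources:
      limits:
        cpu: "2"
        memory: 2Gi                       # 2147483648 bytes
        nvidia.com/gpu: 1
      requests:
        cpu: 500m
        memory: 512Mi                     # 536870912 bytes
        nvidia.com/gpu: 1                 # must match the limit for extended resources
```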

However, the pod stays in the Pending state:

Insufficient nvidia.com/gpu.

Am I specifying the resources correctly?

1 Answer:

Answer 0 (score: 0)

Have you installed the NVIDIA device plugin in your Kubernetes cluster?

kubectl create -f nvidia.io/device-plugin.yml

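Some GPUs are too old to support the plugin's health checks, so the checks have to be disabled through an environment variable on the plugin container: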

containers:
      - image: nvidia/k8s-device-plugin:1.9
        name: nvidia-device-plugin-ctr
        env:
        - name: DP_DISABLE_HEALTHCHECKS
          value: "xids"
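
To show where that container fragment sits, here is a minimal sketch of a complete device plugin DaemonSet. Only the image, container name, and environment variable come from the fragment above. The apiVersion, metadata names, namespace, label, and volume layout are assumptions modeled on the usual shape of the NVIDIA device plugin manifest, so compare them against the device-plugin.yml you actually deploy.

```yaml
apiVersion: apps/v1                       # assumption: older clusters may need extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset    # assumed name
  namespace: kube-system                  # assumed namespace
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds       # assumed label, must match the template labels
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      containers:
      - image: nvidia/k8s-device-plugin:1.9
        name: nvidia-device-plugin-ctr
        env:
        - name: DP_DISABLE_HEALTHCHECKS   # disables the Xid-based health checks
          value: "xids"
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins   # kubelet's device plugin socket directory
```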

See: