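I am deploying a device plugin for FPGAs on a local Kubernetes cluster. Essentially, it is just a DaemonSet, so every node in the cluster (excluding the master nodes) runs one pod of this deployment.
The pods need access to the device tree of the host (node), and they also need access to the kubelet socket. I therefore mount two specific directories from the host into the containers, as follows: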
containers:
- image: uofthprc/fpga-k8s-deviceplugin
  name: fpga-device-plugin-ctr
  volumeMounts:
  - name: device-plugin
    mountPath: /var/lib/kubelet/device-plugins
  - name: device-info
    mountPath: /sys/firmware/devicetree/base
    readOnly: true
volumes:
- name: device-plugin
  hostPath:
    path: /var/lib/kubelet/device-plugins
- name: device-info
  hostPath:
    path: /sys/firmware/devicetree/base
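For some reason, /var/lib/kubelet/device-plugins mounts fine and is fully accessible from within the containers, while /sys/firmware/devicetree/base is not! The following is the output after attaching to one of the containers with kubectl exec -it fpga-device-plugin-ds-hr6s5 -n device-plugins -- /bin/sh: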
/work # ls /var/lib/kubelet/device-plugins
DEPRECATION kubelet.sock kubelet_internal_checkpoint
/work # ls /sys/firmware/devicetree/base
ls: /sys/firmware/devicetree/base: No such file or directory
/work # ls /sys/firmware/devicetree
ls: /sys/firmware/devicetree: No such file or directory
/work # ls /sys/firmware
/work # ls /sys
block bus class dev devices firmware fs kernel module power
/work #
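I am not sure why this happens. I tested it with read-only permissions, with read-write permissions, and without the mount at all. In all three cases, /sys/firmware is empty inside the container. On the host, I am 100% sure that the path /sys/firmware/devicetree/base/ exists and contains the files I want.
The following is the output of describe pods for one of the pods: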
Name:                 fpga-device-plugin-ds-hr6s5
Namespace:            device-plugins
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 mpsoc2/10.84.31.12
Start Time:           Wed, 20 May 2020 22:56:25 -0400
Labels:               controller-revision-hash=cfbc8976f
                      name=fpga-device-plugin-ds
                      pod-template-generation=1
Annotations:          cni.projectcalico.org/podIP: 10.84.32.223/32
                      cni.projectcalico.org/podIPs: 10.84.32.223/32
Status:               Running
IP:                   10.84.32.223
IPs:
  IP:           10.84.32.223
Controlled By:  DaemonSet/fpga-device-plugin-ds
Containers:
  fpga-device-plugin-ctr:
    Container ID:   docker://629ab2fd7d05bc17e6f566912b127eec421f214123309c10674c40ed2839d1cf
    Image:          uofthprc/fpga-k8s-deviceplugin
    Image ID:       docker-pullable://uofthprc/fpga-k8s-deviceplugin@sha256:06f9e46470219d5cfb2e6233b1473e9f1a2d3b76c9fd2d7866f7a18685b60ea3
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Wed, 20 May 2020 22:56:29 -0400
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /sys/firmware/devicetree/base from device-info (ro)
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-dwbsm (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:
  device-info:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/firmware/devicetree/base
    HostPathType:
  default-token-dwbsm:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-dwbsm
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/pid-pressure:NoSchedule
                 node.kubernetes.io/unreachable:NoExecute
                 node.kubernetes.io/unschedulable:NoSchedule
Events:
  Type    Reason     Age        From               Message
  ----    ------     ----       ----               -------
  Normal  Scheduled  <unknown>  default-scheduler  Successfully assigned device-plugins/fpga-device-plugin-ds-hr6s5 to mpsoc2
  Normal  Pulling    23s        kubelet, mpsoc2    Pulling image "uofthprc/fpga-k8s-deviceplugin"
  Normal  Pulled     23s        kubelet, mpsoc2    Successfully pulled image "uofthprc/fpga-k8s-deviceplugin"
  Normal  Created    23s        kubelet, mpsoc2    Created container fpga-device-plugin-ctr
  Normal  Started    22s        kubelet, mpsoc2    Started container fpga-device-plugin-ctr
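As far as I can tell, there is nothing wrong in that output.
I am using Kubernetes 1.18.2 (installed with kubeadm, not microk8s) for both the client and the server. The nodes in question are ARM64 nodes running Ubuntu 16.04 with a 4.14.0 kernel. The containers are all alpine:3.11 with a simple binary copied into them. I have no idea why the mount is not working, and any help would be greatly appreciated.
The permissions of /sys/firmware/devicetree/base/ on the host are as follows: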
savi@mpsoc10:~$ ls -alh /sys/firmware/devicetree/base/
total 0
drwxr-xr-x 36 root root 0 May 20 21:32 .
drwxr-xr-x 3 root root 0 May 20 21:32 ..
-r--r--r-- 1 root root 4 May 20 21:32 #address-cells
drwxr-xr-x 2 root root 0 May 20 21:32 aliases
drwxr-xr-x 56 root root 0 May 20 21:32 amba
drwxr-xr-x 3 root root 0 May 20 21:32 amba_apu@0
drwxr-xr-x 2 root root 0 May 20 21:32 aux_ref_clk
-r--r--r-- 1 root root 15 May 20 21:32 board
drwxr-xr-x 2 root root 0 May 20 21:32 chosen
drwxr-xr-x 2 root root 0 May 20 21:32 clk
-r--r--r-- 1 root root 12 May 20 21:32 compatible
drwxr-xr-x 6 root root 0 May 20 21:32 cpu_opp_table
drwxr-xr-x 7 root root 0 May 20 21:32 cpus
drwxr-xr-x 2 root root 0 May 20 21:32 dcc
drwxr-xr-x 2 root root 0 May 20 21:32 dp_aclk
drwxr-xr-x 2 root root 0 May 20 21:32 edac
drwxr-xr-x 2 root root 0 May 20 21:32 fclk0
drwxr-xr-x 2 root root 0 May 20 21:32 fclk1
drwxr-xr-x 2 root root 0 May 20 21:32 fclk2
drwxr-xr-x 2 root root 0 May 20 21:32 fclk3
drwxr-xr-x 3 root root 0 May 20 21:32 firmware
drwxr-xr-x 2 root root 0 May 20 21:32 fpga-full
drwxr-xr-x 2 root root 0 May 20 21:32 gt_crx_ref_clk
drwxr-xr-x 2 root root 0 May 20 21:32 mailbox@ff990400
drwxr-xr-x 2 root root 0 May 20 21:32 memory
-r--r--r-- 1 root root 1 May 20 21:32 name
drwxr-xr-x 3 root root 0 May 20 21:32 nvmem_firmware
drwxr-xr-x 2 root root 0 May 20 21:32 pcap
drwxr-xr-x 2 root root 0 May 20 21:32 pmu
drwxr-xr-x 31 root root 0 May 20 21:32 power-domains
drwxr-xr-x 2 root root 0 May 20 21:32 psci
drwxr-xr-x 2 root root 0 May 20 21:32 pss_alt_ref_clk
drwxr-xr-x 2 root root 0 May 20 21:32 pss_ref_clk
drwxr-xr-x 2 root root 0 May 20 21:32 reset-controller
drwxr-xr-x 2 root root 0 May 20 21:32 sha384
-r--r--r-- 1 root root 4 May 20 21:32 #size-cells
drwxr-xr-x 2 root root 0 May 20 21:32 __symbols__
drwxr-xr-x 2 root root 0 May 20 21:32 timer
-r--r--r-- 1 root root 10 May 20 21:32 vendor
drwxr-xr-x 2 root root 0 May 20 21:32 video_clk
drwxr-xr-x 2 root root 0 May 20 21:32 zynqmp-power
drwxr-xr-x 2 root root 0 May 20 21:32 zynqmp_rsa
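Some of the files in there are read-only, which is what prompted me to use read-only permissions for the volume mount in the first place.
The following are the permissions of /sys and /sys/firmware in the container: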
/work # ls -alh /sys/
total 4K
dr-xr-xr-x 12 root root 0 May 21 02:56 .
drwxr-xr-x 1 root root 4.0K May 21 02:56 ..
drwxr-xr-x 2 root root 0 May 21 03:08 block
drwxr-xr-x 32 root root 0 May 21 03:08 bus
drwxr-xr-x 64 root root 0 May 21 03:08 class
drwxr-xr-x 4 root root 0 May 21 03:08 dev
drwxr-xr-x 9 root root 0 May 21 03:08 devices
drwxrwxrwt 2 root root 40 May 21 02:56 firmware
drwxr-xr-x 10 root root 0 May 21 02:56 fs
drwxr-xr-x 7 root root 0 May 21 02:56 kernel
drwxr-xr-x 156 root root 0 May 21 03:08 module
drwxr-xr-x 2 root root 0 May 21 03:08 power
/work # ls -alh /sys/firmware/
total 0
drwxrwxrwt 2 root root 40 May 21 02:56 .
dr-xr-xr-x 12 root root 0 May 21 02:56 ..
The output of mount | grep sysfs in the container is:
/work # mount | grep sysfs
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
sysfs on /sys/firmware/devicetree/base type sysfs (ro,relatime)
Answer 0 (score: 1)
sysfs is mounted as read-only:
mount | grep sysfs
sysfs on /sys type sysfs (ro,nosuid,nodev,noexec,relatime)
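That is why the volume does not show up in the pod. You can use an init container and run it as privileged to remount /sys as writable. If it runs without privileged: true, the access mode is not changed and the volume is not mounted into the pod: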
initContainers:
- name: mount
  image: nginx:alpine
  command: ["/bin/sh", "-c", "mount -o remount,rw '/sys'"]
  securityContext:
    privileged: true
This way, /sys is changed to writable:
mount | grep sysfs
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
sysfs on /sys/firmware/ type sysfs (rw,relatime)
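To check the result from inside the container, it can help to list every mount at or below /sys instead of grepping for sysfs only, because a mount of a different filesystem type placed over a sysfs subdirectory would not match that grep. The following is a minimal sketch, assuming a POSIX shell with awk (the BusyBox tools in Alpine should be enough); mounts_under is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a saved copy of the mount table:

```shell
#!/bin/sh
# mounts_under DIR [TABLE]
# Print "mountpoint fstype options" for every mount at or below DIR.
# TABLE defaults to /proc/mounts (assumption: standard Linux procfs layout).
mounts_under() {
    dir="${1%/}"                      # drop a trailing slash, if any
    table="${2:-/proc/mounts}"
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v d="$dir" '
        $2 == d || index($2, d "/") == 1 { print $2, $3, $4 }
    ' "$table"
}
```

Running mounts_under /sys in the container before and after adding the init container shows whether the mount options changed and whether anything other than sysfs is mounted under /sys/firmware.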