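I have a node in my K8S cluster that I use for monitoring tools.
Pods running here: Grafana, PGAdmin, Prometheus, and kube-state-metrics.
My problem is that I have a lot of evicted pods.
Evicted pods: kube-state-metrics, grafana-core, pgadmin.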
The pods are evicted with the reason "The node was low on resource: [DiskPressure]": kube-state-metrics (90% of evicted pods), pgadmin (20% of evicted pods).
When I check any of the pods, there is free space on the disk:
bash-5.0$ df -h
Filesystem Size Used Available Use% Mounted on
overlay 7.4G 3.3G 3.7G 47% /
tmpfs 64.0M 0 64.0M 0% /dev
tmpfs 484.2M 0 484.2M 0% /sys/fs/cgroup
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /dev/termination-log
shm 64.0M 0 64.0M 0% /dev/shm
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/resolv.conf
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/hostname
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/hosts
/dev/nvme2n1 975.9M 8.8M 951.1M 1% /var/lib/grafana
/dev/nvme0n1p2 7.4G 3.3G 3.7G 47% /etc/grafana/provisioning/datasources
tmpfs 484.2M 12.0K 484.2M 0% /run/secrets/kubernetes.io/serviceaccount
tmpfs 484.2M 0 484.2M 0% /proc/acpi
tmpfs 64.0M 0 64.0M 0% /proc/kcore
tmpfs 64.0M 0 64.0M 0% /proc/keys
tmpfs 64.0M 0 64.0M 0% /proc/timer_list
tmpfs 64.0M 0 64.0M 0% /proc/sched_debug
tmpfs 484.2M 0 484.2M 0% /sys/firmware
Only one or two pods show a different message:
The node was low on resource: ephemeral-storage. Container addon-resizer was using 48Ki, which exceeds its request of 0. Container kube-state-metrics was using 44Ki, which exceeds its request of 0.
The node was low on resource: ephemeral-storage. Container pgadmin was using 3432Ki, which exceeds its request of 0.
I also see the kubelet reporting:
(combined from similar events): failed to garbage collect required amount of images. Wanted to free 753073356 bytes, but freed 0 bytes
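Those pods run on an AWS t3.micro.
It does not seem to affect my services in production.
Why is this happening, and how should I fix it?
EDIT: Here is the result when I run df -h on the node: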
admin@ip-172-20-41-112:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 3.9G 0 3.9G 0% /dev
tmpfs 789M 3.0M 786M 1% /run
/dev/nvme0n1p2 7.5G 6.3G 804M 89% /
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
I can see /dev/nvme0n1p2, but how can I see what is on it? When I run ncdu on /, I can only see 3 GB of data...
Answer 0 (score: 2)
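Apparently your node is about to run out of available disk space. Keep in mind, however, that according to the documentation the DiskPressure condition means:
Available disk space and inodes on either the node's root filesystem or image filesystem has satisfied an eviction threshold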
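The condition quoted above can be approximated by hand on the node. The following is a minimal sketch, not how the kubelet itself measures things: it reads free space and free inodes from df and compares them with a threshold you pass in. The helper names (pct_available, under_threshold, check_signal) are made up for illustration, and the thresholds in the usage note (10% free space, 5% free inodes) are assumed to be the kubelet defaults for the root filesystem; check your own kubelet configuration.

```shell
#!/bin/sh
# Sketch only: approximates the kubelet disk signals with df.
# Helper names are hypothetical; thresholds are supplied by the caller.

# pct_available TOTAL AVAIL: integer percentage of TOTAL still available.
pct_available() {
    awk -v total="$1" -v avail="$2" \
        'BEGIN { if (total > 0) printf "%d\n", (avail * 100) / total; else print 0 }'
}

# under_threshold PCT THRESHOLD: succeeds when PCT is below THRESHOLD.
under_threshold() {
    [ "$1" -lt "$2" ]
}

# check_signal PATH THRESHOLD [inodes]: report free space (default) or
# free inodes on the filesystem that holds PATH.
check_signal() {
    path=$1
    threshold=$2
    kind=${3:-space}
    if [ "$kind" = "inodes" ]; then
        fields=$(df -P -i "$path" | awk 'NR == 2 { print $2, $4 }')
    else
        fields=$(df -P -k "$path" | awk 'NR == 2 { print $2, $4 }')
    fi
    # Unquoted on purpose so "TOTAL AVAIL" splits into two arguments.
    pct=$(pct_available $fields)
    if under_threshold "$pct" "$threshold"; then
        echo "$path ($kind): ${pct}% available, below ${threshold}%"
    else
        echo "$path ($kind): ${pct}% available, at or above ${threshold}%"
    fi
}
```

On the node you might call `check_signal / 10` and `check_signal / 5 inodes`. For the image filesystem, point the same function at the directory where your container runtime stores images (for Docker this is commonly /var/lib/docker, which is an assumption about your setup).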
Try running df -h on your worker node rather than inside a Pod. What is the disk usage percentage? You can also check the kubelet logs for more details:
journalctl -xeu kubelet.service
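The kubelet journal can be noisy, so it may help to narrow it down to eviction and garbage-collection messages. This is a small sketch; the function name filter_kubelet_events and the pattern list are my own choices based on the messages quoted in the question, not an official list of kubelet log strings.

```shell
#!/bin/sh
# Keep only lines that mention evictions, disk pressure, image garbage
# collection, or ephemeral storage. Reads stdin, writes matches to stdout.
# The pattern list is an assumption derived from the messages in the question.
filter_kubelet_events() {
    grep -iE 'evict|disk ?pressure|garbage collect|ephemeral-storage'
}

# Usage on the node (needs systemd and read access to the journal):
#   journalctl -u kubelet.service --no-pager --since "1 hour ago" | filter_kubelet_events
```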
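Let me know if that helps.
Here you can find an answer that explains the same topic well.
This line clearly shows that the default threshold is close to being exceeded: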
/dev/nvme0n1p2 7.5G 6.3G 804M 89% /
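To make "close to being exceeded" concrete, here is a rough calculation on that df -h line. It assumes df -h sizes are powers of 1024 and that the default threshold for free space on the root filesystem is 10%; because df -h rounds its values, the result is only approximate. The function name avail_pct_from_line is hypothetical.

```shell
#!/bin/sh
# avail_pct_from_line: read one `df -h` data line on stdin and print
# available space as a percentage of the filesystem size.
avail_pct_from_line() {
    LC_ALL=C awk '
        # Convert a df -h size such as 7.5G or 804M to MiB.
        function mib(v,   unit, num) {
            unit = substr(v, length(v), 1)
            num = substr(v, 1, length(v) - 1) + 0
            if (unit == "T") return num * 1024 * 1024
            if (unit == "G") return num * 1024
            if (unit == "K") return num / 1024
            return num    # "M" is already MiB
        }
        {
            total = mib($2)
            if (total > 0) printf "%.1f\n", mib($4) * 100 / total
        }
    '
}

echo "/dev/nvme0n1p2 7.5G 6.3G 804M 89% /" | avail_pct_from_line   # prints 10.5
```

804M out of 7.5G is roughly 10.5% available, which sits just above an assumed 10% threshold, so a small amount of extra data (a pulled image, for example) is enough to trigger DiskPressure.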
Switch to the root user (su -) and run:
du -hd1 /
to see which directories take up the most disk space.
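A sorted variant of the same idea makes the largest directories easier to spot. This is a sketch with a hypothetical helper name, top_dirs; it stays on one filesystem (-x) and lists the biggest first-level subdirectories in KiB. One possible reason ncdu showed only 3 GB is that it was run without root and could not read directories such as the container runtime's storage; that is a guess, not something the output above confirms.

```shell
#!/bin/sh
# top_dirs [DIR] [COUNT]: list the COUNT largest first-level subdirectories
# of DIR (default /, 10), sizes in KiB, largest first.
top_dirs() {
    dir=${1:-/}
    count=${2:-10}
    du -x -k -d 1 "$dir" 2>/dev/null |
        awk -F '\t' -v self="$dir" '$2 != self' |   # drop the total for DIR itself
        sort -rn |
        head -n "$count"
}

# Usage as root on the node:
#   top_dirs / 10
#   top_dirs /var/lib 10
```

Running it again on whichever directory comes out on top (for example `top_dirs /var/lib 10`) lets you drill down one level at a time.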