Ceph version: 0.94.1
ceph -s
cluster 30266c5f-5e10-4027-936c-e4409667b409
health HEALTH_WARN
65 pgs stale
22 pgs stuck inactive
65 pgs stuck stale
22 pgs stuck unclean
monmap e7: 7 mons at {kvm1=10.136.8.129:6789/0,kvm2=10.136.8.130:6789/0,kvm3=10.136.8.131:6789/0,kvm4=10.136.8.132:6789/0,kvm5=10.136.8.133:6789/0,kvm6=10.136.8.134:6789/0,kvm7=10.136.8.135:6789/0}
election epoch 122, quorum 0,1,2,3,4,5,6 kvm1,kvm2,kvm3,kvm4,kvm5,kvm6,kvm7
osdmap e368: 14 osds: 14 up, 14 in
pgmap v1072573: 1128 pgs, 8 pools, 186 GB data, 51533 objects
630 GB used, 7330 GB / 8319 GB avail
1041 active+clean
65 stale+active+clean
22 creating
client io 361 kB/s rd, 528 kB/s wr, 48 op/s
ceph osd stat
osdmap e368: 14 osds: 14 up, 14 in
As you can see, I have stale, inactive and unclean PGs. I tried running
ceph pg 0.21 query
and it just hangs (0.21 is one of the stale PGs). strace shows this:
[pid 4850] futex(0x7f8cd8003984, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f8cd8003980,
{FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1} <unfinished ...>
[pid 4855] <... sendmsg resumed> ) = 9
[pid 4850] <... futex resumed> ) = 1
[pid 4855] futex(0x7f8cd8026cd4, FUTEX_WAIT_PRIVATE, 19, NULL <unfinished ...>
[pid 4841] <... futex resumed> ) = 0
[pid 4850] futex(0x7f8cd801e2ac, FUTEX_WAIT_PRIVATE, 11, NULL <unfinished ...>
[pid 4841] futex(0x7f8cd8003900, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 4841] futex(0x7f8cd8003984, FUTEX_WAIT_PRIVATE, 39, NULL <unfinished ...>
[pid 4833] <... select resumed> ) = 0 (Timeout)
[pid 4833] select(0, NULL, NULL, NULL, {0, 4000}) = 0 (Timeout)
[pid 4833] select(0, NULL, NULL, NULL, {0, 8000}) = 0 (Timeout)
[pid 4833] select(0, NULL, NULL, NULL, {0, 16000}) = 0 (Timeout)
[pid 4833] select(0, NULL, NULL, NULL, {0, 32000}) = 0 (Timeout)
[pid 4833] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
[pid 4833] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
[pid 4833] select(0, NULL, NULL, NULL, {0, 50000}) = 0 (Timeout)
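It never returns any information. Queries against other PGs do return proper JSON data. I tried restarting osd0 but saw no errors.
Does anyone have any ideas?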
Answer 0 (score: 1)
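I found the problem! Some pools were left with no OSDs after I removed the OSDs via the CRUSH rules. I am not sure why the PGs were created when the rule only allowed the removed OSDs, but that does not matter.
After deleting all the empty pools, the cluster is fine now.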
For those who want a procedure for finding this:
First:
ceph health detail
To find which pool has the problem, run:
ceph pg ls-by-pool
Match the PGs to the pools. Then delete the pool with the following command:
ceph osd pool delete <pool name> <pool name> --yes-i-really-really-mean-it
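The matching step can be partly automated, because a PG id has the form <pool id>.<pg number in hex>, so the pool id is simply the part before the dot. The following is a minimal sketch, not a definitive tool: it assumes the problem PGs appear in the ceph health detail output on lines beginning with "pg <id> is ...", and the sample text in it is made up for illustration. The function name stuck_pgs_by_pool is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: problem PGs are reported on lines like "pg 0.21 is stuck stale ...".
# The exact wording of `ceph health detail` may differ between releases.
PG_LINE = re.compile(r"\bpg (\d+)\.([0-9a-f]+)\b")


def stuck_pgs_by_pool(health_detail_text):
    """Group reported PG ids by the numeric pool id that prefixes each PG id."""
    pools = defaultdict(set)
    for line in health_detail_text.splitlines():
        match = PG_LINE.search(line)
        if match:
            pool_id, pg_hex = match.groups()
            pools[int(pool_id)].add(f"{pool_id}.{pg_hex}")
    return dict(pools)


if __name__ == "__main__":
    # Illustrative sample only, not output captured from the cluster above.
    sample = (
        "pg 0.21 is stuck stale for 1000.0, current state stale+active+clean\n"
        "pg 5.3 is stuck inactive since forever, current state creating\n"
        "pg 5.1f is stuck unclean since forever, current state creating\n"
    )
    for pool_id, pgs in sorted(stuck_pgs_by_pool(sample).items()):
        print(pool_id, sorted(pgs))
```

The numeric pool ids can then be mapped to pool names with ceph osd lspools before deciding which pools to delete.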
Answer 1 (score: 0)
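You most likely have a network configuration that does not allow some OSDs to communicate with each other. Your problem with pg 0.21 dump is probably the same issue.
Unlike most ceph commands, which communicate with the MONs, pg 0.21 dump tries to communicate directly with the OSD that hosts the PG.
Since ceph osd stat reports all OSDs as up and in, communication between the MONs and the OSDs is working without errors.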
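One way to test this theory is to check, from the client and from each OSD host, whether a TCP connection to the other OSDs' ports can be opened. The snippet below is only a sketch under assumptions: the helper name can_connect is hypothetical, and the real OSD addresses and ports have to be taken from ceph osd dump (OSDs bind to ports in the 6800-7300 range by default).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


# Example use (addresses are placeholders, take the real ones from `ceph osd dump`):
#   for host, port in [("osd-host-a", 6800), ("osd-host-b", 6800)]:
#       print(host, port, can_connect(host, port))
```

An endpoint that is reachable from the MON hosts but not from the client or from another OSD host would point to the kind of network problem described above.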