I have a small Ceph cluster. It was set up as described here:
https://www.theo-andreou.org/?p=1750
After a reboot of the deploy node (which also hosts the NTP server), I get:
ceph health; ceph osd tree
HEALTH_ERR 370 pgs are stuck inactive for more than 300 seconds; 370 pgs stale; 370 pgs stuck stale; too many PGs per OSD (307 > max 300)
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 10.88989 root default
-2 0.54449 host node02
0 0.54449 osd.0 down 0 1.00000
-3 0.54449 host node03
1 0.54449 osd.1 down 0 1.00000
-4 0.54449 host node04
2 0.54449 osd.2 down 0 1.00000
-5 0.54449 host node05
3 0.54449 osd.3 down 0 1.00000
-6 0.54449 host node06
4 0.54449 osd.4 down 0 1.00000
-7 0.54449 host node07
5 0.54449 osd.5 down 0 1.00000
-8 0.54449 host node08
6 0.54449 osd.6 down 0 1.00000
-9 0.54449 host node09
7 0.54449 osd.7 down 0 1.00000
-10 0.54449 host node10
8 0.54449 osd.8 down 0 1.00000
-11 0.54449 host node12
9 0.54449 osd.9 down 0 1.00000
-12 0.54449 host node13
10 0.54449 osd.10 down 0 1.00000
-13 0.54449 host node14
11 0.54449 osd.11 down 0 1.00000
-14 0.54449 host node16
12 0.54449 osd.12 down 0 1.00000
-15 0.54449 host node17
13 0.54449 osd.13 down 0 1.00000
-16 0.54449 host node18
14 0.54449 osd.14 down 0 1.00000
-17 0.54449 host node19
15 0.54449 osd.15 up 1.00000 1.00000
-18 0.54449 host node20
16 0.54449 osd.16 up 1.00000 1.00000
-19 0.54449 host node21
17 0.54449 osd.17 up 1.00000 1.00000
-20 0.54449 host node22
18 0.54449 osd.18 up 1.00000 1.00000
-21 0.54449 host node23
19 0.54449 osd.19 up 1.00000 1.00000
The nodes are up and reachable over SSH. Is there a way to bring the cluster back to a healthy state?
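For reference, a quick way to confirm whether a ceph-osd daemon is actually running on each node (a minimal sketch that assumes the host names from the tree above and passwordless SSH from the deploy node):

# Loop over the OSD hosts and report whether a ceph-osd process is running on each one.
for ID in {02..10} {12..14} {16..23}; do
  echo "== node${ID} =="
  ssh node${ID} 'pgrep -a ceph-osd || echo "no ceph-osd process running"'
done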
Answer 0 (score: 1)
Apparently the OSD daemons were down (even on the nodes reported as "up"). After running
I=0; for ID in {02..10} {12..14} {16..23}; do ceph-deploy osd activate node${ID}:/var/local/osd${I}; I=$((${I}+1)); done
I now have HEALTH_OK.
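For readability, the same one-liner can be written out as a short script (a sketch assuming the layout from the question, where osd.0 lives on node02 under /var/local/osd0 and the numbering continues across the remaining hosts):

#!/bin/bash
# Re-activate every OSD after the reboot. The OSD id (0..19) and the host
# numbering (node02..node10, node12..node14, node16..node23) are offset,
# so they are tracked separately.
I=0
for ID in {02..10} {12..14} {16..23}; do
  # Activating the data directory /var/local/osd$I on host node$ID also
  # starts the corresponding ceph-osd daemon.
  ceph-deploy osd activate node${ID}:/var/local/osd${I}
  I=$((I + 1))
done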
Many thanks to the #ceph IRC channel!