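I have a dispersed GlusterFS volume made up of 3 bricks on 3 servers. Recently one of the servers had a hard disk failure and dropped out of the cluster. I am trying to replace that brick in the cluster, but I can't get it to work.
First, the version info: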
$ glusterfsd --version
glusterfs 3.13.2
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
It is running on Ubuntu 18.04.
Here is the existing volume info:
Volume Name: vol01
Type: Disperse
Volume ID: 061cac4d-1165-4afe-87e0-27b213ea19dc
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srv02:/srv/glusterfs/vol01/brick <-- This is the brick that died
Brick2: srv03:/srv/glusterfs/vol01/brick
Brick3: srv04:/srv/glusterfs/vol01/brick
Options Reconfigured:
nfs.disable: on
transport.address-family: inet
I want to replace the srv02 brick with a brick on srv05 using:
gluster volume replace-brick vol01 srv02:/srv/glusterfs/vol01/brick srv05:/srv/glusterfs/vol01/brick commit force
However, when I run this command as root, I get this error:
volume replace-brick: failed: Pre Validation failed on srv05. brick: srv02:/srv/glusterfs/vol01/brick does not exist in volume: vol01
As far as I can tell, srv05 is connected:
# gluster peer status
Number of Peers: 3
Hostname: srv04
Uuid: 5bbd6c69-e0a7-491c-b605-d70cb83ebc72
State: Peer in Cluster (Connected)
Hostname: srv02
Uuid: e4e856ba-61df-45eb-83bb-e2d2e799fc8d
State: Peer Rejected (Disconnected)
Hostname: srv05
Uuid: e7d098c1-7bbd-44e1-931f-034da645c6c6
State: Peer in Cluster (Connected)
You can see that srv05 is connected and in the cluster, while srv02 is rejected and disconnected...
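To check the peer states from a script rather than by eye, here is a minimal sketch. It assumes the `Hostname:` / `State:` line layout shown in the output above; the function name `unhealthy_peers` is made up for illustration.

```shell
# Print every peer whose state is not "Peer in Cluster (Connected)".
# Reads `gluster peer status` output on stdin.
# unhealthy_peers is a hypothetical helper name, not part of gluster.
unhealthy_peers() {
    awk '
        /^Hostname:/ { host = $2 }
        /^State:/ {
            state = $0
            sub(/^State:[ \t]*/, "", state)
            if (state != "Peer in Cluster (Connected)") print host ": " state
        }
    '
}

# Usage on a cluster node (as root):
#   gluster peer status | unhealthy_peers
```

Fed the peer status output above, this should list only srv02.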
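All bricks are the same size and sit on XFS partitions. The brick on srv05 is empty.
What am I doing wrong? I would rather not have to dump the entire FS and rebuild it if that can be avoided...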
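EDIT 2019-01-01: I followed this tutorial: https://support.rackspace.com/how-to/recover-from-a-failed-server-in-a-glusterfs-array/ to replace the old server (srv02) with a new one under the same name.
The cluster recognises the server and the brick: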
# gluster volume status
Status of volume: vol01
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick srv02:/srv/glusterfs/vol01/brick N/A N/A N N/A
Brick srv03:/srv/glusterfs/vol01/brick 49152 0 Y 21984
Brick srv04:/srv/glusterfs/vol01/brick 49152 0 Y 16681
Self-heal Daemon on localhost N/A N/A Y 2582
Self-heal Daemon on srv04 N/A N/A Y 16703
Self-heal Daemon on srv03 N/A N/A Y 22006
However, the brick on the replacement srv02 is not coming online!
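A minimal sketch for pulling the offline bricks out of that status table. It assumes each brick sits on a single line with the Online flag as the second-to-last column, as in the output above; long brick paths may wrap in real output, which this does not handle. The name `offline_bricks` is hypothetical.

```shell
# List bricks whose Online column is "N".
# Reads `gluster volume status` output on stdin.
# offline_bricks is a hypothetical helper name, not part of gluster.
offline_bricks() {
    awk '/^Brick / && $(NF-1) == "N" { print $2 }'
}

# Usage on a cluster node (as root):
#   gluster volume status vol01 | offline_bricks
```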
After a lot of searching, I found this in the brick log on the new srv02:
[2019-01-01 05:50:05.727791] E [MSGID: 138001] [index.c:2349:init] 0-vol01-index: Failed to find parent dir (/srv/glusterfs/vol01/brick/.glusterfs) of index basepath /srv/glusterfs/vol01/brick/.glusterfs/indices. [No such file or directory]
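I have no idea how to fix this one, since it is an empty brick that I want to bring online and heal!
Answer 0 (score: 1)
In the end I got the brick online by running the following commands in the brick directory: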
# mkdir .glusterfs
# chmod 600 .glusterfs
# cd .glusterfs
# mkdir indices
# chmod 600 indices
# systemctl restart glusterd
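The same steps can be wrapped in a small function so they don't depend on the current directory. This is only a sketch of the commands above, not an official recovery procedure; `prepare_empty_brick` is a hypothetical name, and the brick path in the usage line is the one from this post.

```shell
# Recreate the .glusterfs/indices directories the brick log complained about.
# prepare_empty_brick is a hypothetical helper name, not part of gluster.
prepare_empty_brick() {
    brick="$1"
    [ -d "$brick" ] || { echo "no such brick directory: $brick" >&2; return 1; }
    mkdir -p "$brick/.glusterfs/indices" || return 1
    # chmod the inner directory first: once .glusterfs is 600 it can no
    # longer be traversed by a non-root user.
    chmod 600 "$brick/.glusterfs/indices" "$brick/.glusterfs"
}

# Usage on the replacement server (as root):
#   prepare_empty_brick /srv/glusterfs/vol01/brick && systemctl restart glusterd
```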
The brick came online, and the heal was started with:
# gluster volume heal vol01 full
So far it seems to be working fine.
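To watch the heal progress, `gluster volume heal vol01 info` can be summarised with a sketch like the one below. It assumes the output has one `Number of entries:` line per brick (not verified against 3.13.2 specifically); `pending_heal_entries` is a hypothetical name.

```shell
# Sum the "Number of entries:" lines from `gluster volume heal <vol> info`
# output read on stdin. Non-numeric values (e.g. "-") count as 0.
# pending_heal_entries is a hypothetical helper name, not part of gluster.
pending_heal_entries() {
    awk '/^Number of entries:/ { total += $NF } END { print total + 0 }'
}

# Usage on a cluster node (as root):
#   gluster volume heal vol01 info | pending_heal_entries
```

A total that falls to 0 suggests nothing is left waiting to be healed.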