How to handle etcdserver: unhealthy cluster

Time: 2019-09-13 03:12:12

Tags: kubernetes etcd

When I add a member to the etcd cluster from the master node with the following command:

curl http://127.0.0.1:2379/v3beta/members \
-XPOST -H "Content-Type: application/json" \
-d '{"peerURLs": ["http://172.19.104.230:2380"]}'

It returns {"error":"etcdserver: unhealthy cluster","code":14}

Then I checked the cluster status:

[root@iZuf63refzweg1d9dh94t8Z ~]# etcdctl member list
55a782166ce91d01, started, infra3, https://172.19.150.82:2380, https://172.19.150.82:2379
696a771758a889c4, started, infra1, https://172.19.104.231:2380, https://172.19.104.231:2379

That looks fine. What should I do to make this work?

1 answer:

Answer (score: 2)

According to the etcd source code, the ErrUnhealthy error is returned when the longestConnected method fails, i.e. when it returns false:

// longestConnected chooses the member with longest active-since-time.
// It returns false, if nothing is active.
func longestConnected(tp rafthttp.Transporter, membs []types.ID) (types.ID, bool) {
    var longest types.ID
    var oldest time.Time
    for _, id := range membs {
        tm := tp.ActiveSince(id)
        if tm.IsZero() { // inactive
            continue
        }

        if oldest.IsZero() { // first longest candidate
            oldest = tm
            longest = id
        }

        if tm.Before(oldest) {
            oldest = tm
            longest = id
        }
    }
    if uint64(longest) == 0 {
        return longest, false
    }
    return longest, true
}

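So etcd could not find any voting member with an active peer connection (ActiveSince returned a zero time for every member).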

longestConnected is called with the cluster's voting members, which the VotingMemberIDs method returns:

transferee, ok := longestConnected(s.r.transport, s.cluster.VotingMemberIDs())
if !ok {
    return ErrUnhealthy
}

// VotingMemberIDs returns the ID of voting members in cluster.
func (c *RaftCluster) VotingMemberIDs() []types.ID {
    c.Lock()
    defer c.Unlock()
    var ids []types.ID
    for _, m := range c.members {
        if !m.IsLearner {
            ids = append(ids, m.ID)
        }
    }
    sort.Sort(types.IDSlice(ids))
    return ids
}
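The filtering done by VotingMemberIDs can be sketched without the etcd packages. The member type below is a simplified, hypothetical stand-in for etcd's member struct, using a slice instead of the cluster's map and sort.Slice instead of types.IDSlice. Learners are dropped and the remaining IDs come back sorted, so a learner never counts as a candidate for longestConnected.

```go
package main

import (
	"fmt"
	"sort"
)

// member is a simplified, hypothetical stand-in for etcd's member type.
type member struct {
	ID        uint64
	Name      string
	IsLearner bool
}

// votingMemberIDs mirrors RaftCluster.VotingMemberIDs: skip learners, sort the rest by ID.
func votingMemberIDs(members []member) []uint64 {
	var ids []uint64
	for _, m := range members {
		if !m.IsLearner {
			ids = append(ids, m.ID)
		}
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	members := []member{
		{ID: 0x696a771758a889c4, Name: "infra1"},
		{ID: 0x55a782166ce91d01, Name: "infra3"},
		{ID: 0x1, Name: "new-learner", IsLearner: true}, // hypothetical learner
	}
	for _, id := range votingMemberIDs(members) {
		fmt.Printf("%x\n", id) // 55a782166ce91d01, then 696a771758a889c4
	}
}
```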

We can see from your report that there are 2 members in the cluster.

$ etcdctl member list
55a782166ce91d01, started, infra3, https://172.19.150.82:2380, https://172.19.150.82:2379
696a771758a889c4, started, infra1, https://172.19.104.231:2380, https://172.19.104.231:2379
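To count members from this output in a script, a minimal Go sketch could split each line on ", ". The column layout (ID, status, name, peer URL, client URL) is taken from the output above. Treating an optional sixth column as a learner flag is an assumption about newer etcdctl versions, and parseMemberList is a hypothetical helper, not part of etcdctl.

```go
package main

import (
	"fmt"
	"strings"
)

// memberLine holds the columns printed by `etcdctl member list` in the question.
type memberLine struct {
	ID, Status, Name, PeerURL, ClientURL string
	IsLearner                            bool // assumed optional sixth column
}

// parseMemberList splits each non-empty line on ", " into its columns.
func parseMemberList(out string) ([]memberLine, error) {
	var members []memberLine
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		f := strings.Split(line, ", ")
		if len(f) < 5 {
			return nil, fmt.Errorf("unexpected line: %q", line)
		}
		m := memberLine{ID: f[0], Status: f[1], Name: f[2], PeerURL: f[3], ClientURL: f[4]}
		if len(f) >= 6 {
			m.IsLearner = f[5] == "true"
		}
		members = append(members, m)
	}
	return members, nil
}

func main() {
	out := `55a782166ce91d01, started, infra3, https://172.19.150.82:2380, https://172.19.150.82:2379
696a771758a889c4, started, infra1, https://172.19.104.231:2380, https://172.19.104.231:2379`
	members, err := parseMemberList(out)
	if err != nil {
		panic(err)
	}
	voting := 0
	for _, m := range members {
		if !m.IsLearner {
			voting++
		}
	}
	fmt.Printf("members=%d voting=%d\n", len(members), voting) // members=2 voting=2
}
```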

So we should check the members: they must be voting members, not learners. See etcd docs | Learner.

Raft learner

// RaftAttributes represents the raft related attributes of an etcd member.
type RaftAttributes struct {
    // PeerURLs is the list of peers in the raft cluster.
    // TODO(philips): ensure these are URLs
    PeerURLs []string `json:"peerURLs"`
    // IsLearner indicates if the member is raft learner.
    IsLearner bool `json:"isLearner,omitempty"`
}

So try increasing the number of members to provide a quorum; see etcd quorum.
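The arithmetic behind this advice: a Raft cluster of n voting members needs a majority of n/2 + 1 (integer division) to make progress. The short Go sketch below prints quorum and fault tolerance for small clusters. It shows that a 2-member cluster needs both members and tolerates no failure, while 3 members tolerate one.

```go
package main

import "fmt"

// quorum returns the majority needed by a Raft cluster of n voting members.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance is how many voting members can fail while a quorum remains.
func faultTolerance(n int) int { return n - quorum(n) }

func main() {
	for n := 1; n <= 5; n++ {
		fmt.Printf("members=%d quorum=%d tolerates=%d\n", n, quorum(n), faultTolerance(n))
	}
}
```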

To force a new cluster, try ETCD_FORCE_NEW_CLUSTER="true"

Quorum

See also this post: Understanding cluster and pool quorum