Elasticsearch: "timed out waiting for all nodes to process published state" and cluster unavailability

Asked: 2018-07-05 08:22:12

Tags: elasticsearch docker-compose

I am setting up a 3-node Elasticsearch cluster with Docker. This is my docker-compose file:

version: '2.0'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.3.0
    environment:
      - cluster.name=test-cluster
      - node.name=elastic_1
      - ES_JAVA_OPTS=-Xms512m -Xmx512m
      - bootstrap.memory_lock=true
      # quorum of master-eligible nodes (2 of 3) to avoid split brain
      - discovery.zen.minimum_master_nodes=2
      # seed hosts are the compose service names
      - discovery.zen.ping.unicast.hosts=elasticsearch,elasticsearch2,elasticsearch3
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - test_es_cluster_data:/usr/share/elasticsearch/data
    networks:
      - esnet
  elasticsearch2:
    # extends the elasticsearch service above, overriding only node.name and the data volume
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_2
    volumes:
      - test_es_cluster2_data:/usr/share/elasticsearch/data
  elasticsearch3:
    extends:
      file: ./docker-compose.yml
      service: elasticsearch
    environment:
      - node.name=elastic_3
    volumes:
      - test_es_cluster3_data:/usr/share/elasticsearch/data
volumes:
  test_es_cluster_data:
  test_es_cluster2_data:
  test_es_cluster3_data:
networks:
  esnet:
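
Note that the file above does not publish any ports, so health checks have to be issued from inside the esnet network. If they are run from the host instead, a port mapping along these lines (a hypothetical addition, not part of the original setup) would be needed on at least one node:

services:
  elasticsearch:
    ports:
      - "9200:9200"   # hypothetical: expose the HTTP API so e.g. curl localhost:9200/_cat/health works from the host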

Once the cluster is up, I kill the master (elastic_1) to test failover. I expect a new master to be elected and the cluster to keep serving read requests the whole time.

Well, a new master does get elected, but the cluster does not respond for quite a long time (~45s).

Here are the logs from elastic_2 and elastic_3 after the master was stopped (docker stop escluster_elasticsearch_1):

elastic_2:

...
[2018-07-04T14:47:04,495][INFO ][o.e.d.z.ZenDiscovery     ] [elastic_2] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,509][WARN ][o.e.c.NodeConnectionsService] [elastic_2] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
    ...
[2018-07-04T14:47:07,565][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] detected_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])
[2018-07-04T14:47:35,301][WARN ][r.suppressed             ] path: /_cat/health, params: {}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
    ...
[2018-07-04T14:47:53,933][WARN ][o.e.c.s.ClusterApplierService] [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4]])] took [46.3s] above the warn threshold of 30s
[2018-07-04T14:47:53,934][INFO ][o.e.c.s.ClusterApplierService] [elastic_2] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5]])
[2018-07-04T14:47:56,931][WARN ][o.e.t.TransportService   ] [elastic_2] Received response for a request that has timed out, sent [48367ms] ago, timed out [18366ms] ago, action [internal:discovery/zen/fd/master_ping], node [{elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}], id [1035]

elastic_3:

[2018-07-04T14:47:04,494][INFO ][o.e.d.z.ZenDiscovery     ] [elastic_3] master_left [{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}], reason [shut_down]
...
[2018-07-04T14:47:04,519][WARN ][o.e.c.NodeConnectionsService] [elastic_3] failed to connect to node {elastic_1}{...}{172.24.0.3}{172.24.0.3:9300} (tried [1] times)
org.elasticsearch.transport.ConnectTransportException: [elastic_1][172.24.0.3:9300] connect_exception
    ...
[2018-07-04T14:47:07,550][INFO ][o.e.c.s.MasterService    ] [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}
[2018-07-04T14:47:35,026][WARN ][r.suppressed             ] path: /_cat/nodes, params: {v=}
org.elasticsearch.discovery.MasterNotDiscoveredException: null
    ...
[2018-07-04T14:47:37,560][WARN ][o.e.d.z.PublishClusterStateAction] [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}{...}{172.24.0.2}{172.24.0.2:9300}])
[2018-07-04T14:47:37,561][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] new_master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [4] source [zen-disco-elected-as-master ([1] nodes joined)[, ]]])
[2018-07-04T14:47:41,021][WARN ][o.e.c.s.MasterService    ] [elastic_3] cluster state update task [zen-disco-elected-as-master ([1] nodes joined)[, ]] took [33.4s] above the warn threshold of 30s
[2018-07-04T14:47:41,022][INFO ][o.e.c.s.MasterService    ] [elastic_3] zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected), reason: removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}
[2018-07-04T14:47:56,929][INFO ][o.e.c.s.ClusterApplierService] [elastic_3] removed {{elastic_1}{...}{172.24.0.3}{172.24.0.3:9300},}, reason: apply cluster state (from master [master {elastic_3}{...}{172.24.0.4}{172.24.0.4:9300} committed version [5] source [zen-disco-node-failed({elastic_1}{...}{172.24.0.3}{172.24.0.3:9300}), reason(transport disconnected)]])

Why does the cluster take so long to stabilize and start responding to requests again?

What confuses me is that:

a) a new master (elastic_3) is elected:

[2018-07-04T14:47:07,550][INFO ] ... [elastic_3] zen-disco-elected-as-master ([1] nodes joined)[, ], reason: new_master {elastic_3}...

b) it is then detected by elastic_2:

[2018-07-04T14:47:07,565][INFO ] ... [elastic_2] detected_master {elastic_3}...

c) but then the master times out waiting for the published state to be processed:

[2018-07-04T14:47:37,560][WARN ] ... [elastic_3] timed out waiting for all nodes to process published state [4] (timeout [30s], pending nodes: [{elastic_2}...])

d) and elastic_2 applies the cluster state, with a warning:

[2018-07-04T14:47:53,933][WARN ] ... [elastic_2] cluster state applier task [apply cluster state (from master [master {elastic_3}...])] took [46.3s] above the warn threshold of 30s

What causes the timeout in (c)? All of this runs on a single local machine, so there are no network issues. Am I missing some configuration?
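
For context, the 30s timeout in (c) matches the default of discovery.zen.publish_timeout, and failover detection is governed by the zen fault-detection settings. The sketch below lists these settings with their documented 6.x defaults, written as compose environment entries like the ones above; it is a reference sketch, not a tuning recommendation.

services:
  elasticsearch:
    environment:
      # documented ES 6.x defaults, shown only for reference
      - discovery.zen.publish_timeout=30s    # how long the master waits for all nodes to process a published cluster state (the 30s in (c))
      - discovery.zen.commit_timeout=30s     # how long the master waits for the publish to be committed
      - discovery.zen.fd.ping_interval=1s    # fault-detection ping frequency
      - discovery.zen.fd.ping_timeout=30s    # wait per fault-detection ping
      - discovery.zen.fd.ping_retries=3      # failed/timed-out pings before a node is declared failed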

At the same time, requests to both elastic_2 and elastic_3 result in a MasterNotDiscoveredException. According to the documentation, the cluster is expected to keep responding (https://www.elastic.co/guide/en/elasticsearch/reference/6.3/modules-discovery-zen.html#no-master-block).
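
For reference, the behaviour described in that section of the docs is controlled by discovery.zen.no_master_block. Its documented default is write, which rejects only write operations while there is no active master, so reads are expected to keep working. A minimal sketch as a compose environment entry (not part of the original file):

services:
  elasticsearch:
    environment:
      - discovery.zen.no_master_block=write   # default: block writes only; 'all' would also block read operations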

Has anyone run into this? Any suggestions on this issue would be greatly appreciated.

1 Answer:

Answer 0 (score: 0)