I am trying to connect to an ES cluster (1 master - 1 data node) through Kibana. The Kibana frontend returns a 504 error. There are no errors in my Kibana logs, but in ES:
[2019-02-22T11:39:33,764][WARN ][r.suppressed ] path: /.kibana/doc/config%3A6.4.2/_update, params: {refresh=wait_for, index=.kibana, id=config:6.4.2, type=doc}
org.elasticsearch.action.UnavailableShardsException: [.kibana][0] [1] shardIt, [0] active : Timeout waiting for [1m], request: indices:data/write/update
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.retry(TransportInstanceSingleOperationAction.java:211) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.doStart(TransportInstanceSingleOperationAction.java:166) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$2.onTimeout(TransportInstanceSingleOperationAction.java:232) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:573) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.2.jar:6.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
[2019-02-22T11:39:33,768][WARN ][r.suppressed ] path: /.kibana/doc/config%3A6.4.2/_update, params: {refresh=wait_for, index=.kibana, id=config:6.4.2, type=doc}
org.elasticsearch.action.UnavailableShardsException: [.kibana][0] [1] shardIt, [0] active : Timeout waiting for [1m], request: indices:data/write/update
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.retry(TransportInstanceSingleOperationAction.java:211) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction.doStart(TransportInstanceSingleOperationAction.java:166) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.action.support.single.instance.TransportInstanceSingleOperationAction$AsyncSingleAction$2.onTimeout(TransportInstanceSingleOperationAction.java:232) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:573) [elasticsearch-6.4.2.jar:6.4.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.4.2.jar:6.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_191]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_191]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
I tried deleting the .kibana index and restarting all services, but no luck.
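(The exact delete command is not shown; presumably it was something along these lines:)
curl -XDELETE http://master01-elastic:9200/.kibana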
curl -XGET http://master01-elastic:9200
{
"name" : "master01",
"cluster_name" : "local-stg-cluster",
"cluster_uuid" : "K3zb-E6xRle7MWjYrag4nA",
"version" : {
"number" : "6.4.2",
"build_flavor" : "default",
"build_type" : "rpm",
"build_hash" : "04711c2",
"build_date" : "2018-09-26T13:34:09.098244Z",
"build_snapshot" : false,
"lucene_version" : "7.4.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
curl -XGET http://master01-elastic/_cat/allocation?v
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
2 UNASSIGNED
curl -XGET http://master01-elastic.dev.encode.local:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .kibana K3NoYmDaRnGk9vem8oUFlQ 1 1
curl -XGET http://master01-elastic:9200/_cat/shards?v
index shard prirep state docs store ip node
.kibana 0 p UNASSIGNED
.kibana 0 r UNASSIGNED
curl -XGET 'http://master01-elastic.dev.encode.local:9200/_recovery?human&detailed=true&active_only=true'
{}
$ curl -XGET 'http://master01-elastic.dev.encode.local:9200/_cluster/allocation/explain'
{"index":".kibana","shard":0,"primary":true,"current_state":"unassigned","unassigned_info":{"reason":"INDEX_CREATED","at":"2019-02-22T16:36:40.852Z","last_allocation_status":"no_attempt"},"can_allocate":"no","allocate_explanation":"cannot allocate because allocation is not permitted to any of the nodes"}
Answer (score: 2)
When troubleshooting this kind of issue (i.e., unassigned shards), it first helps to check whether allocation failed for some reason during node recovery, by running the following command:
curl -XGET 'localhost:9200/_recovery?human&detailed=true&active_only=true'
In your case, the response is empty, which means this is not a recovery issue.
Sometimes, if shard allocation has failed too many times, the shard will stay unassigned until you explicitly tell the cluster to retry:
curl -XPOST http://master01-elastic/_cluster/reroute?retry_failed=true
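After the reroute, you can verify whether the shards actually got assigned, for example with the standard _cat and _cluster endpoints (these checks are not part of the original answer, just a common follow-up):
curl -XGET http://master01-elastic:9200/_cat/shards/.kibana?v
curl -XGET http://master01-elastic:9200/_cluster/health?pretty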
If that doesn't help, the next step is to inspect the allocation decisions and see whether anything stands out, by running the following command:
curl -XGET http://master01-elastic/_cluster/allocation/explain
In your case, this yields the following:
{
"index": ".kibana",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "INDEX_CREATED",
"at": "2019-02-22T16:36:40.852Z",
"last_allocation_status": "no_attempt"
},
"can_allocate": "no",
"allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes"
}
This can happen if your data node is down, or if you have cluster-level or index-level shard allocation filtering rules in place (e.g., preventing the shards of a given index from being allocated to a given node). You can check whether that is the case by inspecting your cluster and index settings:
curl -XGET http://master01-elastic/.kibana/_settings
curl -XGET http://master01-elastic/_cluster/settings
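It is also worth confirming that both nodes have actually joined the cluster (a standard check, not from the original answer):
curl -XGET http://master01-elastic:9200/_cat/nodes?v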
Check whether there is anything in the index.routing.allocation.* section (for index-level rules)...
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_name": "NODE1,NODE2"
},
"exclude": { <--- this might be the issue
"_name": "NODE3,NODE4"
}
}
},
...or in the cluster.routing.allocation.* section (for cluster-level rules):
"cluster": {
"routing": {
"allocation": {
"enable": "none" <--- this might be the issue
}
}
If that is the case, you will probably have to adjust those rules.
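As a sketch of what adjusting the rules can look like (hypothetical values; in ES 6.x, setting a value to null resets it to its default), you could re-enable allocation at the cluster level and clear an index-level exclude rule:
curl -XPUT http://master01-elastic:9200/_cluster/settings -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'
curl -XPUT http://master01-elastic:9200/.kibana/_settings -H 'Content-Type: application/json' -d '
{
  "index.routing.allocation.exclude._name": null
}'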