ES nodes throw stack traces and recover under heavy reads, unclear why?

Time: 2017-06-03 02:15:56

Tags: elasticsearch

We have an ES 2.3.3 cluster with 5 data nodes, using the following ES configuration:

index.number_of_shards: 1
index.number_of_replicas: 4
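
As a quick sanity check on this configuration: 1 primary plus 4 replicas gives 5 copies of each shard, and because Elasticsearch does not allocate two copies of the same shard to one node, a fully allocated index ends up with exactly one copy on each of the 5 nodes. A minimal sketch of that arithmetic follows; the variable names are illustrative only.

```python
# Settings copied from the configuration above.
settings = {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 4,
}
data_nodes = 5  # from the cluster description

primaries = settings["index.number_of_shards"]
replicas_per_primary = settings["index.number_of_replicas"]

# Every primary shard has `replicas_per_primary` additional replica copies.
total_copies = primaries * (1 + replicas_per_primary)
copies_per_node = total_copies / data_nodes

print("shard copies per index:", total_copies)
print("copies per data node:", copies_per_node)
```

This matches the recovery output further down, which shows one STORE recovery for the primary followed by four REPLICA recoveries.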

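Almost everything else is left at its default. Everything works fine, but under heavy reads several of our indices produce the following stack trace in the ES log: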

[2017-05-12 04:33:55,745][DEBUG][action.search            ] [qa13-ost-1020x-h-ds01] All shards failed for phase: [query_fetch]
RemoteTransportException[[qa13-ost-1020x-h-as01][192.168.104.110:9300][indices:data/read/search[phase/query+fetch]]]; nested: ShardNotFoundException[no such shard];
Caused by: [qa-hsbcuk1][[qa-hsbcuk1][0]] ShardNotFoundException[no such shard]
    at org.elasticsearch.index.IndexService.shardSafe(IndexService.java:197)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:639)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:620)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:463)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:392)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryFetchTransportHandler.messageReceived(SearchServiceTransportAction.java:389)
    at org.elasticsearch.transport.TransportRequestHandler.messageReceived(TransportRequestHandler.java:33)
    at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.doRun(MessageChannelHandler.java:300)
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
...

These end up returning a 503 to our application, which calls ES through the REST API. The failures are brief, after which the shards go back to green.
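
Because the failures are brief, a client-side retry could mask them while the root cause is investigated. The following is only a stopgap sketch, not a fix. `ServiceUnavailable` and `search_with_retry` are hypothetical names, and `do_search` stands in for whatever function the application uses to call the REST API and raise on a 503.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical exception standing in for an HTTP 503 from the search call."""


def search_with_retry(do_search, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call do_search(), retrying with exponential backoff on ServiceUnavailable."""
    for attempt in range(attempts):
        try:
            return do_search()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff can be exercised in tests without real waiting.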

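While trying to debug this, we noticed the following recoveries, which appear to line up with the time period in which we see the failures above. They start with a STORE recovery on the server that appears to hold the primary shard: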

 "qa-hsbcuk1" : {
    "shards" : [ {
      "id" : 0,
      "type" : "STORE",
      "stage" : "DONE",
      "primary" : true,
      "start_time" : "2017-05-12T08:33:55.817Z",
      "start_time_in_millis" : 1494578035817,
      "stop_time" : "2017-05-12T08:33:55.827Z",
      "stop_time_in_millis" : 1494578035827,
      "total_time" : "10ms",
      "total_time_in_millis" : 10,
      "source" : {
        "id" : "QZdQAM-oQ_e__vUeAzNOsw",
        "host" : "192.168.104.110",
        "transport_address" : "192.168.104.110:9300",
        "ip" : "192.168.104.110",
        "name" : "qa13-ost-1020x-h-as01"
      },
      "target" : {
        "id" : "QZdQAM-oQ_e__vUeAzNOsw",
        "host" : "192.168.104.110",
        "transport_address" : "192.168.104.110:9300",
        "ip" : "192.168.104.110",
        "name" : "qa13-ost-1020x-h-as01"
      },
      "index" : {
        "size" : {
          "total" : "0b",
          "total_in_bytes" : 0,
          "reused" : "0b",
          "reused_in_bytes" : 0,
          "recovered" : "0b",
          "recovered_in_bytes" : 0,
          "percent" : "0.0%"
        },
        "files" : {
          "total" : 0,
          "reused" : 0,
          "recovered" : 0,
          "percent" : "0.0%"
        },
        "total_time" : "0s",
        "total_time_in_millis" : 0,
        "source_throttle_time" : "-1",
        "source_throttle_time_in_millis" : 0,
        "target_throttle_time" : "-1",
        "target_throttle_time_in_millis" : 0
      },
      "translog" : {
        "recovered" : 0,
        "total" : 0,
        "percent" : "100.0%",
        "total_on_start" : 0,
        "total_time" : "9ms",
        "total_time_in_millis" : 9
      },
      "verify_index" : {
        "check_index_time" : "0s",
        "check_index_time_in_millis" : 0,
        "total_time" : "0s",
        "total_time_in_millis" : 0
      }

This is followed by 4 REPLICA recoveries:

  }, {
      "id" : 0,
      "type" : "REPLICA",
      "stage" : "DONE",
      "primary" : false,
      "start_time" : "2017-05-12T08:33:55.881Z",
      "start_time_in_millis" : 1494578035881,
      "stop_time" : "2017-05-12T08:33:55.925Z",
      "stop_time_in_millis" : 1494578035925,
      "total_time" : "43ms",
      "total_time_in_millis" : 43,
      "source" : {
        "id" : "QZdQAM-oQ_e__vUeAzNOsw",
        "host" : "192.168.104.110",
        "transport_address" : "192.168.104.110:9300",
        "ip" : "192.168.104.110",
        "name" : "qa13-ost-1020x-h-as01"
      },
      "target" : {
        "id" : "v25bTq0sQcadYs-ORzisJg",
        "host" : "192.168.104.109",
        "transport_address" : "192.168.104.109:9300",
        "ip" : "192.168.104.109",
        "name" : "qa13-ost-1020x-h-ds01"
      },
      "index" : {
        "size" : {
          "total" : "130b",
          "total_in_bytes" : 130,
          "reused" : "0b",
          "reused_in_bytes" : 0,
          "recovered" : "130b",
          "recovered_in_bytes" : 130,
          "percent" : "100.0%"
        },
        "files" : {
          "total" : 1,
          "reused" : 0,
          "recovered" : 1,
          "percent" : "100.0%"
        },
        "total_time" : "30ms",
        "total_time_in_millis" : 30,
        "source_throttle_time" : "0s",
        "source_throttle_time_in_millis" : 0,
        "target_throttle_time" : "-1",
        "target_throttle_time_in_millis" : 0
      },
      "translog" : {
        "recovered" : 0,
        "total" : 0,
        "percent" : "100.0%",
        "total_on_start" : 0,
        "total_time" : "9ms",
        "total_time_in_millis" : 9
      },
      "verify_index" : {
        "check_index_time" : "0s",
        "check_index_time_in_millis" : 0,
        "total_time" : "0s",
        "total_time_in_millis" : 0
      }
....
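
To check how closely the recoveries line up with the logged search failure, the timestamps can be compared directly. The sketch below uses only values copied from the log line and recovery output above. It ASSUMES the ES log is written in server local time at UTC-4, which the source does not state; that offset is inferred from the four-hour gap between the log timestamp and the UTC recovery timestamps.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the log line and the recovery output above.
log_line_time = "2017-05-12 04:33:55,745"
recoveries = [
    {"type": "STORE", "start_time_in_millis": 1494578035817},
    {"type": "REPLICA", "start_time_in_millis": 1494578035881},
]

# ASSUMPTION: the ES log uses local time at UTC-4 (not stated in the source).
LOG_TZ = timezone(timedelta(hours=-4))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

log_utc = (
    datetime.strptime(log_line_time, "%Y-%m-%d %H:%M:%S,%f")
    .replace(tzinfo=LOG_TZ)
    .astimezone(timezone.utc)
)


def millis_to_utc(ms):
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    return EPOCH + timedelta(milliseconds=ms)


offsets_ms = []
for rec in recoveries:
    start = millis_to_utc(rec["start_time_in_millis"])
    # Whole milliseconds between the logged failure and the recovery start.
    delay = (start - log_utc) // timedelta(milliseconds=1)
    offsets_ms.append(delay)
    print(f"{rec['type']}: started {delay} ms after the logged search failure")
```

If the timezone assumption holds, both recoveries start within a fraction of a second after the failed search rather than before it.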

It is not clear to us why this is happening.

0 answers:

No answers yet