Question

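I have uploaded about 1 TB of data to Elasticsearch. To search it, I tried the following approaches:

"from" + "size": The default value of index.max_result_window is 10000, but I wanted to search from 100000, so I set index.max_result_window to 100000. I then searched with from = 100000 and size = 10, but that filled up the heap.
Scroll API: To keep old segments alive, it uses more file handles, so it again consumes the memory configured on the node.
search_after: I tried sorting the documents on _uid, but it gave me the following error: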

{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[fielddata] Data too large, data for [_uid] would be [13960098635/13gb], which is larger than the limit of [12027297792/11.2gb]",
        "bytes_wanted": 13960098635,
        "bytes_limit": 12027297792
      }
    ]
  }
}

What can be done to resolve this error, and what is the most efficient way to page through a large amount of data?

Answer 1

You are hitting the circuit breaker because of the fielddata size: it is larger than the portion of the heap allotted to it.

See the Elasticsearch documentation here: https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#circuit-breaker

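Depending on your search requirements, you could consider increasing the heap size, or you could change the circuit breaker limit so that it is not triggered in your scenario. Probably the best approach is to limit the fielddata cache size.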
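The second option above, changing the circuit breaker limit, can be sketched as a cluster settings update. This is a minimal sketch rather than a recommendation: `indices.breaker.fielddata.limit` is the Elasticsearch setting for the fielddata breaker, while the helper names, the `70%` value, and the `http://localhost:9200` address are assumptions made for illustration.

```python
import json
import urllib.request


def breaker_settings_body(limit="70%"):
    """Build a cluster-settings body that changes the fielddata breaker limit.

    The "70%" default is an illustrative assumption, not a tuned value.
    """
    return {"persistent": {"indices.breaker.fielddata.limit": limit}}


def put_cluster_settings(base_url, body):
    """Send the body to PUT /_cluster/settings (hypothetical helper)."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call against an assumed local node (not executed here):
# put_cluster_settings("http://localhost:9200", breaker_settings_body("70%"))
```

Raising the limit only moves the threshold at which the request is rejected; the fielddata for `_uid` still has to fit in the heap, which is why the answer prefers the other options.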

You can set an upper limit (relative or absolute) on fielddata by adding this setting to the config/elasticsearch.yml file:

indices.fielddata.cache.size:  20%

For more details, see: https://www.elastic.co/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html#fielddata-size

See also this existing answer: FIELDDATA Data is too large
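For the pagination part of the question, `search_after` may avoid the fielddata problem if the sort uses a field stored with doc values instead of `_uid`. The following is a minimal sketch under that assumption: `doc_id` is a hypothetical `keyword` field holding one unique value per document, and the helper names are illustrative. It only builds the request bodies; sending them to the `_search` endpoint is left out.

```python
def first_page_body(sort_field="doc_id", size=10):
    """Request body for the first page, sorted on a unique doc-values field."""
    return {
        "size": size,
        "query": {"match_all": {}},
        "sort": [{sort_field: "asc"}],
    }


def next_page_body(previous_response, sort_field="doc_id", size=10):
    """Request body for the following page, or None when there are no more hits.

    search_after takes the "sort" values of the last hit on the previous page.
    """
    hits = previous_response["hits"]["hits"]
    if not hits:
        return None
    body = first_page_body(sort_field, size)
    body["search_after"] = hits[-1]["sort"]
    return body


# A hand-written stand-in for a search response, for illustration only.
fake_response = {
    "hits": {
        "hits": [
            {"_id": "1", "sort": ["a-001"]},
            {"_id": "2", "sort": ["a-002"]},
        ]
    }
}
print(next_page_body(fake_response))
```

Unlike "from" + "size", each request carries only the last sort values, so the cost per page does not grow with depth. The trade-off is that pages have to be read in order; jumping straight to an arbitrary offset is not possible this way.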

Search 1M data using "search_after" in Elasticsearch
