Question

I have data in ES. The mapping is very simple:

{
    "index": {
        "aliases": {},
        "mappings": {
            "level1": {
                "properties": {
                    "id": {
                        "type": "string"
                    },
                    "level2": {
                        "type": "nested",
                        "properties": {
                            "level3": {
                                "type": "nested",
                                "properties": {
                                    "value1": {
                                        "type": "string"
                                    },
                                    "value2": {
                                        "type": "long"
                                    },
                                    "id": {
                                        "type": "string"
                                    },
                                    "value3": {
                                        "type": "long"
                                    }
                                }
                            },
                            "id": {
                                "type": "string"
                            }
                        }
                    }
                }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1505476515647",
                "number_of_shards": "5",
                "number_of_replicas": "1",
                "uuid": "_0IiQCPrQ1i-kDP1481y8w",
                "version": {
                    "created": "2030099"
                }
            }
        },
        "warmers": {}
    }
}

When I run this query:

 {"query": {"terms": {"_id": [ "value51" ] }}}

I get back data with this structure:

_source (dict)
  level1 (list)
     level2 (list)
        data1 (dict)
              id
              value1
              value2
              value3
        data2 (dict)
        data3 (dict)
        ...
        data65000 (dict)

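The problem is that 65,000 data entries are too many and I run out of memory. I would like to know whether _search, or ElasticSearch in general, has some way to return this information (data1, data2, data3, ...) in batches, or whether there is some way to write the query so that I don't run out of memory on my machine. Any ideas?

Thanks!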

Answer 1

You can use the source filtering feature: just specify the list of fields to return, for example:

{
  "_source": {
    "includes": [
      "data1*"
    ]
  },
  "query": {
    "terms": {
      "_id": [
        "value51"
      ]
    }
  }
}

Then:

{
  "_source": {
    "includes": [
      "data2*"
    ]
  },
  "query": {
    "terms": {
      "_id": [
        "value51"
      ]
    }
  }
}

However, because this takes multiple queries, it may reduce performance.
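
To make the one-request-per-pattern idea concrete, here is a minimal Python sketch. It only builds the request bodies shown above and sends them one at a time; the helper names `build_query` and `fetch_in_batches` are made up for illustration, and `search` is an assumed callable that you would supply yourself (for example a thin wrapper around whatever client you use to call `_search`). The patterns `data1*` and `data2*` are taken from the example above and may need adjusting to your real field names.

```python
from typing import Any, Callable, Dict, Iterable, Iterator


def build_query(doc_id: str, source_pattern: str) -> Dict[str, Any]:
    """Build the request body shown above for a single _source pattern."""
    return {
        "_source": {"includes": [source_pattern]},
        "query": {"terms": {"_id": [doc_id]}},
    }


def fetch_in_batches(
    search: Callable[[Dict[str, Any]], Dict[str, Any]],
    doc_id: str,
    source_patterns: Iterable[str],
) -> Iterator[Dict[str, Any]]:
    """Yield one response per pattern.

    `search` is a hypothetical stand-in for your own function that sends
    a request body to _search and returns the parsed JSON response.
    Because this is a generator, the next request is only sent once the
    caller asks for the next item.
    """
    for pattern in source_patterns:
        yield search(build_query(doc_id, pattern))
```

If you process each response inside a `for` loop and do not keep a reference to it, only one filtered slice of the document needs to be held in memory at a time. The cost is the one mentioned above: one round trip per pattern.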
