Question

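The general use case I am optimizing involves documents that store product data, where each document also contains metadata and computed data.

In some cases our queries should return only the raw product data (the metadata and computed data are not needed). In other cases we need the metadata/computed data, with or without the raw product data.

The metadata accounts for roughly half of the document size.

Adding an excludes filter for the unneeded fields to the query's _source parameter noticeably improves performance. For example: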

{
    ...
    "_source": {
        "excludes": [ "field_x",  "field_y", ....] <<<<< meta data fields
    },
    ...
}
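
To make the query-time variant concrete, here is a minimal Python sketch that builds such a request body. The helper name build_raw_product_search and the METADATA_FIELDS list are hypothetical and exist only for illustration; the body would be passed to whatever search client is in use.

```python
import json

# Hypothetical field names taken from the example above; real metadata
# field names will differ.
METADATA_FIELDS = ["field_x", "field_y"]


def build_raw_product_search(query, excluded_fields):
    """Return a search body whose hits omit `excluded_fields` from _source."""
    return {
        "query": query,
        # Query-time source filtering: the full _source is still stored,
        # only the returned copy is trimmed.
        "_source": {"excludes": list(excluded_fields)},
    }


body = build_raw_product_search({"match_all": {}}, METADATA_FIELDS)
print(json.dumps(body, indent=2))
```

With this approach the index keeps the complete document, so the same index can still serve the queries that do need the metadata.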

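We considered whether it would be worthwhile to exclude the metadata/computed data from _source altogether by adding an excludes filter to the document type mapping, i.e.: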

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude

{
  "mappings": {
    "_doc": {
      "_source": {
        "excludes": [
          "field_x", <<<<<< exclude from the _source
          "field_y"  <<<<<< exclude from the _source
          .... POSSIBLY MANY MORE .....
        ]
      }
    }
  }
}

However, in some cases we do need field_x and field_y, so we could use the "stored fields" request API instead of _source filtering:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-stored-fields.html#search-request-stored-fields

Using the stored_fields API would require those fields to be excluded from _source but explicitly mapped with "store": true:

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-store.html

i.e.

{
  "mappings": {
    "_doc": {
      "properties": {
        "field_x": {
          "type": "text", 
          "store": true <<<<<<<<< Store the fields
        },
        "field_y": {
          "type": "text", 
          "store": true <<<<<<<<< Store the fields
        },
        .... POSSIBLE MANY MORE "stored" fields .....
...
...
     ///// BUT EXCLUDE FROM SOURCE ///////////
     "_source": {
        "excludes": [
          "field_x", <<<<<< exclude from the _source
          "field_y"  <<<<<< exclude from the _source
          .... POSSIBLE MANY MORE .....
        ]
      }
}
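
As a rough sketch of how the two pieces fit together, the Python below generates the mapping from a single list of field names, so the "store": true properties and the _source excludes cannot drift apart, and then builds a search body that requests those fields via stored_fields. The helper names and METADATA_FIELDS are hypothetical; the "_doc" type level mirrors the mapping shown above and assumes a version of Elasticsearch that still accepts mapping types.

```python
import json

# Hypothetical list of metadata/computed fields (names from the example above).
METADATA_FIELDS = ["field_x", "field_y"]


def build_mapping(stored_fields, field_type="text", doc_type="_doc"):
    """Build a mapping that stores each field separately and drops it from _source."""
    properties = {
        name: {"type": field_type, "store": True} for name in stored_fields
    }
    return {
        "mappings": {
            doc_type: {
                "properties": properties,
                # Same list as above, so every excluded field is also stored.
                "_source": {"excludes": list(stored_fields)},
            }
        }
    }


def build_stored_fields_search(query, stored_fields):
    """Build a search body that asks for stored fields instead of _source."""
    return {"query": query, "stored_fields": list(stored_fields)}


mapping = build_mapping(METADATA_FIELDS)
search_body = build_stored_fields_search({"match_all": {}}, METADATA_FIELDS)
print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Two things to keep in mind with this design: stored field values come back under each hit's "fields" key as arrays rather than inside _source, and fields excluded from _source at index time are no longer part of the stored source document, which matters for anything that relies on the full original source.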

The documentation does not make the trade-offs/advantages clear:

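The stored_fields parameter is about fields that are explicitly marked as stored in the mapping, which is off by default and generally not recommended. Use source filtering instead to select subsets of the original source document to be returned.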

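What is the general guideline here? Why store fields rather than filter them? Is explicitly querying stored fields faster or slower than filtering/excluding from the _source field?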
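
One way to approach the faster/slower question empirically is to compare the "took" value (milliseconds, reported in every search response) for the two request shapes against the same data. The sketch below is only a measurement harness under stated assumptions: median_took and make_fake_search are hypothetical helpers, and the fake search function returns arbitrary placeholder numbers so the example runs without a cluster. It does not represent real measurements.

```python
from statistics import median


def median_took(search, body, runs=5):
    """Call `search(body)` `runs` times and return the median `took` in ms.

    `search` is any callable that returns a search response as a dict,
    for example a thin wrapper around a real client.
    """
    return median(search(body)["took"] for _ in range(runs))


def make_fake_search(took_values):
    """Stand-in for a real client; replays placeholder `took` values."""
    values = iter(took_values)

    def search(body):
        return {"took": next(values), "hits": {"hits": []}}

    return search


# Placeholder values only; swap in a real search callable to measure.
sample = median_took(
    make_fake_search([5, 1, 3]),
    {"query": {"match_all": {}}, "stored_fields": ["field_x"]},
    runs=3,
)
print(sample)
```

Because "took" covers only the time spent executing the request on the cluster, it may be worth timing the full round trip as well, since part of the benefit of dropping half of each document is a smaller response.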

Elasticsearch query optimization: what are the performance pros and cons of _source filtering vs. querying stored_fields?

0 answers: