Question

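I have a problem: I upgraded from Elasticsearch 2.x to 5.1. Now some of my data does not work in the newer Elasticsearch, because "Fielddata is disabled on text fields by default" (https://www.elastic.co/guide/en/elasticsearch/reference/5.1/fielddata.html), whereas previously it was enabled.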

Is there a way to automatically enable fielddata on text fields?

I tried code like this:

curl -XPUT http://localhost:9200/_template/template_1 -d '
{
  "template": "*",
  "mappings": {
    "_default_": {
      "properties": {
        "fielddata-*": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}'

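But it looks like Elasticsearch does not understand wildcards in field names. My temporary workaround is a Python script that runs every 30 minutes, scans all indices, and adds fielddata=true to any new fields.

The problem is that I have string data in Elasticsearch like "this is cool".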
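The core of such a periodic script can be sketched as follows. This is only an illustration under stated assumptions: `fielddata_updates` is a hypothetical helper name, and `existing` stands in for the `properties` section that a get-mapping call would return. The sketch only builds the put-mapping body; sending it to the cluster is left out.

```python
def fielddata_updates(properties):
    """Build a put-mapping body for text fields that do not have fielddata yet.

    `properties` is assumed to be the "properties" dict of one mapping type.
    """
    updates = {}
    for name, spec in properties.items():
        if spec.get("type") == "text" and not spec.get("fielddata", False):
            # Keep the field's existing settings and only switch fielddata on.
            updates[name] = {**spec, "fielddata": True}
    return {"properties": updates}


# Hypothetical mapping: one text field without fielddata, one with, one numeric.
existing = {
    "myfield": {"type": "text"},
    "title": {"type": "text", "fielddata": True},
    "count": {"type": "long"},
}
print(fielddata_updates(existing))
```

Only `myfield` ends up in the update body, so fields that are already configured and non-text fields are left alone.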

curl -XPUT 'http://localhost:9200/example/exampleworking/1' -d '
{
    "myfield": "this is cool"
}'

When I try to aggregate:

curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
    "aggs": {
        "foobar": {
            "terms": {
                "field": "myfield"
            }
        }
    }   
}'

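I get the error "Fielddata is disabled on text fields by default. Set fielddata=true on [myfield]".

The Elasticsearch documentation recommends using .keyword instead of enabling fielddata. However, that does not return the data I want.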

curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
    "aggs": {
        "foobar": {
            "terms": {
                "field": "myfield.keyword"
            }
        }
    }   
}'

returns:

  "buckets" : [
    {
      "key" : "this is cool",
      "doc_count" : 1
    }
  ]

This is not correct. Then I set fielddata to true and everything works:

curl -XPUT 'http://localhost:9200/example/_mapping/exampleworking' -d '
{
  "properties": {
        "myfield": {
            "type": "text",
            "fielddata": true
        }
    }
}'

Then the aggregation

curl 'http://localhost:9200/example/_search?pretty=true' -d '
{
    "aggs": {
        "foobar": {
            "terms": {
                "field": "myfield"
            }
        }
    }   
}'

returns the correct result:

  "buckets" : [
    {
      "key" : "cool",
      "doc_count" : 1
    },
    {
      "key" : "is",
      "doc_count" : 1
    },
    {
      "key" : "this",
      "doc_count" : 1
    }
  ]
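The difference between the two results can be reproduced without a cluster. The sketch below is a rough approximation, not Elasticsearch's actual analyzer: `analyzed_terms` is a hypothetical helper that lowercases and splits on non-word characters, which is roughly what the standard analyzer does to simple English text.

```python
import re
from collections import Counter


def analyzed_terms(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


docs = ["this is cool"]

# Terms aggregation on myfield.keyword: one bucket per whole, unanalyzed string.
keyword_buckets = Counter(docs)

# Terms aggregation on myfield with fielddata: one bucket per analyzed term,
# counting each document at most once per term.
text_buckets = Counter(term for doc in docs for term in set(analyzed_terms(doc)))

print(dict(keyword_buckets))
print(sorted(text_buckets.items()))
```

The keyword variant yields a single bucket for "this is cool", while the analyzed variant yields the three buckets "cool", "is" and "this".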

How can I automatically add fielddata=true to all text fields in all indices? Is that even possible? In Elasticsearch 2.x this worked out of the box.

Answer 1

I will answer my own question.


This does what I wanted. Now all indices have fielddata set to true by default.
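The request this answer refers to is not reproduced here. As a hedged sketch of one way to get the described effect on Elasticsearch 5.x, an index template can carry a dynamic template that maps every newly detected string field to `text` with fielddata enabled. The body below reuses the `template` and `_default_` keys from the question; the function name `fielddata_template` and the dynamic template name `strings_as_text_with_fielddata` are made up for illustration, and this is an assumption about the approach, not the author's confirmed snippet.

```python
import json


def fielddata_template(index_pattern="*"):
    """Build an ES 5.x style index template body (assumed shape, see lead-in)."""
    return {
        "template": index_pattern,
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        # Hypothetical name; any unique name would do.
                        "strings_as_text_with_fielddata": {
                            "match_mapping_type": "string",
                            "mapping": {"type": "text", "fielddata": True},
                        }
                    }
                ]
            }
        },
    }


# Print the JSON that would be sent with PUT /_template/template_1.
print(json.dumps(fielddata_template(), indent=2))
```

A template like this only affects indices created after it is stored, and only fields that are mapped dynamically; fields with an explicit mapping keep their own settings.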

Answer 2

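Adding "fielddata": true allows aggregating on text fields, but it causes performance problems at scale. A better solution is to use a multi-field mapping.

Unfortunately, this is buried a bit deep in Elasticsearch's documentation, in a warning under the fielddata mapping parameter: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html#before-enabling-fielddata

Here is a complete example of how this helps with a terms aggregation, tested on Elasticsearch 7.12 as of 2021-04-24:

Mapping (in ES7, under the mappings property of the body of a "put index template" request, etc.):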

{
    "properties": {
        "bio": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword"
                }
            }
        }
    }
}

Four documents indexed:

{
    "bio": "Dogs are the best pet."
}

{
    "bio": "Cats are cute."
}

{
    "bio": "Cats are cute."
}

{
    "bio": "Cats are the greatest."
}

Aggregation query:

{
    "size": 0,
    "aggs": {
        "bios_with_cats": {
            "filter": {
                "match": {
                    "bio": "cats"
                }
            },
            "aggs": {
                "bios": {
                    "terms": {
                        "field": "bio.keyword"
                    }
                }
            }
        }
    }
}

Aggregation query results:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "bios_with_cats": {
      "doc_count": 3,
      "bios": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "Cats are cute.",
            "doc_count": 2
          },
          {
            "key": "Cats are the greatest.",
            "doc_count": 1
          }
        ]
      }
    }
  }
}

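Basically, this aggregation says "Of the documents whose bios are like 'cats', how many of each distinct bio are there?" The one document without "cats" in its bio property is excluded, and the remaining documents are then grouped into buckets, one of which has one document and the other two documents.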
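The same two steps (filter on the analyzed text, then bucket on the exact keyword value) can be mimicked in plain Python to check the counts. This is a simplified model, not how Elasticsearch executes the query: `matches` is a hypothetical helper that approximates the `match` query with lowercase word splitting.

```python
import re
from collections import Counter

bios = [
    "Dogs are the best pet.",
    "Cats are cute.",
    "Cats are cute.",
    "Cats are the greatest.",
]


def matches(text, word):
    # Approximate a match query: compare against lowercased word tokens.
    return word.lower() in re.findall(r"\w+", text.lower())


# Step 1: the filter aggregation keeps only bios that contain "cats".
with_cats = [bio for bio in bios if matches(bio, "cats")]

# Step 2: the terms aggregation on bio.keyword groups by the exact string.
buckets = Counter(with_cats).most_common()

print(len(with_cats))
print(buckets)
```

This gives a filtered count of 3 and the buckets "Cats are cute." with 2 documents and "Cats are the greatest." with 1, matching the response above.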
