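I'm working on an ElasticSearch (6.2) project in which the index has many keyword fields, normalized with a lowercase filter so that searches are case-insensitive. Search works well and returns the actual values of the normalized fields (not lowercased). Aggregations, however, do not return the actual field values; they return the lowercased ones.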
The following example is taken from the ElasticSearch documentation.
https://www.elastic.co/guide/en/elasticsearch/reference/master/normalizer.html
Create the index:
PUT index
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}
Insert the documents:
PUT index/_doc/1
{
  "foo": "Bar"
}

PUT index/_doc/2
{
  "foo": "Baz"
}
Search with an aggregation:
GET index/_search
{
  "size": 0,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo"
      }
    }
  }
}
Result:
{
  "took": 43,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.0,
    "hits": {
      "total": 2,
      "max_score": 0.47000363,
      "hits": [
        {
          "_index": "index",
          "_type": "_doc",
          "_id": "1",
          "_score": 0.47000363,
          "_source": {
            "foo": "Bar"
          }
        },
        {
          "_index": "index",
          "_type": "_doc",
          "_id": "2",
          "_score": 0.47000363,
          "_source": {
            "foo": "Baz"
          }
        }
      ]
    }
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "bar",
          "doc_count": 1
        },
        {
          "key": "baz",
          "doc_count": 1
        }
      ]
    }
  }
}
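If you look at the aggregation, you'll see that lowercased values are returned, e.g. "key": "bar".
Is there any way to change the aggregation so that it returns the actual values, e.g. "key": "Bar"?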
Answer 0 (score: 1)
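If you want case-insensitive search but exact values in your aggregations, you don't need a normalizer at all. You can simply use a text field (whose tokens are lowercased, so it allows case-insensitive search by default) with a keyword sub-field. Use the former for search and the latter for aggregations. It goes like this: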
PUT index
{
  "mappings": {
    "_doc": {
      "properties": {
        "foo": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}
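To make the division of labor in this mapping concrete, here is a toy Python model of the two fields. It is only a sketch and not how Elasticsearch is implemented: `analyze`, `match` and `terms_agg` are hypothetical helpers, and `analyze` is a rough stand-in for the default analyzer of a text field (split into words, lowercase). The point it illustrates is that the text field is searched through lowercased tokens, while the keyword sub-field keeps the value exactly as it was sent.

```python
import re
from collections import Counter


def analyze(text):
    """Rough stand-in for a text field's default analysis: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


# The two sample documents from the question.
docs = {"1": {"foo": "Bar"}, "2": {"foo": "Baz"}}

# "foo" (text): inverted index from lowercased token to document ids.
inverted_index = {}
for doc_id, source in docs.items():
    for token in analyze(source["foo"]):
        inverted_index.setdefault(token, set()).add(doc_id)

# "foo.keyword" (keyword, no normalizer): the untouched value per document.
keyword_values = {doc_id: source["foo"] for doc_id, source in docs.items()}


def match(query_text):
    """Toy match query on "foo": the query text is analyzed the same way as the field."""
    ids = set()
    for token in analyze(query_text):
        ids |= inverted_index.get(token, set())
    return sorted(ids)


def terms_agg():
    """Toy terms aggregation on "foo.keyword": count the raw values."""
    return dict(Counter(keyword_values.values()))


print(match("BAR"))   # ['1']
print(terms_agg())    # {'Bar': 1, 'Baz': 1}
```

In this model a normalizer on the keyword field would correspond to lowercasing the entries of `keyword_values` before counting, which is why the original mapping produced "bar" and "baz" as bucket keys.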
After indexing your two documents, you can issue a terms aggregation on foo.keyword:
GET index/_search
{
  "size": 2,
  "aggs": {
    "foo_terms": {
      "terms": {
        "field": "foo.keyword"
      }
    }
  }
}
The result will look like this:
{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "foo": "Baz"
        }
      },
      {
        "_index": "index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "foo": "Bar"
        }
      }
    ]
  },
  "aggregations": {
    "foo_terms": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Bar",
          "doc_count": 1
        },
        {
          "key": "Baz",
          "doc_count": 1
        }
      ]
    }
  }
}
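The search half of this approach goes against the foo text field itself. As a hedged sketch, the Python below builds (but does not send) a match query for "BAR", which should find the document whose foo is "Bar" because the query text is analyzed like the field. The node address `http://localhost:9200` and the helper names `build_match_search` and `build_search_request` are assumptions for illustration only; POST is used because `_search` accepts it as well as GET.

```python
import json
from urllib.request import Request

# Assumption: a local node at this address; adjust for your own cluster.
ES_URL = "http://localhost:9200"


def build_match_search(field, text):
    """Body of a match query; the query text is analyzed like the target field."""
    return {"query": {"match": {field: text}}}


def build_search_request(index, body):
    """Prepare, without sending, a _search request carrying the JSON body."""
    return Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_match_search("foo", "BAR")
request = build_search_request("index", body)

print(request.get_method(), request.full_url)
print(json.dumps(body))
```

Passing `request` to `urllib.request.urlopen` would actually send it. The same body can be combined with the `aggs` section shown earlier, so that one request searches on foo and aggregates on foo.keyword.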