I have an aggregation query that buckets city names by country. The query (which I am running in Sense) is as follows:
GET test/_search
{
  "query" : {
    "bool" : {
      "must" : {
        "match" : {
          "name.autocomplete" : {
            "query" : "new yo",
            "type" : "boolean"
          }
        }
      },
      "must_not" : {
        "term" : {
          "source" : "old"
        }
      }
    }
  },
  "aggregations" : {
    "city_name" : {
      "terms" : {
        "field" : "cityname.raw",
        "min_doc_count" : 1
      },
      "aggregations" : {
        "country_name" : {
          "terms" : {
            "field" : "countryname.raw"
          }
        }
      }
    }
  }
}
Now the city New York appears twice in the documents, once with an extra trailing space. The aggregation result I get is as follows:
{
  "key": "New York",
  "doc_count": 1,
  "city_name": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "United States of America",
        "doc_count": 1
      }
    ]
  }
},
{
  "key": "New York ",
  "doc_count": 1,
  "city_name": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "United States of America",
        "doc_count": 1
      }
    ]
  }
}
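I need "New York" and "New York " (with the trailing space) to be treated as the same value. Is there a way to write the query so that both end up in the same bucket? I am guessing that something which trims trailing whitespace would do it, but I could not find anything. Thanks.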
Answer 0 (score: 2)
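Ideally, you would clean the fields before indexing the documents. If that is not an option, you can still clean them up after the fact using, for example, the update-by-query plugin.
Alternatively, though with worse performance, use a terms aggregation with a script instead of a field, like this: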
...
  "aggregations" : {
    "city_name" : {
      "terms" : {
        "script" : "doc['cityname.raw'].value.trim()",
        "min_doc_count" : 1
      },
      "aggregations" : {
        "country_name" : {
          "terms" : {
            "script" : "doc['countryname.raw'].value.trim()"
          }
        }
      }
    }
  }
}
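The first option in this answer, cleaning the fields before indexing, might look like the following on the client side. This is only a minimal sketch: `clean_doc` is a hypothetical helper, the field names are taken from the question's query, and the indexing call itself is left out because it depends on which client you use.

```python
from typing import Any, Dict, Iterable


def clean_doc(
    doc: Dict[str, Any],
    fields: Iterable[str] = ("cityname", "countryname"),
) -> Dict[str, Any]:
    """Return a copy of doc with surrounding whitespace stripped from the given string fields."""
    cleaned = dict(doc)  # shallow copy so the caller's document is not modified
    for name in fields:
        value = cleaned.get(name)
        if isinstance(value, str):
            cleaned[name] = value.strip()
    return cleaned


# Sample documents modelled on the question (assumed shape).
docs = [
    {"cityname": "New York", "countryname": "United States of America"},
    {"cityname": "New York ", "countryname": "United States of America"},
]

# Clean every document before handing it to whatever indexing call you use.
cleaned_docs = [clean_doc(d) for d in docs]
```

Once both documents carry the same city name, a plain field-based terms aggregation puts them in one bucket, with no script needed at query time.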
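Another solution is to change the strings from not_analyzed to analyzed, but with a custom analyzer that uses the keyword tokenizer to keep the whole value as a single token (as not_analyzed does) and applies the trim token filter.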
{
  "settings": {
    "analysis": {
      "analyzer": {
        "trimmer": {
          "type": "custom",
          "filter": [ "trim" ],
          "tokenizer": "keyword"
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "cityname": {
          "type": "string",
          "analyzer": "trimmer"
        },
        "countryname": {
          "type": "string",
          "analyzer": "trimmer"
        }
      }
    }
  }
}
If you index cityname: "New York City ", the token that gets stored will be trimmed to "New York City".
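
To see why the trimmed token fixes the bucketing, here is a small local emulation in Python. It does not call Elasticsearch. `trimmer_analyze` and `terms_buckets` are hypothetical stand-ins for the keyword tokenizer plus trim filter and for a terms aggregation, and Python's `str.strip()` is only an approximation of what the trim filter treats as whitespace.

```python
from collections import Counter
from typing import Dict, Iterable, List, Union


def trimmer_analyze(text: str) -> List[str]:
    """Emulate the custom analyzer: one token for the whole input, then trim it."""
    # The keyword tokenizer emits the entire input as a single token;
    # the trim filter then removes the whitespace around that token.
    return [text.strip()]


def terms_buckets(values: Iterable[str]) -> List[Dict[str, Union[str, int]]]:
    """Emulate a terms aggregation by counting documents per analyzed token."""
    counts = Counter(token for value in values for token in trimmer_analyze(value))
    return [{"key": key, "doc_count": count} for key, count in counts.most_common()]


# The two city names from the question, one with a trailing space.
buckets = terms_buckets(["New York", "New York "])
print(buckets)
```

Both values produce the same token, so they land in a single bucket instead of two.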