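I have the following search/City index, where each item has a name and a bunch of other properties. I run the following aggregation search: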
{
  "size": 0,
  "query": {
    "multi_match": {
      "query": "ana",
      "fields": [ "cityName" ],
      "type": "phrase_prefix"
    }
  },
  "aggs": {
    "res": {
      "terms": {
        "field": "cityName"
      },
      "aggs": {
        "dedup_docs": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
As a result I get 3 buckets with the keys "anaheim", "ana" and "santa". Here is the result:
"buckets": [
  {
    "key": "anaheim",
    "doc_count": 11,
    "dedup_docs": {
      "hits": {
        "total": 11,
        "max_score": 5.8941016,
        "hits": [
          {
            "_index": "search",
            "_type": "City",
            "_id": "310",
            "_score": 5.8941016,
            "_source": {
              "id": 310,
              "country": "USA",
              "stateCode": "CA",
              "stateName": "California",
              "cityName": "Anaheim",
              "postalCode": "92806",
              "latitude": 33.822738,
              "longitude": -117.881633
            }
          }
        ]
      }
    }
  },
  {
    "key": "ana",
    "doc_count": 4,
    "dedup_docs": {
      "hits": {
        "total": 4,
        "max_score": 2.933612,
        "hits": [
          {
            "_index": "search",
            "_type": "City",
            "_id": "154",
            "_score": 2.933612,
            "_source": {
              "id": 154,
              "country": "USA",
              "stateCode": "CA",
              "stateName": "California",
              "cityName": "Santa Ana",
              "postalCode": "92706",
              "latitude": 33.767371,
              "longitude": -117.868255
            }
          }
        ]
      }
    }
  },
  {
    "key": "santa",
    "doc_count": 4,
    "dedup_docs": {
      "hits": {
        "total": 4,
        "max_score": 2.933612,
        "hits": [
          {
            "_index": "search",
            "_type": "City",
            "_id": "154",
            "_score": 2.933612,
            "_source": {
              "id": 154,
              "country": "USA",
              "stateCode": "CA",
              "stateName": "California",
              "cityName": "Santa Ana",
              "postalCode": "92706",
              "latitude": 33.767371,
              "longitude": -117.868255
            }
          }
        ]
      }
    }
  }
]
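The question is: why does the last bucket have the key "santa" even though I searched for "ana", and why does the same city "Santa Ana" (id = 154) appear in 2 different buckets (key "ana" and key "santa")?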
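Answer 0: (score: 1)
UPDATE
The duplication is the behavior of the top_hits aggregation.
Check out this good tutorial:
https://www.elastic.co/blog/top-hits-aggregation
When the top_hits aggregation is used on its own, it just repeats what is already in the regular hits of the response.
Actually, analysis has nothing to do with it, so the explanation below is incorrect.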
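With the default settings, Elasticsearch splits the input into so-called terms. The default analyzer converts Santa Ana into 2 terms, [santa, ana]. As a result, a search for ana ends up matching Santa Ana as well.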
You can learn how Elasticsearch works here:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
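Answer 1: (score: 1)
This is mainly because your cityName field is analyzed. When Santa Ana is indexed, two tokens, santa and ana, are produced, and the terms aggregation buckets on those tokens rather than on the original value.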
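To make the effect of analysis concrete, here is a minimal Python sketch that imitates it outside Elasticsearch. The `simple_analyze` and `terms_buckets` helpers are hypothetical stand-ins (lowercasing and splitting on non-alphanumeric characters only approximates the default analyzer), and the city list is illustrative data chosen to mirror the doc counts in the question.

```python
import re
from collections import Counter


def simple_analyze(text):
    """Rough stand-in for an analyzed string field (an assumption, not the
    real analyzer): lowercase, then split on non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def terms_buckets(city_names, analyze):
    """Count documents per indexed term, the way a terms aggregation
    buckets on whatever terms were stored for the field."""
    counts = Counter()
    for name in city_names:
        # A document is counted once per distinct term it produced.
        for term in set(analyze(name)):
            counts[term] += 1
    return dict(counts)


# Illustrative data: 11 "Anaheim" documents and 4 "Santa Ana" documents.
city_names = ["Anaheim"] * 11 + ["Santa Ana"] * 4
buckets = terms_buckets(city_names, simple_analyze)
print(buckets)
```

Each "Santa Ana" document contributes to both the santa and the ana bucket, which is why the same document (id 154) shows up as the top hit of two buckets.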
If you want to prevent that, you need to define the cityName field like this:
PUT search
{
  "mappings": {
    "City": {
      "properties": {
        "cityName": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
You first need to wipe your index, recreate it with the mapping above, and then reindex your data. Only then will your bucket names be Anaheim and Santa Ana.
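As a rough illustration of what not_analyzed changes, the following Python sketch treats each city name as one single term. The `keyword_analyze` helper is a hypothetical stand-in for a not_analyzed field, and the data is illustrative.

```python
from collections import Counter


def keyword_analyze(text):
    """Stand-in for a not_analyzed field (an assumption for illustration):
    the whole original string is indexed as one term, unchanged."""
    return [text]


# Illustrative data: 11 "Anaheim" documents and 4 "Santa Ana" documents.
city_names = ["Anaheim"] * 11 + ["Santa Ana"] * 4

# Bucketing on the single stored term yields one bucket per city name.
buckets = dict(
    Counter(term for name in city_names for term in keyword_analyze(name))
)
print(buckets)
```

With one term per document there is exactly one bucket per distinct city name, and the key keeps its original casing and spacing.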
UPDATE
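If you want cityName to stay analyzed but still get only one bucket per city in the aggregation, you can achieve this by defining a multi-field, where one part is analyzed and the other is not, like this: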
PUT search
{
  "mappings": {
    "City": {
      "properties": {
        "cityName": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
This way cityName is still analyzed, but you now also have cityName.raw, which is not analyzed and which you can use in the aggregation, like this:
  "terms": {
    "field": "cityName.raw"
  },
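The multi-field idea can be sketched in Python as two views of the same value: tokens for matching and the untouched string for bucketing. Everything here is a simplified assumption for illustration: `analyzed_tokens` only approximates the analyzer, the prefix check only approximates a phrase_prefix query for a single word, and the data (including the extra "Irvine" documents) is made up.

```python
import re
from collections import Counter


def analyzed_tokens(text):
    """Approximation of the analyzed cityName field: lowercase tokens."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def search_and_aggregate(city_names, prefix):
    """Match on the analyzed tokens, but bucket on the raw value,
    mirroring a query on cityName with a terms aggregation on cityName.raw."""
    prefix = prefix.lower()
    hits = [
        name
        for name in city_names
        if any(tok.startswith(prefix) for tok in analyzed_tokens(name))
    ]
    # The raw (not analyzed) value is the bucket key.
    return dict(Counter(hits))


# Illustrative data; the "Irvine" documents should not match "ana".
city_names = ["Anaheim"] * 11 + ["Santa Ana"] * 4 + ["Irvine"] * 3
buckets = search_and_aggregate(city_names, "ana")
print(buckets)
```

The search still finds Santa Ana through its ana token, yet each city lands in a single bucket because the bucket key comes from the raw value.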