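I have an index in Elasticsearch. Its documents contain duplicate field values. In the query results, I need to remove all duplicates and get only the distinct values. For example:
PUT localhost:9200/person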
{
"mappings" : {
"person" : {
"properties" : {
"name" : { "type" : "keyword" }
}
}
}
}
POST localhost:9200/person/person
{
"name": "John"
}
{
"name": "John"
}
{
"name": "Marry"
}
{
"name": "Tomas"
}
I am trying to remove the duplicates with a terms aggregation on the "name" field, but it does not work.
GET localhost:9200/person/person/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "dasdfdLBpnM0"
}
}
]
}
},
"aggs": {
"top-names": {
"terms": {
"field": "name",
"size": 3
},
"aggs": {
"top_names_hits": {
"top_hits": {
"size": 1
}
}
}
}
}
}
Result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.9506482,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "H-5D8GoB8pRyckNSVUeN",
"_score": 0.9506482,
"_source": {
"name": "Tomas"
}
},
{
"_index": "person",
"_type": "person",
"_id": "He5D8GoB8pRyckNSPEfa",
"_score": 0.7700638,
"_source": {
"name": "John"
}
},
{
"_index": "person",
"_type": "person",
"_id": "HO5D8GoB8pRyckNSN0fo",
"_score": 0.71723765,
"_source": {
"name": "John"
}
}
]
},
"aggregations": {
"top-names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "John",
"doc_count": 2,
"top_names_hits": {
"hits": {
"total": 2,
"max_score": 0.7700638,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "He5D8GoB8pRyckNSPEfa",
"_score": 0.7700638,
"_source": {
"name": "John"
}
}
]
}
}
},
{
"key": "Marry",
"doc_count": 1,
"top_names_hits": {
"hits": {
"total": 1,
"max_score": 0.66815424,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "Iu5D8GoB8pRyckNScUdv",
"_score": 0.66815424,
"_source": {
"name": "Marry"
}
}
]
}
}
},
{
"key": "Tomas",
"doc_count": 1,
"top_names_hits": {
"hits": {
"total": 1,
"max_score": 0.9506482,
"hits": [
{
"_index": "person",
"_type": "person",
"_id": "H-5D8GoB8pRyckNSVUeN",
"_score": 0.9506482,
"_source": {
"name": "Tomas"
}
}
]
}
}
}
]
}
}
}
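The aggregation includes the document with the name "Marry", even though it is not among the returned hits. I don't understand why, or how to apply the aggregation only to the query results.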
Answer 0 (score: 0)
Below is the blueprint of an Elasticsearch query:
{
"size": n, // Return the n documents based on "query" section (to frontend)
"query": {
// This is where you specify which documents you want,
// using any filter/bool/match query condition.
// In your case, function_score with only random_score adds no filtering condition,
// so it matches all documents; "size" then limits the returned hits to 3.
},
"aggs":{
// Aggregations are applied to every document matched by the "query" section,
// not only to the hits returned under "size".
// In your case that is all documents in the index, which is why "Marry" shows up.
}
}
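The scoping described in the blueprint can be illustrated with a small pure-Python sketch. It does not call Elasticsearch: `run_search` and the in-memory `docs` list (including the scores) are hypothetical stand-ins, and the sketch assumes the query matches every document and that "size" only truncates the returned hits.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the index; not real Elasticsearch data.
docs = [
    {"name": "John", "score": 0.77},
    {"name": "John", "score": 0.71},
    {"name": "Marry", "score": 0.66},
    {"name": "Tomas", "score": 0.95},
]


def run_search(docs, matches, size):
    """Mimic the request flow: query, then aggs over all matches, then hits cut to size."""
    matched = [d for d in docs if matches(d)]
    # The aggregation sees every matched document, regardless of "size".
    buckets = dict(Counter(d["name"] for d in matched))
    # "size" only limits how many hits are returned, ordered by score.
    hits = sorted(matched, key=lambda d: d["score"], reverse=True)[:size]
    return hits, buckets


# A query with no filtering condition behaves like "match everything".
hits, buckets = run_search(docs, matches=lambda d: True, size=3)
print([h["name"] for h in hits])  # ['Tomas', 'John', 'John']
print(buckets)                    # {'John': 2, 'Marry': 1, 'Tomas': 1}
```

Here "Marry" lands in the buckets even though it is not among the three returned hits, which mirrors the result in the question.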
To get what you want, simply use the simplified query below, which uses a Terms Aggregation with Top Hits as a sub-aggregation.
POST person/_search
{
"size": 0, <------- This is to say, I don't want "query" results to be returned and that I only want below aggregation results.
"aggs": {
"top-names": {
"terms": {
"field": "name",
"size": 10
},
"aggs": {
"top_hits_documents": { <------- Top hits would return the actual documents
"top_hits": {
"size": 1
}
}
}
}
}
}
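By specifying "size": 0 at the top level, the aggregation is still applied to all documents, but no query hits are returned.
You get back only the aggregation results.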
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [ ] <------ Notice this. No query results returned
},
"aggregations" : { <------ Aggregation Result starts
"top-names" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "John", <------- This is to say there's a value called John
"doc_count" : 2, <------- John occurs in two documents.
"top_hits_documents" : {
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"name" : "John"
}
}
]
}
}
},
{
"key" : "Marry",
"doc_count" : 1,
"top_hits_documents" : {
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"name" : "Marry"
}
}
]
}
}
},
{
"key" : "Thomas",
"doc_count" : 1,
"top_hits_documents" : {
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [
{
"_index" : "person",
"_type" : "person",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"name" : "Thomas"
}
}
]
}
}
}
]
}
}
}
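As a rough sketch of how a client might turn such a response into a list of distinct documents, the helper below walks the buckets and takes the single top hit from each. `distinct_documents` is a hypothetical helper, not part of any Elasticsearch client; it assumes the response has already been parsed into a Python dict and uses the aggregation names from the query above. The `sample` dict is a trimmed copy of the response shown above, keeping only the fields the helper reads.

```python
def distinct_documents(response, agg_name="top-names", hits_name="top_hits_documents"):
    """Return one _source per terms bucket from a parsed search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    result = []
    for bucket in buckets:
        top = bucket[hits_name]["hits"]["hits"]
        if top:  # top_hits with "size": 1 yields at most one document per bucket
            result.append(top[0]["_source"])
    return result


def make_bucket(name, count):
    """Build a bucket in the same shape as the response above (trimmed)."""
    return {
        "key": name,
        "doc_count": count,
        "top_hits_documents": {"hits": {"hits": [{"_source": {"name": name}}]}},
    }


sample = {
    "aggregations": {
        "top-names": {
            "buckets": [
                make_bucket("John", 2),
                make_bucket("Marry", 1),
                make_bucket("Thomas", 1),
            ]
        }
    }
}

print(distinct_documents(sample))
# [{'name': 'John'}, {'name': 'Marry'}, {'name': 'Thomas'}]
```

Each name appears once in the output regardless of its "doc_count", because only one document is taken per bucket.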
Hope that helps!