I am struggling with some aggregations in Elasticsearch. Here is a sample of my mapping:
PUT example
{
"mappings": {
"properties": {
"domain": {
"type": "keyword"
},
"keyword": {
"type": "keyword"
},
"rank": {
"type": "long"
},
"ts": {
"type": "date"
}
}
}
}
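Each document represents the position (rank) of a keyword for a website domain on a given date. For example, if I track two websites, apple.com and microsoft.com, and each has its own position for the keyword "computer", I will have two documents with the keyword "computer", one per domain. If I retrieve the documents without any aggregation, I get: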
"_source" : {
"keyword" : "computer",
"domain" : "apple.com",
"ts" : "2019-04-30",
"rank" : 12
}
"_source" : {
"keyword" : "computer",
"domain" : "microsoft.com",
"ts" : "2019-04-30",
"rank" : 9
}
I want to build a summary table in which each row represents one keyword and shows its rank for both domains.
+----------+---------------+-----------+
| | microsoft.com | apple.com |
+----------+---------------+-----------+
| computer | 9 | 12 |
+----------+---------------+-----------+
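I need to paginate and sort the aggregation (by keyword, by Microsoft's rank, or by Apple's rank).
Do you know how I can achieve this? I wrote an aggregation, but I don't know how to sort it.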
{
"size": 0,
"aggs": {
"terms_keyword": {
"composite": {
"sources": [
{
"keyword": {
"terms": {
"field": "keyword"
}
}
}
]
},
"aggs": {
"domain_terms": {
"terms": {
"field": "domain"
},
"aggs": {
"doc": {
"top_hits": {
"size": 1
}
}
}
}
}
}
}
}
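For reference, a composite aggregation is paged by sending the previous response's `after_key` back as `after`, and it can be ordered by its own source (here the keyword), but not by a metric. The following is a minimal Python sketch of how the request body above could be built page by page. The function name `composite_page_body` and its parameters are hypothetical helpers for illustration, not part of any Elasticsearch client.

```python
def composite_page_body(after_key=None, page_size=10, order="asc"):
    """Build the request body for one page of the composite aggregation.

    after_key: the "after_key" object from the previous response, or None
               for the first page (hypothetical parameter name).
    order:     "asc" or "desc", applied to the keyword source only.
    """
    composite = {
        "size": page_size,
        "sources": [
            {"keyword": {"terms": {"field": "keyword", "order": order}}}
        ],
    }
    if after_key is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after_key

    return {
        "size": 0,
        "aggs": {
            "terms_keyword": {
                "composite": composite,
                "aggs": {
                    "domain_terms": {
                        "terms": {"field": "domain"},
                        "aggs": {"doc": {"top_hits": {"size": 1}}},
                    }
                },
            }
        },
    }


first_page = composite_page_body()
next_page = composite_page_body(after_key={"keyword": "computer"})
```

This covers pagination and sorting by keyword only; sorting by a domain's rank is not something this sketch addresses.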
Result:
"aggregations" : {
"terms_keyword" : {
"after_key" : {
"keyword" : "computer"
},
"buckets" : [
{
"key" : {
"keyword" : "computer"
},
"doc_count" : 2,
"domain_terms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "apple.com",
"doc_count" : 1,
"doc" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "example",
"_type" : "_doc",
"_id" : "1g1jWmoB9OyOTfm4a8kE",
"_score" : 1.0,
"_source" : {
"keyword" : "computer",
"domain" : "apple.com",
"ts" : "2019-04-30",
"rank" : 12
}
}
]
}
}
},
{
"key" : "microsoft.com",
"doc_count" : 1,
"doc" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "example",
"_type" : "_doc",
"_id" : "2Q1jWmoB9OyOTfm4tMl3",
"_score" : 1.0,
"_source" : {
"keyword" : "computer",
"domain" : "microsoft.com",
"ts" : "2019-04-30",
"rank" : 9
}
}
]
}
}
}
]
}
}
]
}
}
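To get from this nested response to the table I described, the buckets have to be flattened on the client side. A minimal Python sketch, assuming the response has exactly the shape shown above (aggregation names `terms_keyword`, `domain_terms`, and `doc`); the function name `pivot_rows` is hypothetical:

```python
def pivot_rows(response):
    """Turn the composite aggregation response into one row per keyword.

    Each row is a dict with a "keyword" entry plus one entry per domain
    holding the rank taken from that domain's single top hit.
    """
    rows = []
    for bucket in response["aggregations"]["terms_keyword"]["buckets"]:
        row = {"keyword": bucket["key"]["keyword"]}
        for domain_bucket in bucket["domain_terms"]["buckets"]:
            hits = domain_bucket["doc"]["hits"]["hits"]
            if hits:  # skip domains that returned no document
                row[domain_bucket["key"]] = hits[0]["_source"]["rank"]
        rows.append(row)
    return rows


# Trimmed-down version of the response above, for illustration only.
sample = {
    "aggregations": {
        "terms_keyword": {
            "buckets": [
                {
                    "key": {"keyword": "computer"},
                    "domain_terms": {
                        "buckets": [
                            {"key": "apple.com",
                             "doc": {"hits": {"hits": [{"_source": {"rank": 12}}]}}},
                            {"key": "microsoft.com",
                             "doc": {"hits": {"hits": [{"_source": {"rank": 9}}]}}},
                        ]
                    },
                }
            ]
        }
    }
}
rows = pivot_rows(sample)
```

Sorting these rows in Python only orders the current page, which is why sorting on the Elasticsearch side is still needed.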
I may not be using the right approach, and I am open to any suggestions.
Thanks
Edit:
OK, thanks to this post: How to group results in elasticsearch?
I used the Field collapse feature in top hits aggregation, and I can now group by keyword and sort by either domain's rank using sub-aggregations:
{
"size": 0,
"aggs": {
"top": {
"terms": {
"field": "keyword",
"order": {
"_key": "desc"
}
},
"aggs": {
"top_tags_hits": {
"top_hits": {}
},
"min_rank_apple": {
"min": {
"script": {
"source": "if (doc.domain.value == \"apple.com\") { doc.rank.value } else { 101 }"
}
}
},
"min_rank_microsoft": {
"min": {
"script": {
"source": "if (doc.domain.value == \"microsoft.com\") { doc.rank.value } else { 101 }"
}
}
}
}
}
}
}
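I can change the order in the terms aggregation to sort by keyword, min_rank_apple, or min_rank_microsoft. However, because this is a terms aggregation, I still cannot paginate. There is the composite aggregation, but I don't think it can be ordered by a sub-aggregation.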
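One direction that might cover both needs is the `bucket_sort` pipeline aggregation, which accepts `sort`, `from`, and `size` and can be placed next to the metric sub-aggregations of a terms aggregation. The Python sketch below builds such a request body. It rests on an assumption I have not verified against my data: the terms `size` (the hypothetical parameter `max_keywords`) must be large enough to return every distinct keyword, because `bucket_sort` only reorders and truncates the buckets its parent produced. The function name and the aggregation name `page` are made up for illustration.

```python
def sorted_page_body(sort_by="min_rank_apple", order="asc",
                     offset=0, page_size=10, max_keywords=10000):
    """Build a terms + bucket_sort request body (a sketch, not a tested fix).

    sort_by: "_key", "min_rank_apple", or "min_rank_microsoft".
    max_keywords: assumed upper bound on the number of distinct keywords.
    """
    def min_rank(domain):
        # Same script as in the query above; 101 is the sentinel used for
        # documents that belong to another domain.
        source = ('if (doc.domain.value == "%s") { doc.rank.value } '
                  'else { 101 }' % domain)
        return {"min": {"script": {"source": source}}}

    return {
        "size": 0,
        "aggs": {
            "top": {
                "terms": {"field": "keyword", "size": max_keywords},
                "aggs": {
                    "min_rank_apple": min_rank("apple.com"),
                    "min_rank_microsoft": min_rank("microsoft.com"),
                    # Sorts the keyword buckets, then keeps one page of them.
                    "page": {
                        "bucket_sort": {
                            "sort": [{sort_by: {"order": order}}],
                            "from": offset,
                            "size": page_size,
                        }
                    },
                },
            }
        },
    }


# Third page of 10 keywords, worst Microsoft rank first.
body = sorted_page_body("min_rank_microsoft", "desc", offset=20, page_size=10)
```

The trade-off is that every page request still computes all keyword buckets on the server, so this would probably only be reasonable for a moderate number of keywords.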