I'm trying to write an Elasticsearch query that groups all blogs by their blog domain (wordpress.com, blog.com, etc.). This is what my query looks like:
{
  "engagements": [
    "blogs"
  ],
  "query": {
    "query": {
      "filtered": {
        "query": {
          "match_all": {}
        },
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "weight": {
                    "gte": 120,
                    "lte": 150
                  }
                }
              }
            ]
          }
        }
      }
    },
    "facets": {
      "my_facet": {
        "terms": {
          "field": "blog_domain" <-------------------------------------
        }
      }
    }
  },
  "api": "_search"
}
However, it returns:
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      ...
    ]
  },
  "facets": {
    "my_facet": {
      "_type": "terms",
      "missing": 0,
      "total": 21,
      "other": 3,
      "terms": [
        {
          "term": "http",
          "count": 3
        },
        {
          "term": "noblepig.com",
          "count": 2
        },
        {
          "term": "hawaiian",
          "count": 2
        },
        {
          "term": "dream",
          "count": 2
        },
        {
          "term": "dessert",
          "count": 2
        },
        {
          "term": "2015",
          "count": 2
        },
        {
          "term": "05",
          "count": 2
        },
        {
          "term": "www.bt",
          "count": 1
        },
        {
          "term": "photos",
          "count": 1
        },
        {
          "term": "images.net",
          "count": 1
        }
      ]
    }
  }
}
This is not what I want. Right now my database has three records:
"http://www.bt-images.net/8-cute-photos-cats/",
"http://noblepig.com/2015/05/hawaiian-dream-dessert/",
"http://noblepig.com/2015/05/hawaiian-dream-dessert/"
I would like it to return something like this:
"facets": {
  "my_facet": {
    "_type": "terms",
    "missing": 0,
    "total": 21,
    "other": 3,
    "terms": [
      {
        "term": "http://noblepig.com/2015/05/hawaiian-dream-dessert/",
        "count": 2
      },
      {
        "term": "http://www.bt-images.net/8-cute-photos-cats/",
        "count": 1
      },
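How can I do this? I searched around and saw people recommending mappings, but I don't know where to put that in this query, and my table already has 100 million records, so it is too late for that. If you have a suggestion, could you paste the entire query?
Using aggs: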
{
  "engagements": [
    "blogs"
  ],
  "query": {
    "query": {
      "filtered": {
        "query": {
          "match_all": {}
        },
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "weight": {
                    "gte": 13,
                    "lte": 75
                  }
                }
              }
            ]
          }
        }
      }
    },
    "aggs": {
      "blah": {
        "terms": {
          "field": "blog_domain"
        }
      }
    }
  },
  "api": "_search"
}
Answer 0 (score: 3)
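The proper way to do this is to give that field a different mapping. By default a string field is analyzed, which is why the terms come back as individual tokens such as http and 2015 instead of whole values. You can change the mapping at any time by adding a sub-field to blog_domain, but that does not change documents that are already indexed. The mapping change only takes effect for new documents.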
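To make the difference concrete, here is a rough Python sketch that counts the three sample records from the question in two ways: once per token and once per whole value. The `rough_tokens` function is a hypothetical stand-in that splits on anything other than letters, digits, and dots. It is only an approximation chosen to mimic the tokens seen in the facet output, not Elasticsearch's actual analyzer.

```python
import re
from collections import Counter

# The three records listed in the question.
urls = [
    "http://www.bt-images.net/8-cute-photos-cats/",
    "http://noblepig.com/2015/05/hawaiian-dream-dessert/",
    "http://noblepig.com/2015/05/hawaiian-dream-dessert/",
]

def rough_tokens(value):
    # Approximation only (assumption): split on any character that is not
    # a letter, a digit, or a dot, then lowercase and drop empty pieces.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9.]+", value) if t]

# Analyzed field: each document contributes once per distinct token.
token_counts = Counter(t for u in urls for t in set(rough_tokens(u)))

# not_analyzed field: each document contributes its whole value once.
whole_counts = Counter(urls)

print(token_counts.most_common(3))
print(whole_counts.most_common())
```

Counting whole values gives the grouping the question asks for, which is what a not_analyzed sub-field provides.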
For reference, the mapping for blog_domain should look like this:
"blog_domain": {
  "type": "string",
  "fields": {
    "notAnalyzed": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
That is, it should have a sub-field (called notAnalyzed in my example), and your aggregation should use blog_domain.notAnalyzed.
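Since the question asks for the entire query, here is a minimal sketch of it as a Python dict serialized to JSON. It assumes the sub-field is named notAnalyzed as in the mapping above, and it reuses the "engagements" and "api" wrapper keys from the question, which belong to the asker's own client rather than to Elasticsearch. Only documents indexed after the mapping change would show up in this aggregation.

```python
import json

# Wrapper keys "engagements" and "api" are copied from the question;
# they are part of the asker's client, not of Elasticsearch itself.
request = {
    "engagements": ["blogs"],
    "query": {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"weight": {"gte": 13, "lte": 75}}}
                        ]
                    }
                },
            }
        },
        "aggs": {
            "blah": {
                # Aggregate on the not_analyzed sub-field, not on blog_domain.
                "terms": {"field": "blog_domain.notAnalyzed"}
            }
        },
    },
    "api": "_search",
}

request_json = json.dumps(request, indent=2)
print(request_json)
```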
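However, if you don't want to or cannot make this change, there is a workaround, though I think it is slower: a scripted aggregation. Something like this: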
{
  "aggs": {
    "blah": {
      "terms": {
        "script": "_source.blog_domain",
        "size": 10
      }
    }
  }
}
If you don't have it enabled already, you need to enable dynamic scripting.
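For completeness, the scripted variant can be dropped into the same wrapper as the query in the question. The sketch below uses a hypothetical helper, `build_scripted_request`, that is not part of any library. It simply assembles the request body with the weight range as parameters and the script string taken from the snippet above.

```python
import json

def build_scripted_request(gte, lte, size=10):
    """Hypothetical helper: the question's wrapper around a scripted terms aggregation."""
    return {
        "engagements": ["blogs"],  # wrapper key from the question
        "query": {
            "query": {
                "filtered": {
                    "query": {"match_all": {}},
                    "filter": {
                        "bool": {
                            "must": [
                                {"range": {"weight": {"gte": gte, "lte": lte}}}
                            ]
                        }
                    },
                }
            },
            "aggs": {
                "blah": {
                    # Reads the original value from _source instead of indexed tokens.
                    "terms": {"script": "_source.blog_domain", "size": size}
                }
            },
        },
        "api": "_search",  # wrapper key from the question
    }

scripted_request = build_scripted_request(13, 75)
print(json.dumps(scripted_request, indent=2))
```

Because the script reads each document's source, this works on documents that are already indexed, at the cost of the slower execution mentioned above.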