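I have products with a categories field. Using an aggregation, I can get every full category path together with all of its subcategories. I want to limit the levels that appear in the facets.
For example, I have the following facets: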
auto, tools & travel (115)
auto, tools & travel > luggage tags (90)
auto, tools & travel > luggage tags > luggage spotters (40)
auto, tools & travel > luggage tags > something else (50)
auto, tools & travel > car organizers (25)
Using an aggregation like this:
"aggs": {
"cat_groups": {
"terms": {
"field": "categories.keyword",
"size": 10,
"include": "auto, tools & travel > .*"
}
}
}
I get buckets like these:
"buckets": [
{
"key": "auto, tools & travel > luggage tags",
"doc_count": 90
},
{
"key": "auto, tools & travel > luggage tags > luggage spotters",
"doc_count": 40
},
{
"key": "auto, tools & travel > luggage tags > something else",
"doc_count": 50
},
{
"key": "auto, tools & travel > car organizers",
"doc_count": 25
}
]
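But I want to limit the level. For example, I want to get only the results for auto, tools & travel > luggage tags. How can I limit the levels?
By the way, "exclude": ".* > .* > .*" does not work for me.
I need buckets for different levels depending on the search: sometimes the first level, sometimes the second or third. When I want the first level, I do not want the second level to appear in the buckets, and so on.
Elasticsearch version 6.4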
Answer 0 (score: 1)
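I finally worked out the following technique.
I implemented a custom analyzer using the Path Hierarchy Tokenizer and made categories a multi-field, so that you can use categories.facet for aggregations/facets and categories for normal text search.
The custom analyzer is applied only to categories.facet.
Note the "fielddata": "true" property on the categories.facet field; a terms aggregation on a text field requires it.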
PUT myindex
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "path_hierarchy",
"delimiter": ">"
}
}
}
},
"mappings": {
"mydocs": {
"properties": {
"categories": {
"type": "text",
"fields": {
"facet": {
"type": "text",
"analyzer": "my_analyzer",
"fielddata": "true"
}
}
}
}
}
}
}
POST myindex/mydocs/1
{
"categories" : "auto, tools & travel > luggage tags > luggage spotters"
}
POST myindex/mydocs/2
{
"categories" : "auto, tools & travel > luggage tags > luggage spotters"
}
POST myindex/mydocs/3
{
"categories" : "auto, tools & travel > luggage tags > luggage spotters"
}
POST myindex/mydocs/4
{
"categories" : "auto, tools & travel > luggage tags > something else"
}
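To see which terms end up in categories.facet for these four documents, here is a minimal Python sketch that imitates the tokenizer outside Elasticsearch. The function name path_hierarchy_tokens is hypothetical, and the sketch assumes the tokenizer emits one token per path prefix, cut just before each delimiter, plus the full path. It also counts how many documents contain each term, which is roughly what a terms aggregation reports as doc_count.

```python
from collections import Counter

DELIMITER = ">"


def path_hierarchy_tokens(path, delimiter=DELIMITER):
    """Imitate a path_hierarchy tokenizer (assumed behaviour, not the real one).

    Emits every prefix of `path` that ends just before a delimiter,
    followed by the full path itself.
    """
    if not path:
        return []
    tokens = []
    pos = path.find(delimiter)
    while pos != -1:
        tokens.append(path[:pos])  # prefix up to, but excluding, the delimiter
        pos = path.find(delimiter, pos + len(delimiter))
    tokens.append(path)
    return tokens


# The four documents indexed above.
docs = [
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]

# Count each term once per document, like doc_count in a terms aggregation.
bucket_counts = Counter()
for categories in docs:
    bucket_counts.update(set(path_hierarchy_tokens(categories)))

for key, count in bucket_counts.items():
    print(repr(key), count)
```

Because the delimiter is ">" and the paths are written with " > ", the shorter terms keep a trailing space. That is why the bucket keys for the first two levels end in a space.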
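You can try the query below. I have also used a Filter Aggregation together with the Terms Aggregation, because you only want buckets for documents that match specific words.
POST myindex/_search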
{
"size": 0,
"aggs":{
"facets": {
"filter": {
"bool": {
"must": [
{ "match": { "categories": "luggage"} }
]
}
},
"aggs": {
"categories": {
"terms": {
"field": "categories.facet"
}
}
}
}
}
}
Response:
{
"took": 43,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 11,
"max_score": 0,
"hits": []
},
"aggregations": {
"facets": {
"doc_count": 4,
"categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "auto, tools & travel ",
"doc_count": 4
},
{
"key": "auto, tools & travel > luggage tags ",
"doc_count": 4
},
{
"key": "auto, tools & travel > luggage tags > luggage spotters",
"doc_count": 3
},
{
"key": "auto, tools & travel > luggage tags > something else",
"doc_count": 1
}
]
}
}
}
}
The same query with an exclude pattern on the terms aggregation:
POST myindex/_search
{
"size": 0,
"aggs":{
"facets": {
"filter": {
"bool": {
"must": [
{ "match": { "categories": "luggage"} }
]
}
},
"aggs": {
"categories": {
"terms": {
"field": "categories.facet",
"exclude": ".*>{1}.*>{1}.*"
}
}
}
}
}
}
Note that I added a regular expression in exclude so that it ignores any term containing more than one >.
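The effect of that exclude pattern on the bucket keys can be checked offline with a small Python sketch. This is only an illustration. Python's re module is not the Lucene regular-expression engine that Elasticsearch uses, so re.fullmatch stands in for the whole-term matching assumed here. The helper exclude_deeper_than is hypothetical, and the generalized pattern it builds has not been verified against Elasticsearch.

```python
import re

# Bucket keys as returned by the unfiltered aggregation above.
bucket_keys = [
    "auto, tools & travel ",
    "auto, tools & travel > luggage tags ",
    "auto, tools & travel > luggage tags > luggage spotters",
    "auto, tools & travel > luggage tags > something else",
]


def exclude_deeper_than(max_level, delimiter=">"):
    """Hypothetical helper: build a pattern matching every key that contains
    at least `max_level` delimiters, i.e. every key below level `max_level`
    (levels counted from 1). Assumes the delimiter has no regex metacharacters.
    """
    return "(.*" + delimiter + "){" + str(max_level) + "}.*"


def apply_exclude(keys, pattern):
    """Drop every key that the pattern matches in full."""
    return [key for key in keys if re.fullmatch(pattern, key) is None]


# The pattern from the query above keeps the first and second levels.
print(apply_exclude(bucket_keys, ".*>{1}.*>{1}.*"))

# Keeping only the first level needs a pattern that excludes any key with a ">".
print(apply_exclude(bucket_keys, exclude_deeper_than(1)))
```

Building the pattern from the wanted depth lets the same query serve first-level, second-level, or third-level searches by changing only the exclude value.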
Let me know if this helps.
Answer 1 (score: 0)
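Just add an integer field named level that represents the category's level in the hierarchy. Count the occurrences of the delimiter ">" and store that count as the value. Then add a rangeQuery to your boolQuery.
Add this to your schema: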
"level": {
"type": "integer",
"store": "true",
"index": "true"
}
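In your code you would have something like the following, which counts the delimiters to determine the hierarchy level (no delimiter means a main category):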
public Builder(final String path) {
this.path = path;
this.level = StringUtils.countMatches(path, DELIMITER);
}
Your search query could then look something like this:
{
"query": {
"bool": {
"filter": [
{
"prefix": {
"category": {
"value": "auto, tools & travel",
"boost": 1
}
}
},
{
"range": {
"level": {
"from": 2,
"to": 4,
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
}
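As a rough end-to-end illustration of this approach, the Python sketch below computes the level at index time and builds the filter at search time. The function names are hypothetical. It assumes the category field is indexed so that a prefix query sees the whole path (for example as a keyword), and it writes the range with gte/lte instead of the from/to form shown above.

```python
import json

DELIMITER = ">"


def category_level(path, delimiter=DELIMITER):
    """Number of delimiters in the path: 0 for a main category, 1 for its children, and so on."""
    return path.count(delimiter)


def build_document(path):
    """Document body to index: the category path plus its precomputed level."""
    return {"category": path, "level": category_level(path)}


def build_level_query(prefix, min_level, max_level):
    """Bool filter restricting results to one category subtree and a range of levels."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"prefix": {"category": {"value": prefix}}},
                    {"range": {"level": {"gte": min_level, "lte": max_level}}},
                ]
            }
        }
    }


doc = build_document("auto, tools & travel > luggage tags > luggage spotters")
# Only the direct children of the main category (exactly one delimiter).
query = build_level_query("auto, tools & travel", 1, 1)
print(json.dumps(doc, indent=2))
print(json.dumps(query, indent=2))
```

Since a path with no delimiter gets level 0 under this counting, the bounds passed to the range filter are one lower than the "first level, second level" numbering used in the question.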