How do I get the tags of my Elasticsearch documents and of their references?

Asked: 2019-03-20 08:00:14

Tags: elasticsearch elasticsearch-aggregation

I have a document structure like this:

"properties":   {
  "username":   {"type":"keyword", "normalizer":"lc"},
  "title":      {"type":"text", "analyzer":"title_lc", "search_analyzer":"whitespace"},
  "text":       {"type":"text", "analyzer":"simple"},
  "tags":       {"type":"keyword", "normalizer":"lc"},
  "references": {"type":"nested", "properties": {
    "name": {"type":"text", "analyzer":"name_lc", "search_analyzer":"whitespace"},
    "text": {"type":"text", "analyzer":"simple"},
    "tags": {"type":"keyword", "normalizer":"lc"}
  }}
}

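I then tried a request that combines a plain (non-nested) aggregation on the top-level tags with a nested aggregation on the reference tags, for example: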

{
    "_source": false,
    "aggs": {
        "main_tag": {
            "terms": {"field":"tags"}
        },
        "sub_tag": {
            "nested": {"path": "references"},
            "aggs":{
                "sub_tag_nest":{
                    "terms": {"field":"references.tags"}
                }
            }
        }
    }
}

This now gives me some of the tags…

{
    "took": 4,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": 1496,
        "max_score": 1,
        "hits": [
            {
                "_index": "index",
                "_type": "doc",
                "_id": "unique_id_1",
                "_score": 1
            },
…
            {
                "_index": "index",
                "_type": "doc",
                "_id": "unique_id_10",
                "_score": 1
            }
        ]
    },
    "aggregations": {
        "main_tag": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "",
                    "doc_count": 1
                },
                {
                    "key": "one",
                    "doc_count": 1
                },
                {
                    "key": "Test1",
                    "doc_count": 1
                }
            ]
        },
        "sub_tag": {
            "doc_count": 25357,
            "sub_tag_nest": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {
                        "key": "any",
                        "doc_count": 1575
                    },
                    {
                        "key": "one",
                        "doc_count": 1
                    },
                    {
                        "key": "test",
                        "doc_count": 1
                    }
                ]
            }
        }
    }
}
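
For reading a response shaped like the one above, a minimal Python sketch could look like this. The helper name `tag_counts` is hypothetical, and `sample` is just a trimmed copy of the response shown above; the code only flattens bucket keys and counts and makes no call to Elasticsearch.

```python
def tag_counts(response):
    """Flatten both aggregations from the question into {tag: doc_count} dicts."""
    aggs = response["aggregations"]
    main = {b["key"]: b["doc_count"] for b in aggs["main_tag"]["buckets"]}
    # The nested aggregation wraps its terms sub-aggregation one level deeper.
    sub = {b["key"]: b["doc_count"]
           for b in aggs["sub_tag"]["sub_tag_nest"]["buckets"]}
    return main, sub


# Trimmed copy of the response from the question (hits omitted).
sample = {
    "aggregations": {
        "main_tag": {"buckets": [
            {"key": "", "doc_count": 1},
            {"key": "one", "doc_count": 1},
            {"key": "Test1", "doc_count": 1},
        ]},
        "sub_tag": {"doc_count": 25357, "sub_tag_nest": {"buckets": [
            {"key": "any", "doc_count": 1575},
            {"key": "one", "doc_count": 1},
            {"key": "test", "doc_count": 1},
        ]}},
    }
}

main_tags, sub_tags = tag_counts(sample)
print(main_tags)
print(sub_tags)
```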

Is it possible to get this with fewer hits, or with no hits at all?
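
On the hits part: as far as I know, setting `"size": 0` at the top level of the search body makes Elasticsearch return the aggregations with an empty `hits.hits` array. A minimal sketch of such a body in Python, reusing the aggregation names from the request above (the function name `tags_only_body` is made up for illustration):

```python
import json


def tags_only_body():
    """Build the same two aggregations, but ask for zero search hits."""
    return {
        "size": 0,  # no documents in hits.hits, aggregations only
        "aggs": {
            "main_tag": {"terms": {"field": "tags"}},
            "sub_tag": {
                "nested": {"path": "references"},
                "aggs": {
                    "sub_tag_nest": {"terms": {"field": "references.tags"}}
                },
            },
        },
    }


body = tags_only_body()
print(json.dumps(body, indent=2))
```

With `"size": 0` the `"_source": false` setting is no longer needed, since no hits are returned at all.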

Will it scale if there are 100, 1,000, or 100,000 tags?

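Can I get more detail than just the document count? Ideally I would get a list of document IDs for each main tag, and a list of document IDs and names for each sub tag (I know… unlikely).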
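
One option that might get part of the way there is a `top_hits` sub-aggregation under each `terms` bucket. As I understand the documentation, a `top_hits` placed inside a `nested` aggregation returns nested hits, which carry the parent document's `_id` plus `_nested` metadata. The number of hits per bucket is capped, so this yields a sample per tag rather than a complete list. The sketch below is untested against a real cluster; the names `tag_detail_body`, `ids_per_tag`, `docs` and `refs` are hypothetical, and the bucket at the end is a made-up value used only to show the response shape.

```python
def tag_detail_body(tag_limit=10, docs_per_tag=3):
    """Terms buckets with a top_hits sample of matching documents per tag."""
    return {
        "size": 0,
        "aggs": {
            "main_tag": {
                "terms": {"field": "tags", "size": tag_limit},
                "aggs": {"docs": {"top_hits": {"size": docs_per_tag}}},
            },
            "sub_tag": {
                "nested": {"path": "references"},
                "aggs": {
                    "sub_tag_nest": {
                        "terms": {"field": "references.tags", "size": tag_limit},
                        "aggs": {"refs": {"top_hits": {"size": docs_per_tag}}},
                    }
                },
            },
        },
    }


def ids_per_tag(buckets, hits_name):
    """Map each bucket key to the _id values found in its top_hits sub-aggregation."""
    return {
        b["key"]: [h["_id"] for h in b[hits_name]["hits"]["hits"]]
        for b in buckets
    }


# Made-up bucket, only to illustrate the shape of a top_hits result.
example_buckets = [
    {"key": "one", "doc_count": 1,
     "docs": {"hits": {"hits": [{"_id": "unique_id_1"}]}}},
]
print(ids_per_tag(example_buckets, "docs"))
```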

I thought I was onto something with composite sub-aggregations, but that does not seem to be supported:

{"_source": false, "aggs": {
 "main_tag_c": {
   "composite": {"sources": [{
     "main_tag": {"terms": {"field": "tags"} }
   }]},
   "aggs": {"s_main_tag_c_id": {"terms": {"field": "_id"} } }
 },
 "sub_tag": {
   "nested": {"path": "references"},
   "aggs": {
     "sub_tag_nest_c": {
       "composite": {"sources": [{
         "sub_tag_nest": {"terms": {"field": "references.tags"} }
       }]},
       "aggs": {
         "s_sub_tag_nest_c": {"composite": {"sources": [{
           "sub_tag_nest_id": {"terms": {"field": "_id"} }
         }, {
           "sub_tag_nest_name": {"terms": {"field": "references.name"} }
         }]} }
       }
     }
   }
 }
} }

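However, this tells me that a composite aggregation cannot have a parent aggregation. That is too bad, because I am not sure how else I could list both the ID and the name pairs under each tag.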

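It does work for the top-level tags, although you only see a limited list of document IDs under each tag. The documentation also implies that a terms aggregation will always be limited, and that wrapping it in a composite aggregation should return all keys as bucket values.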
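
For the top-level case that does work, the composite aggregation can be paged through with its `after` parameter until no buckets remain, which is how it is meant to cover every key. A minimal sketch, assuming `search` is any callable that takes a request body and returns the response as a dict (for example a thin wrapper around a client's search call). The names `iter_composite_buckets` and `fake_search` are hypothetical, and `fake_search` serves made-up tags so the loop runs without a cluster.

```python
def iter_composite_buckets(search, field, page_size=1000):
    """Yield every bucket of a single-source composite aggregation, page by page."""
    after = None
    while True:
        composite = {
            "size": page_size,
            "sources": [{"tag": {"terms": {"field": field}}}],
        }
        if after is not None:
            composite["after"] = after  # resume after the last key already seen
        response = search({"size": 0, "aggs": {"tags": {"composite": composite}}})
        agg = response["aggregations"]["tags"]
        buckets = agg["buckets"]
        if not buckets:
            return
        yield from buckets
        # Use after_key when the server provides it, else the last bucket's key.
        after = agg.get("after_key", buckets[-1]["key"])


def fake_search(body):
    """Stand-in for Elasticsearch that pages over illustrative, made-up tags."""
    tags = ["any", "one", "test"]
    comp = body["aggs"]["tags"]["composite"]
    after = comp.get("after")
    start = 0 if after is None else tags.index(after["tag"]) + 1
    page = tags[start:start + comp["size"]]
    result = {"buckets": [{"key": {"tag": t}, "doc_count": 1} for t in page]}
    if page:
        result["after_key"] = {"tag": page[-1]}
    return {"aggregations": {"tags": result}}


all_tags = [b["key"]["tag"]
            for b in iter_composite_buckets(fake_search, "tags", page_size=2)]
print(all_tags)
```

Each page is a separate request, so 100,000 tags at a page size of 1,000 would mean roughly a hundred round trips instead of one very large terms response.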

0 answers:

No answers