Question

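I have an Elasticsearch document structure, and I would like a terms facet (or aggregation) that gives me the number of documents containing each term, regardless of which field the term appears in.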

For example, the following result shows the documents and the faceted search results:

    {
        "_shards": {
            "failed": 0, "successful": 5, "total": 5
        },
        "hits": {
            "hits": [
                {
                    "_id": "003", "_index": "test", "_score": 1.0, "_type": "test",
                    "_source": {
                        "root": {
                            "content": [
                                "five",
                                "five",
                                "five"
                            ],
                            "title": "four"
                        }
                    }
                },
                {
                    "_id": "002", "_index": "test", "_score": 1.0, "_type": "test",
                    "_source": {
                        "root": {
                            "content": "two three",
                            "title": "three"
                        }
                    }
                },
                {
                    "_id": "001", "_index": "test", "_score": 1.0, "_type": "test",
                    "_source": {
                        "root": {
                            "content": "one two",
                            "title": "one"
                        }
                    }
                }
            ],
            "max_score": 1.0, "total": 3
        },
        "facets": {
            "terms": {
                "_type": "terms", "missing": 0, "other": 0,
                "terms": [
                    {
                        "count": 2,
                        "term": "two"
                    },
                    {
                        "count": 2,
                        "term": "three"
                    },
                    {
                        "count": 2,
                        "term": "one"
                    },
                    {
                        "count": 1,
                        "term": "four"
                    },
                    {
                        "count": 1,
                        "term": "five"
                    }
                ],
                "total": 8
            }
        },
        "timed_out": false,
        "took": 18
    }

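We can see that the terms "one" and "three" each have a count of 2 (once for each field of the same document), whereas I would like them to have a count of 1. The only term that should have a count of 2 is "two".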
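To see exactly where the unwanted counts come from, the facet result above can be reproduced with a small Python simulation. It is not Elasticsearch code: it assumes a lowercase whitespace split as a stand-in for the analyzer, and `tokens` and `per_field_counts` are hypothetical helpers. Counting each term once per (document, field) pair gives the same numbers as the facet, including the total of 8.

```python
from collections import Counter

# The "root" objects of the three sample documents from the question.
DOCS = {
    "001": {"content": "one two", "title": "one"},
    "002": {"content": "two three", "title": "three"},
    "003": {"content": ["five", "five", "five"], "title": "four"},
}


def tokens(value):
    """Return the distinct tokens of a field value (a string or a list of strings).

    Assumption: a lowercase whitespace split stands in for the real analyzer.
    """
    values = value if isinstance(value, list) else [value]
    return {tok.lower() for v in values for tok in v.split()}


def per_field_counts(docs, fields):
    """Count each term once per (document, field) pair, like the facet above."""
    counts = Counter()
    for doc in docs.values():
        for field in fields:
            counts.update(tokens(doc[field]))
    return counts


counts = per_field_counts(DOCS, ["content", "title"])
print(sorted(counts.items()))
# [('five', 1), ('four', 1), ('one', 2), ('three', 2), ('two', 2)]
print(sum(counts.values()))
# 8
```

Because "one" appears in both fields of document 001 and "three" in both fields of document 002, each is counted twice, while "five" is counted only once even though it is repeated inside a single field.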

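I looked into aggregations to see whether they would help, but they don't seem to work across multiple fields (or I am missing something).

Building the terms facet on "root" instead of on the individual fields would be nice... but that doesn't seem to be possible either.

Any ideas on how to solve this?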

Answer 1

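You can achieve this with a script in the terms aggregation. Inside the script, collect the tokens from both fields, take their set union, and return the resulting set.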

    {
        "aggs" : {
            "doc_terms" : {
                "terms" : {
                    "script" : "union(doc['content'].values, doc['title'].values)"
                }
            }
        }
    }

You will need to work out how to express the union operation in whichever scripting language you are using.
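The logic of that script can be sketched as a plain Python simulation; it is not a real Elasticsearch script. Assumptions: a lowercase whitespace split stands in for the analyzer, the hypothetical `union_terms` plays the role of the `union(...)` pseudo-function, and the counting loop models the terms aggregation adding one to a term's bucket for each distinct value the script returns for a document.

```python
from collections import Counter

# The "root" objects of the three sample documents from the question.
DOCS = {
    "001": {"content": "one two", "title": "one"},
    "002": {"content": "two three", "title": "three"},
    "003": {"content": ["five", "five", "five"], "title": "four"},
}


def tokens(value):
    """Distinct lowercase whitespace tokens of a string or list of strings (analyzer stand-in)."""
    values = value if isinstance(value, list) else [value]
    return {tok.lower() for v in values for tok in v.split()}


def union_terms(doc, fields):
    """Hypothetical stand-in for union(...): the set of terms found in any of the fields."""
    result = set()
    for field in fields:
        result |= tokens(doc[field])
    return result


def per_document_counts(docs, fields):
    """Count each term once per document, however many fields contain it."""
    counts = Counter()
    for doc in docs.values():
        counts.update(union_terms(doc, fields))
    return counts


counts = per_document_counts(DOCS, ["content", "title"])
print(sorted(counts.items()))
# [('five', 1), ('four', 1), ('one', 1), ('three', 1), ('two', 2)]
```

Since the union is a set, a term found in both fields contributes only once per document, so only "two" (present in documents 001 and 002) ends up with a count of 2.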

Answer 2

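You could add a new field that holds the unique words from both the content and title fields, and build the facet or aggregation on that field.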
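A minimal client-side sketch of this approach, assuming the documents are enriched before they are indexed: `with_combined_field` and the field name `all_terms` are hypothetical, and a lowercase whitespace split stands in for the analyzer. The last lines simulate a terms facet on the new field, which counts each document at most once per term because all the terms now live in a single field.

```python
from collections import Counter

# The "root" objects of the three sample documents from the question.
DOCS = {
    "001": {"content": "one two", "title": "one"},
    "002": {"content": "two three", "title": "three"},
    "003": {"content": ["five", "five", "five"], "title": "four"},
}
FIELDS = ["content", "title"]


def tokens(value):
    """Distinct lowercase whitespace tokens of a string or list of strings (analyzer stand-in)."""
    values = value if isinstance(value, list) else [value]
    return {tok.lower() for v in values for tok in v.split()}


def with_combined_field(root, fields, target="all_terms"):
    """Return a copy of `root` with an extra (hypothetical) field of the fields' unique terms."""
    terms = set()
    for field in fields:
        terms |= tokens(root[field])
    return {**root, target: sorted(terms)}


enriched = {doc_id: with_combined_field(root, FIELDS) for doc_id, root in DOCS.items()}
for doc_id in sorted(enriched):
    print(doc_id, enriched[doc_id]["all_terms"])

# Simulated terms facet on the single combined field.
facet = Counter(term for root in enriched.values() for term in root["all_terms"])
print(sorted(facet.items()))
# [('five', 1), ('four', 1), ('one', 1), ('three', 1), ('two', 2)]
```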

Elasticsearch: facet or aggregation returning doc counts across multiple fields
