Question

Here is an example document from my ES index:

{ 
    "concepts": [ 
        { 
            "type": "location",
            "entities": [ 
                { "text": "Raleigh" }, 
                { "text": "Damascus" }, 
                { "text": "Brussels" } 
            ] 
        }, 
        { 
            "type": "person", 
            "entities": [ 
                { "text": "Johnny Cash" }, 
                { "text": "Barack Obama" }, 
                { "text": "Vladimir Putin" }, 
                { "text": "John Hancock" } 
            ] 
        }, 
        { 
            "type": "organization", 
            "entities": [ 
                { "text": "WTO" }, 
                { "text": "IMF" }, 
                { "text": "United States of America" } 
            ] 
        } 
    ] 
}

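I'm trying to aggregate and count the frequency of each concept entity across my document set, restricted to a specific concept type. Say I'm only interested in aggregating the entities of concepts with type "location". My aggregation buckets would then be "concepts.entities.text", but I only want to aggregate them when "concepts.type" equals "location". Here is my attempt: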

{
    "query": {
        // Whatever query
    },
    "aggs": {
        "location_concept_type": {
            "filter": {
                "term": { "concepts.type": "location" }
            },
            "aggs": {
                "entities": {
                    "terms": { "field": "concepts.entities.text" }
                }
            }
        }
    }
}

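The problem with this is that it only filters out documents that have no concept of type "location". For documents that have a "location" concept alongside other types, it buckets all concept entities, regardless of concept type.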
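A likely explanation, sketched in Python: under the default object mapping (not `nested`), Elasticsearch indexes an array of objects as flat multi-valued fields, so the pairing between each `type` and its entities is lost. The helper names `flatten_object_array` and `filter_then_terms` below are hypothetical and only simulate that behavior; they ignore text analysis and are not Elasticsearch APIs.

```python
from collections import Counter


def flatten_object_array(doc):
    """Collapse nested objects and arrays into multi-valued dotted fields."""
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


def filter_then_terms(flat_docs, filter_field, filter_value, terms_field):
    """Simulate a filter aggregation wrapping a terms aggregation."""
    counts = Counter()
    for fields in flat_docs:
        # The filter matches the whole document, not one array element.
        if filter_value in fields.get(filter_field, []):
            counts.update(fields.get(terms_field, []))
    return counts


doc = {
    "concepts": [
        {"type": "location", "entities": [{"text": "Raleigh"}, {"text": "Damascus"}]},
        {"type": "person", "entities": [{"text": "Johnny Cash"}]},
    ]
}

flat = flatten_object_array(doc)
counts = filter_then_terms([flat], "concepts.type", "location", "concepts.entities.text")
print(flat)
print(counts)
```

In this simulation "Johnny Cash" is counted even though it belongs to a "person" concept, which matches the symptom described above.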

I also tried restructuring my documents as follows:

{ 
    "concepts": [ 
        { 
            "type": "location",
            "text": "Raleigh"
        },
        { 
            "type": "location",
            "text": "Damascus"
        },
        { 
            "type": "location",
            "text": "Brussels"
        }, 
        { 
            "type": "person",
            "text": "Johnny Cash"
        },
        { 
            "type": "person",
            "text": "Barack Obama"
        },
        { 
            "type": "person",
            "text": "Vladimir Putin"
        },
        { 
            "type": "person",
            "text": "John Hancock"
        }, 
        { 
            "type": "organization",
            "text": "WTO" 
        },
        { 
            "type": "organization",
            "text": "IMF" 
        },
        { 
            "type": "organization",
            "text": "United States of America" 
        }
    ] 
}

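But that doesn't work either. Finally, I can't use the concept type as a key (which I believe would solve my problem), because I also need to be able to aggregate across all concept types (and there may be an unbounded, constantly changing set of concept types).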

Any idea how to do this? Thanks in advance for your help.

Answer 1

If you structure your index as follows:

{ 
    "concepts": [ 
        { 
            "type": "location",
            "text": "Raleigh"
        },
        { 
            "type": "location",
            "text": "Damascus"
        }
    ]
}

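and define the "concepts" field as a nested object in the mapping, you can run the following search, which places a filter aggregation inside a nested aggregation: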

{
    "query": { "match_all": {} },
    "aggs": {
        "location_entities": {
            "nested": { "path": "concepts" },
            "aggs": {
                "filtered_aggregation": {
                    "filter": {
                        "term": { "concepts.type": "location" }
                    },
                    "aggs": {
                        "my_aggregation": {
                            "terms": { "field": "concepts.text" }
                        }
                    }
                }
            }
        }
    }
}

In the response, you know you are getting only location entities. This approach is faster than the "hack" in the other answer.
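For reference, the nested mapping this answer relies on might look like the fragment built below. This is a minimal sketch: the function name `nested_concepts_mapping` is hypothetical, the default `keyword` type assumes a recent Elasticsearch version, and how the `properties` fragment is wrapped in the index-creation request depends on the version in use.

```python
import json


def nested_concepts_mapping(exact_field=None):
    """Build a mapping fragment that declares `concepts` as a nested field."""
    # Assumption: exact-match (non-analyzed) sub-fields, so that terms
    # aggregations bucket whole values such as "United States of America".
    exact_field = exact_field or {"type": "keyword"}
    return {
        "properties": {
            "concepts": {
                "type": "nested",
                "properties": {
                    "type": dict(exact_field),
                    "text": dict(exact_field),
                },
            }
        }
    }


mapping = nested_concepts_mapping()
print(json.dumps(mapping, indent=4))
```

On versions that predate the `keyword` type, passing `{"type": "string", "index": "not_analyzed"}` as `exact_field` should give the equivalent exact-match behavior.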

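Starting with version 1.4.0.Beta1, Elasticsearch provides the filters aggregation. By replacing the filter aggregation inside the nested aggregation with a filters aggregation, you can bucketize your aggregation by entity type.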
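A sketch of what that request body could look like, built as a Python dict. The helper `by_type_aggregation` and the aggregation names `concepts_nested`, `by_type`, and `entities` are hypothetical; the field names come from the restructured document above, and `concepts` is assumed to be mapped as nested.

```python
import json


def by_type_aggregation(concept_types, path="concepts"):
    """Nested aggregation with one named filter bucket per concept type."""
    return {
        "concepts_nested": {
            "nested": {"path": path},
            "aggs": {
                "by_type": {
                    # Named filters: each key becomes a bucket in the response.
                    "filters": {
                        "filters": {
                            t: {"term": {f"{path}.type": t}}
                            for t in concept_types
                        }
                    },
                    # Sub-aggregation counted separately inside each bucket.
                    "aggs": {
                        "entities": {"terms": {"field": f"{path}.text"}}
                    },
                }
            },
        }
    }


body = {
    "query": {"match_all": {}},
    "aggs": by_type_aggregation(["location", "person"]),
}
print(json.dumps(body, indent=4))
```

Note that this form needs the list of concept types up front. If the set of types is open-ended, a terms aggregation on `concepts.type` with the same sub-aggregation may be a closer fit, since it discovers the types from the data.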

Answer 2

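I found a hack. I'll post it as an answer, but feel free to add another, more elegant one. What I did was add a property next to "type" and "text", let's call it "text_exp", that combines the type and the text as follows: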

{
    "concepts": [
        { "type": "location", "text": "Raleigh", "text_exp": "location~Raleigh" },
        //...
    ]
}

Then I use a regular expression in the terms aggregation, as shown below. Say I only want to aggregate entities of type "location":

{
    "query": {
        // Whatever query
    },
    "aggs": {
        "location_entities": {
            "terms": { 
                "field": "concepts.text_exp",
                "include": "location~.*"
            }
        }
    }
}

Then in the response I just split on "~" and take the right-hand part.
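That post-processing step could be sketched as follows. The function name `entity_counts` and the sample buckets are hypothetical; the sketch assumes the usual terms-aggregation bucket shape (`key` and `doc_count`) and that "text_exp" is indexed as an exact, non-analyzed value.

```python
def entity_counts(buckets, separator="~"):
    """Map each bucket key 'type~text' to its text part and document count."""
    counts = {}
    for bucket in buckets:
        # partition splits on the first separator only, so an entity
        # text that itself contains "~" is kept intact.
        _type, sep, text = bucket["key"].partition(separator)
        if not sep:
            continue  # key without a separator: not a combined value
        counts[text] = bucket["doc_count"]
    return counts


# Hypothetical buckets, shaped like response["aggregations"]["location_entities"]["buckets"]
sample_buckets = [
    {"key": "location~Raleigh", "doc_count": 3},
    {"key": "location~Damascus", "doc_count": 1},
]
result = entity_counts(sample_buckets)
print(result)
```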

Conditional aggregation on multiple fields in Elasticsearch
