Question

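I would like to get the best n documents for each user, where the user is stored in my index as user_id. Up to now this has not been a problem. It can be done like this: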

{
    "query":{
        "match":{
            "field":{
                "query":"query_string"
            }
        }
    },
    "aggs":{
        "group_by_user":{
            "terms":{
                "field":"user_id"
            },
            "aggs":{
                "top_n":{
                    "top_hits":{
                        "size":10
                    }
                }
            }
        }
    }
}
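To make the grouping concrete, here is a minimal in-memory sketch of what the request above computes: hits are bucketed by user_id and the n best-scoring hits are kept per bucket. The sample hits and the helper name top_n_per_user are hypothetical, invented for illustration; real Elasticsearch evaluates this per shard, and the terms aggregation returns only a limited number of buckets (10 by default).

```python
from collections import defaultdict


def top_n_per_user(hits, n=10):
    """Group scored hits by user_id and keep the n best-scoring hits per user."""
    buckets = defaultdict(list)
    for hit in hits:
        buckets[hit["user_id"]].append(hit)
    # Sort each bucket by descending score and truncate to n hits.
    return {
        user: sorted(docs, key=lambda d: d["_score"], reverse=True)[:n]
        for user, docs in buckets.items()
    }


# Hypothetical sample hits, invented purely for illustration.
hits = [
    {"user_id": "a", "_id": "1", "_score": 0.9},
    {"user_id": "a", "_id": "2", "_score": 0.4},
    {"user_id": "b", "_id": "3", "_score": 0.7},
    {"user_id": "a", "_id": "4", "_score": 0.6},
]

best = top_n_per_user(hits, n=2)
for user, docs in best.items():
    print(user, [d["_id"] for d in docs])
```

Any further per-user computation on these truncated lists is exactly the step that top_hits, being a metric aggregation, does not allow inside the request.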

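But now I would like to run a sub-aggregation on those hits to compute an expensive score, and that is not possible because top_hits is a metric aggregation, and metric aggregations cannot contain sub-aggregations.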

"aggs":{
    "max_score_per_user":{
        "max":{
            "script":"advanced_scoring"
        }
    }
}

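I also cannot use the rescore feature with its window parameter, because I would first need the documents for each user and only then the best n for each user.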

A range query would work, but the results of tf-idf scoring are not comparable, so I cannot define a suitable range.

Is this simply not possible, or am I doing something wrong?

Answer 1

You can make max_score_per_user a sub-aggregation of group_by_user, and top_n a sub-aggregation of max_score_per_user:

{
    "query": {
        "match": {
            "field": {
                "query": "query_string"
            }
        }
    },
    "aggs": {
        "group_by_user": {
            "terms": {
                "field": "user_id",
                "order": {
                    "max_score_per_user": "desc"
                }
            },
            "aggs": {
                "max_score_per_user": {
                    "max": {
                        "script": "advanced_scoring"
                    },
                    "aggs": {
                        "top_n": {
                            "top_hits": {
                                "size": 10
                            }
                        }
                    }
                }
            }
        }
    }
}
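One point worth verifying against your Elasticsearch version: metric aggregations such as max generally do not accept sub-aggregations, so a top_n nested under max_score_per_user may be rejected. A minimal sketch of a variant, assuming the same field names and the same advanced_scoring script as above, places top_n next to max_score_per_user inside group_by_user. The helper name build_request is hypothetical, and the body is only built and printed here, not sent to a cluster.

```python
import json


def build_request(query_text, n=10):
    """Build a search body with max_score_per_user and top_n as sibling aggregations."""
    return {
        "query": {"match": {"field": {"query": query_text}}},
        "aggs": {
            "group_by_user": {
                "terms": {
                    "field": "user_id",
                    # Buckets are ordered by the sibling metric below.
                    "order": {"max_score_per_user": "desc"},
                },
                "aggs": {
                    # Script syntax copied from the question; newer versions
                    # may expect a different script format (assumption).
                    "max_score_per_user": {"max": {"script": "advanced_scoring"}},
                    "top_n": {"top_hits": {"size": n}},
                },
            }
        },
    }


body = build_request("query_string", n=10)
print(json.dumps(body, indent=4))
```

In this layout the max is computed over every matching document in a user's bucket, not only over the n hits returned by top_n, so it does not restrict the expensive script to the best n documents.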

How to get the best n documents for multiple buckets and run a sub-aggregation on them
