Question

How can I sort and limit the results within each group in MongoDB?

Consider the following data:

Country:USA,name:xyz,rating:10,id:x
Country:USA,name:xyz,rating:10,id:y
Country:USA,name:xyz,rating:10,id:z
Country:USA,name:abc,rating:5,id:x
Country:India,name:xyz,rating:5,id:x
Country:India,name:xyz,rating:5,id:y
Country:India,name:abc,rating:10,id:z
Country:India,name:abc,rating:10,id:x

Now say I want to group by country, sort by rating, and limit the data for each group to 2 documents.

So the answer would be:

Country:USA
name:xyz,rating:10,id:x
name:xyz,rating:10,id:y
Country:India
name:abc,rating:10,id:x
name:abc,rating:10,id:z

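I want to accomplish this using only the aggregation framework.

I tried adding a sort on rating to the aggregation, but the query simply returns no results after processing.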

Answer 1

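Your best option here is to run a separate query for each "Country" (ideally in parallel) and return the combined results. The queries are quite simple: each one just returns the top 2 values after sorting on the rating value, and will execute quickly even though you need to run multiple queries to obtain the complete result.

The aggregation framework is not a good fit for this, now or even in the near future. The problem is that there is no operator that "limits" the result of any grouping in any way. So in order to do this, you basically need to $push all content into an array and then extract the "top n" values from that.

The operations currently needed to do that are pretty horrible, and the core problem is that the results are likely to exceed the BSON limit of 16MB per document on most real data sources.

There is also an n-fold complexity to this because of how you have to do it right now. But just to demonstrate with two items: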
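To make the size problem concrete, here is a minimal in-memory sketch in plain JavaScript (not MongoDB code) of what "push everything, then take the top n" amounts to. The `pushThenSlice` helper and the `pushed` field are hypothetical names introduced purely for illustration; the data is the sample from the question.

```javascript
// In-memory illustration only; no MongoDB involved.
const docs = [
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "xyz", rating: 10, id: "z" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "y" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" },
];

// Hypothetical helper, not a MongoDB API.
function pushThenSlice(documents, n) {
  // Sort by rating, descending (Array.prototype.sort is stable).
  const sorted = [...documents].sort((a, b) => b.rating - a.rating);

  // Equivalent of grouping with $push: every document of a group
  // is collected into that group's array.
  const groups = new Map();
  for (const { Country, ...rest } of sorted) {
    if (!groups.has(Country)) groups.set(Country, []);
    groups.get(Country).push(rest);
  }

  // Only after the whole array exists can it be cut down to n items.
  return [...groups].map(([country, results]) => ({
    _id: country,
    pushed: results.length,       // size of the intermediate array
    results: results.slice(0, n), // the part that is actually wanted
  }));
}

const grouped = pushThenSlice(docs, 2);
console.log(JSON.stringify(grouped, null, 2));
```

With the sample data, each group's intermediate array holds 4 items although only 2 are kept. On a real collection, that intermediate array is what risks running into the 16MB document limit.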

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items, keeping first result
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        },
        "first": { 
            "$first": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},

    // Unwind the array
    { "$unwind": "$results" },

    // Remove the seen result from the array
    { "$redact": {
        "$cond": {
            "if": { "$eq": [ "$results.id", "$first.id" ] },
            "then": "$$PRUNE",
            "else": "$$KEEP"
        }
    }},

    // Group to return the second result which is now first on stack
    { "$group": {
        "_id": "$_id",
        "first": { "$first": "$first" },
        "second": { 
            "$first": {
                "name": "$results.name", 
                "rating": "$results.rating",
                "id": "$results.id"
            }
        }
    }},

    // Optionally put these in an array format
    { "$project": {
        "results": { 
            "$map": {
                "input": ["A","B"],
                "as": "el",
                "in": {
                    "$cond": {
                        "if": { "$eq": [ "$$el", "A" ] },
                        "then": "$first",
                        "else": "$second"
                    }
                }
            }
        }
    }}
])

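This gets the result, but it is not a great approach, and it gets a lot more complex with the iterations needed for higher limits, or where a grouping may have fewer than n results to return.

The current development series (3.1.x) has a $slice operator that makes this a little simpler, but it still has the same "size" pitfall: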

db.collection.aggregate([
    // Sort content by country and rating
    { "$sort": { "Country": 1, "rating": -1 } },

    // Group by country and push all items
    { "$group": {
        "_id": "$Country",
        "results": {
            "$push": {
                "name": "$name", 
                "rating": "$rating",
                "id": "$id"
            }
        }
    }},
    { "$project": {
        "results": { "$slice": [ "$results", 2 ] }
    }}
])

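But basically, until the aggregation framework has some way to "limit" the number of items produced by $push, or a similar grouping "limit" operator, the aggregation framework is not really the optimal solution for this type of problem.

Simple queries like this: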

db.collection.find({ "Country": "USA" }).sort({ "rating": -1 }).limit(2)

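Run for each distinct country, ideally in parallel via an event loop or threads, with the results then combined, these queries are the most efficient approach right now. They fetch only what is needed, which is the big problem the aggregation framework cannot yet handle in this kind of grouping.

So look for support in your chosen language for producing these "combined query results" in the most optimal way instead, as it will be far less complex and far more performant than throwing this at the aggregation framework.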
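As a rough sketch of the "combined query results" idea, the following JavaScript issues one small query per distinct country and merges the answers. It assumes a collection object shaped like the one in the Node.js driver (`distinct()` and `find().sort().limit().toArray()`). `topNPerGroup` and `fakeCollection` are hypothetical names, and the in-memory stand-in exists only so the example runs without a server.

```javascript
// Hypothetical helper: top `n` documents per distinct value of `groupField`.
async function topNPerGroup(collection, groupField, sortField, n) {
  const keys = await collection.distinct(groupField);

  // One small query per group; all are started before any is awaited.
  return Promise.all(
    keys.map(async (key) => ({
      _id: key,
      results: await collection
        .find({ [groupField]: key })
        .sort({ [sortField]: -1 })
        .limit(n)
        .toArray(),
    }))
  );
}

// In-memory stand-in for a collection (illustration only, not a driver API).
function fakeCollection(docs) {
  return {
    distinct: async (field) => [...new Set(docs.map((d) => d[field]))],
    find(filter) {
      let rows = docs.filter((d) =>
        Object.entries(filter).every(([k, v]) => d[k] === v)
      );
      const cursor = {
        sort(spec) {
          const [[field, dir]] = Object.entries(spec);
          rows = [...rows].sort((a, b) => dir * (a[field] - b[field]));
          return cursor;
        },
        limit(count) {
          rows = rows.slice(0, count);
          return cursor;
        },
        toArray: async () => rows,
      };
      return cursor;
    },
  };
}

const demo = fakeCollection([
  { Country: "USA", name: "xyz", rating: 10, id: "x" },
  { Country: "USA", name: "xyz", rating: 10, id: "y" },
  { Country: "USA", name: "abc", rating: 5, id: "x" },
  { Country: "India", name: "xyz", rating: 5, id: "x" },
  { Country: "India", name: "abc", rating: 10, id: "z" },
  { Country: "India", name: "abc", rating: 10, id: "x" },
]);

topNPerGroup(demo, "Country", "rating", 2).then((out) =>
  console.log(JSON.stringify(out, null, 2))
);
```

Each query touches at most `n` documents per country, so nothing comparable to the pushed array is ever built. Against a real server, the order of documents with equal ratings is not guaranteed unless a tie-breaking sort key is added.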
