Question

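I have around 300,000 documents, stored in collections on a 12-month basis. Now I need to summarize these values into a collection for the full year. There are plenty of examples showing how to aggregate within one collection, but I don't know how to start with many collections. Should I start with some mapReduce emit function?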

Answer 1

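I think you could use the db.getCollectionNames() command to get an array of collection names, then iterate over that array in a loop and compute the aggregation for each collection with the db.getCollection(name) method.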

To illustrate, suppose your test database contains the following collections and corresponding documents:

use test;
db.jan_stats.insert([
    {
         "_id" : {
            "municipality" : "Stockholm",
            "keyword" : "hotel"
        },
        "total" : 2
    },
    {
         "_id" : {
            "municipality" : "Malmö",
            "keyword" : "school"
        },
        "total" : 5
    }
]);
db.feb_stats.insert([
    {
         "_id" : {
            "municipality" : "Stockholm",
            "keyword" : "hotel"
        },
        "total" : 6
    },
    {
         "_id" : {
            "municipality" : "Malmö",
            "keyword" : "school"
        },
        "total" : 4
    }
]);

You can then try the logic above in the mongo shell, as follows:

connecting to: test
> var collections = db.getCollectionNames(),
...     annual_total = 0;
> collections.forEach(function(name){
...     var res = db.getCollection(name).aggregate([
...         {
...             "$group": {
...                 "_id": null,
...                 "total": { "$sum": "$total" }
...             }
...         }
...     ]).toArray();
...     annual_total += res[0].total;
... });
> print(annual_total);
17
>

If the actual number of collections is 12, i.e. one per month, the above will of course give you the correct annual total.
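Two caveats about the loop above: db.getCollectionNames() returns every collection in the database, not only the monthly ones, and res[0] is undefined when a collection is empty, because $group emits nothing for empty input. A minimal sketch of guarding against both, assuming all monthly collections follow the <month>_stats naming of the sample data with three-letter lowercase month names (only jan_stats and feb_stats appear above, the rest is an assumption). The helper names monthlyStatsNames and totalOf are hypothetical:

```javascript
// Hypothetical helper: keep only names that look like "<month>_stats".
// The three-letter month prefixes are an assumption based on the sample data.
function monthlyStatsNames(allNames) {
    var pattern = /^(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)_stats$/;
    return allNames.filter(function (name) {
        return pattern.test(name);
    });
}

// Hypothetical helper: read the total from an aggregation result array,
// treating an empty result (empty collection) as 0.
function totalOf(res) {
    return res.length > 0 ? res[0].total : 0;
}

// In the mongo shell, the loop could then be written roughly as:
//   monthlyStatsNames(db.getCollectionNames()).forEach(function (name) {
//       var res = db.getCollection(name).aggregate([ /* same $group stage */ ]).toArray();
//       annual_total += totalOf(res);
//   });
```

The helpers are plain JavaScript, so they can be pasted into the shell before running the loop.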

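Performance-wise, you will need to optimize appropriately so that the aggregation above runs faster. I am not sure how it compares with a Map-Reduce operation, but I believe that if you are only summing the total field, the aggregation approach will be relatively faster, given appropriate indexing and reshaping the pipeline for performance.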
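The loop above produces a single grand total, whereas the question asks for a full-year collection. If the yearly totals should be kept per municipality and keyword, one option is to merge the monthly documents by their compound _id. A minimal sketch in plain JavaScript, assuming each monthly document has the shape shown in the sample data. The function name mergeAnnualTotals and the in-memory arrays are hypothetical stand-ins for the real collections:

```javascript
// Merge monthly documents into per-key annual totals.
// `monthlyDocs` is an array of arrays, one inner array per monthly collection.
function mergeAnnualTotals(monthlyDocs) {
    var totals = {};
    monthlyDocs.forEach(function (docs) {
        docs.forEach(function (doc) {
            // Build a string key from the compound _id fields.
            var key = JSON.stringify([doc._id.municipality, doc._id.keyword]);
            if (!totals[key]) {
                totals[key] = { _id: doc._id, total: 0 };
            }
            totals[key].total += doc.total;
        });
    });
    // Return one document per distinct (municipality, keyword) pair.
    return Object.keys(totals).map(function (key) {
        return totals[key];
    });
}

// Sample data copied from the jan_stats and feb_stats documents above.
var jan = [
    { _id: { municipality: "Stockholm", keyword: "hotel" }, total: 2 },
    { _id: { municipality: "Malmö", keyword: "school" }, total: 5 }
];
var feb = [
    { _id: { municipality: "Stockholm", keyword: "hotel" }, total: 6 },
    { _id: { municipality: "Malmö", keyword: "school" }, total: 4 }
];
var annual = mergeAnnualTotals([jan, feb]);
```

In the mongo shell, each inner array could come from db.getCollection(name).find().toArray(), and the result could be inserted into a yearly collection of your choosing. This loads every document into the shell's memory, so it is only a starting point for around 300,000 documents.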

MongoDB: aggregate/map-reduce values across many collections
