Question

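I am trying to learn MapReduce in MongoDB. I want to use a MapReduce function to group documents by a key I define myself, instead of using aggregation.

My collection Cool is: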

/* 1 */
{
    "_id" : ObjectId("55d5e7287e41390ea7e83a55"),
    "id" : "a",
    "cool" : "a1"
}

/* 2 */
{
    "_id" : ObjectId("55d5e7287e41390ea7e83a56"),
    "id" : "a",
    "cool" : "a2"
}

/* 3 */
{
    "_id" : ObjectId("55d5e7287e41390ea7e83a57"),
    "id" : "b",
    "cool" : "b1"
}

/* 4 */
{
    "_id" : ObjectId("55d5e7287e41390ea7e83a58"),
    "id" : "b",
    "cool" : "b2"
}

/* 5 */
{
    "_id" : ObjectId("55d5e7287e41390ea7e83a59"),
    "id" : "c",
    "cool" : "c1"
}

/* 6 */
{
    "_id" : ObjectId("55d5e7287e41390ea7e83a5a"),
    "id" : "d",
    "cool" : "d1"
}

Here is my MapReduce function:

db.Cool.mapReduce(
    function(){emit(this.id, this.cool)},
    function(key, values){
        var res = [];
        values.forEach(function(v){
            res.push(v);
            });
        return {cools: res};
        },
    {out: "MapReduce"}     
)

I expect to get results like this:

/* 1 */ { "_id" : "a", "value" : { "cools" : [ "a1", "a2" ] } }

But the returned collection contains:

/* 1 */
{
    "_id" : "a",
    "value" : {
        "cools" : [
            "a1",
            "a2"
        ]
    }
}

/* 2 */
{
    "_id" : "b",
    "value" : {
        "cools" : [
            "b1",
            "b2"
        ]
    }
}

/* 3 */
{
    "_id" : "c",
    "value" : "c1"
}

/* 4 */
{
    "_id" : "d",
    "value" : "d1"
}

The question is: why is the result for "id":"a" (there are several documents with "id":"a") shaped differently from the result for "id":"c" (there is only one document with "id":"c")?

Thanks for any suggestions, and sorry for my poor English.

Answer 1

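The value emitted by the map function and the value returned by the reduce function must have the same structure. Otherwise, keys that have only a single value in your collection are returned exactly as they were emitted by the map function. This happens because of an optimization: the reduce function is not executed for keys that emit only a single value in the map phase. Here is how to do it: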

db.Cool.mapReduce(
    function () {
        emit(this.id, {cools: [this.cool]}); // same data structure as in your reduce function
    },
    function (key, values) {
        var res = {cools: []}; // same data structure as the value of map phase
        values.forEach(function (v) {
            res.cools = res.cools.concat(v.cools);
        });
        return res;
    },
    {out: "MapReduce"}
)
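
To make the skipped-reduce behavior easier to see, here is a minimal sketch in plain JavaScript that runs outside MongoDB. `simulateMapReduce` is a hypothetical helper written for this illustration, not part of MongoDB, and it assumes only what is described above: values are grouped by key and reduce is not called for a key with a single value. Unlike the real shell, `emit` is passed to the map function as an argument.

```javascript
// Sample data with the same fields as the Cool collection.
var docs = [
    { id: "a", cool: "a1" },
    { id: "a", cool: "a2" },
    { id: "c", cool: "c1" }
];

// Hypothetical stand-in for mapReduce: an illustration, not the real engine.
function simulateMapReduce(docs, map, reduce) {
    var groups = {};
    // Map phase: collect every emitted value under its key.
    docs.forEach(function (doc) {
        map.call(doc, function (key, value) {
            if (!groups[key]) { groups[key] = []; }
            groups[key].push(value);
        });
    });
    // Reduce phase: skipped when a key has only one emitted value.
    return Object.keys(groups).map(function (key) {
        var values = groups[key];
        var value = values.length === 1 ? values[0] : reduce(key, values);
        return { _id: key, value: value };
    });
}

// Shapes from the question: map emits a string, reduce returns an object.
var mismatched = simulateMapReduce(
    docs,
    function (emit) { emit(this.id, this.cool); },
    function (key, values) { return { cools: values.slice() }; }
);

// Shapes from this answer: map and reduce both use {cools: [...]}.
var consistent = simulateMapReduce(
    docs,
    function (emit) { emit(this.id, { cools: [this.cool] }); },
    function (key, values) {
        var res = { cools: [] };
        values.forEach(function (v) { res.cools = res.cools.concat(v.cools); });
        return res;
    }
);

console.log(JSON.stringify(mismatched));
console.log(JSON.stringify(consistent));
```

With the question's functions, key "c" keeps the raw string emitted by map, while with matching shapes every key ends up as an object with a `cools` array.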

Answer 2

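In your studies you may have missed the core manual page on mapReduce. There is one vital piece of information that you either missed or have not read and absorbed:

MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

And a bit after that:

The type of the return object must be identical to the type of the value emitted by the map function.

So what this basically means is that the "reducer" does not actually process "all" of the values for a unique key at once. It therefore has to accept the same "input" as it gives "output", since that output can be fed back into the reducer again.

For the same reason, the "mapper" needs to emit exactly what is expected as the "reducer" output, which is also the reducer "input". So you do not actually "change" the data structure at all, you just "reduce" it.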

db.Cool.mapReduce(
    function(){emit(this.id, { "cools": [this.cool] })},
    function(key, values){
        var res = [];
        values.forEach(function(cool){
            cool.cools.forEach(function(v) {
                res.push(v);
            });
        });
        return {cools: res};
    },
    {out: "MapReduce"}     
)

Now you are handling the input as an array, which is also the shape of the output, and the expected results are returned.
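
The re-reduce requirement can be checked by hand in plain JavaScript, without MongoDB. The sketch below calls both reducers twice for the same key, feeding the first result back in as an input. The function names `questionReduce` and `fixedReduce` and the extra value "a3" are made up for this illustration.

```javascript
// Reducer from the question: wraps whatever it receives in {cools: [...]}.
function questionReduce(key, values) {
    var res = [];
    values.forEach(function (v) { res.push(v); });
    return { cools: res };
}

// Reducer from this answer: expects and returns {cools: [...]}.
function fixedReduce(key, values) {
    var res = [];
    values.forEach(function (cool) {
        cool.cools.forEach(function (v) { res.push(v); });
    });
    return { cools: res };
}

// Question's shapes: the earlier output is fed back next to a raw string.
var partialBad = questionReduce("a", ["a1", "a2"]);
var rereducedBad = questionReduce("a", [partialBad, "a3"]);

// Matching shapes: the earlier output is fed back next to a new emitted value.
var partialGood = fixedReduce("a", [{ cools: ["a1"] }, { cools: ["a2"] }]);
var rereducedGood = fixedReduce("a", [partialGood, { cools: ["a3"] }]);

// Reducing everything in one call, for comparison.
var allAtOnce = fixedReduce("a", [{ cools: ["a1"] }, { cools: ["a2"] }, { cools: ["a3"] }]);

console.log(JSON.stringify(rereducedBad));
console.log(JSON.stringify(rereducedGood));
console.log(JSON.stringify(allAtOnce));
```

The question's reducer ends up nesting an earlier result inside the array, while the corrected reducer gives the same flat array whether the values arrive in one call or in two.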

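The next thing to learn is that in most cases mapReduce is not really what you want to use, and you should be using the aggregation framework instead.

As opposed to mapReduce, it uses "natively coded" operators and does not need JavaScript interpretation to run. That largely means it is "faster", and it is often much simpler to construct.

Here is the same operation with .aggregate():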

db.Cool.aggregate([
    { "$group": {
        "_id": "$id",
        "cools": { "$push": "$cool" }
    }}
])

Same thing, less coding and a lot faster.

To output to another collection you use $out:

db.Cool.aggregate([
    { "$group": {
        "_id": "$id",
        "cools": { "$push": "$cool" }
    }},
    { "$out": "reduced" }
])

For the record, here is the mapReduce output:

{ "_id" : "a", "value" : { "cools" : [ "a1", "a2" ] } }
{ "_id" : "b", "value" : { "cools" : [ "b1", "b2" ] } }
{ "_id" : "c", "value" : { "cools" : [ "c1" ] } }
{ "_id" : "d", "value" : { "cools" : [ "d1" ] } }

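And the aggregate output. The only difference from the mapReduce _id and value output is that the keys are reversed, since $group does not guarantee an order (but it is generally observed as a reverse stack):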

{ "_id" : "d", "cools" : [ "d1" ] }
{ "_id" : "c", "cools" : [ "c1" ] }
{ "_id" : "b", "cools" : [ "b1", "b2" ] }
{ "_id" : "a", "cools" : [ "a1", "a2" ] }

MapReduce function in MongoDB - by ID
