Multiple MapReduce functions or the aggregation framework for unique values and counts in MongoDB?

Time: 2013-08-20 02:28:14

Tags: mongodb mapreduce mongodb-query aggregation-framework

I'm fairly new to mapReduce and aggregation in MongoDB.

Here is a sample of the data set:

{ "_id" : ObjectId("521002161e0787522098d110"), "userId" : 4545454, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002481e0787522098d111"), "userId" : 64545454, "pickId" : 1, "answerArray" : [  "no" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("521002871e0787522098d112"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Albany", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 2, "answerArray" : [  "yes" ], "city" : "New York", "state" : "New York" }
{ "_id" : ObjectId("5211507c1e0787522098d113"), "userId" : 78263636, "pickId" : 1, "answerArray" : [  "yes" ], "city" : "Wichita", "state" : "Kansas" }

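I'd like to get the list of unique combinations of state, city, pickId, and answerArray, and then count how often each combination occurs. The result needs to look like this: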

{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["yes"], "count":2}
{"pickId": 1, "city": "Albany", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "New York", "state": "New York", "answerArray": ["no"], "count":1}
{"pickId": 1, "city": "Wichita", "state": "Kansas", "answerArray": ["yes"], "count":1}

The problem I'm running into is that emit in mapReduce only takes two arguments:

Error: fast_emit takes 2 args near...

But I want to map multiple unique values to one pickId.

Here is my mapReduce code:

var mapFunct = function() {
    if (this.answerArray == "yes") {
        emit(this.pickId, 1);
    } else {
        emit(this.pickId, 0);
    }
};

var mapReduce2 = function(keyPickId, answerVals) {
    return Array.sum(answerVals);
};

db.answers.mapReduce(mapFunct, mapReduce2, { out: "mapReduceAnswers" })

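Any help or further suggestions would be greatly appreciated. I have also looked into the aggregation framework, but it doesn't seem like it would give me the kind of output I need.

1 answer:

Answer 0 (score: 0)

I think you can get the format you want with aggregation, specifically the $group and $project operators. Take a look at this aggregation call: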

var agg_output = db.answers.aggregate([
  { $group: { _id: {
                city: "$city",
                state: "$state",
                answerArray: "$answerArray",
                pickId: "$pickId"
            }, count: { $sum: 1 }}
  },
  { $project: { city: "$_id.city", 
                state: "$_id.state", 
                answerArray: "$_id.answerArray", 
                pickId: "$_id.pickId", 
                count: "$count", 
                _id: 0}
  }
]);

db.answer_counts.insert(agg_output.result);

The $group stage counts the occurrences of each unique city / state / answerArray / pickId combination, and the $project stage reshapes the data into the form you want.
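
To see what those two stages compute without a running server, here is a minimal plain-JavaScript sketch of the same grouping, applied to the five sample documents from the question. The helper name groupAndCount and the use of JSON.stringify to build a composite key are illustrative assumptions, not part of MongoDB, which compares group keys in its own way.

```javascript
// Sample documents from the question (the _id field is omitted; it is not used for grouping).
const docs = [
  { userId: 4545454,  pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 64545454, pickId: 1, answerArray: ["no"],  city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Albany",   state: "New York" },
  { userId: 78263636, pickId: 2, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Wichita",  state: "Kansas" }
];

// Hypothetical helper: mimics { $group: { _id: {...}, count: { $sum: 1 } } }
// followed by a $project that lifts the _id fields to the top level.
function groupAndCount(documents) {
  const groups = new Map();
  for (const doc of documents) {
    // Counterpart of the compound _id in the $group stage.
    const id = {
      city: doc.city,
      state: doc.state,
      answerArray: doc.answerArray,
      pickId: doc.pickId
    };
    // Assumption: the serialized id is good enough as a lookup key for this sketch.
    const key = JSON.stringify(id);
    const entry = groups.get(key) || { ...id, count: 0 };
    entry.count += 1; // counterpart of { $sum: 1 }
    groups.set(key, entry);
  }
  return Array.from(groups.values());
}

const results = groupAndCount(docs);
results.forEach((r) => console.log(JSON.stringify(r)));
```

With the five sample documents exactly as posted, every combination is distinct, so each count comes out as 1; a count of 2 only appears once two documents share all four fields.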

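The insert call saves the resulting output to a new collection. Note that agg_output.result assumes a shell in which aggregate returns a single document with a result array; in MongoDB 2.6 and later, where the shell helper returns a cursor, use agg_output.toArray() instead. Does that make sense?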
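
If you would rather stay with mapReduce, the "fast_emit takes 2 args" error from the question should go away once all grouping fields are packed into a single key object, because emit accepts exactly one key and one value. The sketch below shows that shape. Here emit and runMapReduce are hypothetical in-memory stand-ins for what the server provides, included only so the snippet runs without a database; they ignore server details such as re-reducing partial results.

```javascript
// Hypothetical stand-in for the server-side emit: collects values per serialized key.
const buckets = new Map();
function emit(key, value) {
  const k = JSON.stringify(key);
  if (!buckets.has(k)) buckets.set(k, { key: key, values: [] });
  buckets.get(k).values.push(value);
}

// Map function: one composite key object plus one value, so emit gets two arguments.
const mapFunct = function () {
  emit(
    { pickId: this.pickId, city: this.city, state: this.state, answerArray: this.answerArray },
    1
  );
};

// Reduce function: sums the 1s emitted for a key
// (the question's code uses the shell helper Array.sum for the same purpose).
const reduceFunct = function (key, values) {
  return values.reduce((a, b) => a + b, 0);
};

// Hypothetical stand-in for db.answers.mapReduce(mapFunct, reduceFunct, ...).
function runMapReduce(documents, map, reduce) {
  buckets.clear();
  documents.forEach((doc) => map.call(doc)); // `this` is bound to each document
  return Array.from(buckets.values()).map((b) => ({
    _id: b.key,
    value: reduce(b.key, b.values)
  }));
}

// Sample documents from the question (the _id field is omitted).
const answers = [
  { userId: 4545454,  pickId: 1, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 64545454, pickId: 1, answerArray: ["no"],  city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Albany",   state: "New York" },
  { userId: 78263636, pickId: 2, answerArray: ["yes"], city: "New York", state: "New York" },
  { userId: 78263636, pickId: 1, answerArray: ["yes"], city: "Wichita",  state: "Kansas" }
];

runMapReduce(answers, mapFunct, reduceFunct).forEach((r) => console.log(JSON.stringify(r)));
```

Against a real collection, the same two functions would be passed as db.answers.mapReduce(mapFunct, reduceFunct, { out: "mapReduceAnswers" }), with Array.sum usable in place of the reduce call. The output documents have the form { _id: <key object>, value: <count> }, so the aggregation version is still the shorter route to the flat layout shown in the question.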