MongoDB Distinct Count Question

Time: 2016-07-15 10:08:43

Tags: mongodb mapreduce

I have the following data (the collection contains more than 10 million records):

> db.LogBuff.find()
{ "_id" : ObjectId("578899d5d2b76f77d083f16c"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16d"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16e"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f16f"), "SUBJECT" : "AA", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f170"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f171"), "SUBJECT" : "BB", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f172"), "SUBJECT" : "CC", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f173"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f174"), "SUBJECT" : "CC", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f175"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f176"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f177"), "SUBJECT" : "BB", "SYS" : "C" }
{ "_id" : ObjectId("578899d5d2b76f77d083f178"), "SUBJECT" : "CC", "SYS" : "D" }
{ "_id" : ObjectId("578899d5d2b76f77d083f179"), "SUBJECT" : "DD", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17a"), "SUBJECT" : "AA", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17b"), "SUBJECT" : "BB", "SYS" : "B" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17c"), "SUBJECT" : "AA", "SYS" : "A" }
{ "_id" : ObjectId("578899d5d2b76f77d083f17d"), "SUBJECT" : "CC", "SYS" : "C" }

I want to get output of the following form:

{ "_id" : { "SUBJECT" : "CC", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "DD", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "B" }, "COUNT" : 2 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "B" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "C" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "AA", "SYS" : "A" }, "COUNT" : 3 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "A" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "CC", "SYS" : "D" }, "COUNT" : 1 }
{ "_id" : { "SUBJECT" : "BB", "SYS" : "B" }, "COUNT" : 1 }

Here is my code:

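db.LogBuff.mapReduce(
    // map: emit the (SUBJECT, SYS) pair as the key and 1 as the value
    function () {
        emit({ SUBJECT: this.SUBJECT, SYS: this.SYS }, 1);
    },
    // reduce: sum the counts; it must return the same type it receives,
    // because MongoDB may call it again on its own partial results
    function (key, values) {
        return Array.sum(values);
    },
    // results come back as { _id: {...}, value: <count> } rather than COUNT
    { out: { inline: 1 } }
)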
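A common way to finish this job is to emit 1 for every (SUBJECT, SYS) pair and have reduce sum the values. The sketch below is a plain JavaScript (Node.js) simulation of that approach on the 18 sample documents, not MongoDB itself: `runMapReduce` and its `batchSize` parameter are hypothetical, and the batching only imitates how MongoDB may call reduce several times for one key and then re-reduce the partial results.

```javascript
// Plain JavaScript (Node.js) simulation of a count-per-(SUBJECT, SYS) map/reduce.
// It imitates MongoDB's behaviour; it does not connect to a server.

// The 18 sample documents from the question (_id omitted).
const docs = [
  ["DD", "A"], ["AA", "B"], ["BB", "A"], ["AA", "C"], ["BB", "A"], ["BB", "A"],
  ["CC", "B"], ["AA", "A"], ["CC", "A"], ["DD", "A"], ["AA", "A"], ["BB", "C"],
  ["CC", "D"], ["DD", "A"], ["AA", "B"], ["BB", "B"], ["AA", "A"], ["CC", "C"],
].map(([SUBJECT, SYS]) => ({ SUBJECT, SYS }));

// map: emit the (SUBJECT, SYS) pair as the key and 1 as the value.
function map(doc, emit) {
  emit({ SUBJECT: doc.SUBJECT, SYS: doc.SYS }, 1);
}

// reduce: sum the values. It returns a number, the same type map emits,
// so its output can safely be passed to reduce again (re-reduce).
function reduce(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

// Hypothetical driver: groups emitted values by key, reduces them in
// batches of `batchSize`, then re-reduces the partial results.
function runMapReduce(docs, batchSize) {
  const emitted = new Map(); // serialized key -> emitted values
  for (const doc of docs) {
    map(doc, (key, value) => {
      const k = JSON.stringify(key);
      if (!emitted.has(k)) emitted.set(k, []);
      emitted.get(k).push(value);
    });
  }
  const results = [];
  for (const [k, values] of emitted) {
    const key = JSON.parse(k);
    const partials = [];
    for (let i = 0; i < values.length; i += batchSize) {
      partials.push(reduce(key, values.slice(i, i + batchSize)));
    }
    // Same output shape as MongoDB mapReduce: { _id, value }.
    results.push({ _id: key, value: reduce(key, partials) });
  }
  return results;
}

const out = runMapReduce(docs, 2);
const dd = out.find((r) => r._id.SUBJECT === "DD" && r._id.SYS === "A");
console.log(dd.value); // 3
```

Because reduce returns a plain number, the result is the same for any batch size; a reduce that returned a different shape than map emits would give wrong totals once partial results are re-reduced.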

Because of certain restrictions I cannot use the aggregation approach. This is the aggregation code I tried:

db.LogBuff.aggregate([
    { $group: { _id: { SUBJECT: "$SUBJECT", SYS: "$SYS" }, COUNT: { $sum: 1 } } },
    { $sort: { _id: 1 } }
])

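This works for a limited number of records, but for a large number of records it returns the following error (note: I am not a root user, so I cannot change the configuration):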

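assert: command failed: { "ok" : 0, "errmsg" : "Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting. Aborting operation. Pass allowDiskUse:true to opt in.", "code" : 16819 } : aggregate failed
_getErrorWithCode@src/mongo/shell/utils.js:25:13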
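What the failing pipeline computes can be sketched in plain JavaScript (Node.js) on the sample documents. This is an in-memory imitation, not MongoDB: `groupAndSort` is a hypothetical helper, and the comparator assumes the default simple (binary) string ordering, under which an embedded `_id` of the form `{ SUBJECT, SYS }` sorts by SUBJECT first and then by SYS. The final sort needs every group in memory at once, which is the kind of work the 104857600-byte (100 MiB) limit in the error applies to.

```javascript
// In-memory sketch of:
//   [{ $group: { _id: { SUBJECT: "$SUBJECT", SYS: "$SYS" }, COUNT: { $sum: 1 } } },
//    { $sort: { _id: 1 } }]
// Not MongoDB itself; assumes simple binary string ordering.

// The 18 sample documents from the question (_id omitted).
const docs = [
  ["DD", "A"], ["AA", "B"], ["BB", "A"], ["AA", "C"], ["BB", "A"], ["BB", "A"],
  ["CC", "B"], ["AA", "A"], ["CC", "A"], ["DD", "A"], ["AA", "A"], ["BB", "C"],
  ["CC", "D"], ["DD", "A"], ["AA", "B"], ["BB", "B"], ["AA", "A"], ["CC", "C"],
].map(([SUBJECT, SYS]) => ({ SUBJECT, SYS }));

function groupAndSort(docs) {
  // $group: one entry per distinct (SUBJECT, SYS) pair, COUNT += 1 per document.
  const groups = new Map();
  for (const d of docs) {
    const k = JSON.stringify([d.SUBJECT, d.SYS]);
    const g = groups.get(k);
    if (g) {
      g.COUNT += 1;
    } else {
      groups.set(k, { _id: { SUBJECT: d.SUBJECT, SYS: d.SYS }, COUNT: 1 });
    }
  }
  // $sort: { _id: 1 }: compare SUBJECT first, then SYS.
  // Every group is materialized in memory before sorting.
  const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  return [...groups.values()].sort(
    (a, b) => cmp(a._id.SUBJECT, b._id.SUBJECT) || cmp(a._id.SYS, b._id.SYS)
  );
}

const sorted = groupAndSort(docs);
console.log(sorted.length); // 11
console.log(sorted[0]); // the AA / A group, with COUNT 3
```

The number of groups, not the number of input documents, decides how much the sort has to hold, so a collection with many distinct (SUBJECT, SYS) pairs is what pushes the server-side sort past its in-memory limit.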

1 answer:

Answer 0 (score: 1)

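Try using the allowDiskUse option, which lets the pipeline write temporary data to disk instead of aborting at the 100 MB in-memory sort limit. It is passed per query, so no server configuration change is needed: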

db.LogBuff.aggregate([ {"$group" : {_id:{SUBJECT:"$SUBJECT",SYS:"$SYS"},COUNT:{$sum:1}}}, {$sort:{_id:1}}], {allowDiskUse: true})