Mongo aggregation: counting instances of values

Date: 2016-11-30 10:18:38

Tags: javascript mongodb mongoose mongodb-query aggregation-framework

I have a collection of (~35k) documents that look like this:

{
    "_id" : ObjectId("583dabfc7572394f93ac6ef2"),
    "updatedAt" : ISODate("2016-11-29T16:25:32.130Z"),
    "createdAt" : ISODate("2016-11-29T16:25:32.130Z"),
    "sourceType" : "report",
    "sourceRef" : ObjectId("583da865686e3dfbd977f059"),
    "type" : "video",
    "caption" : "lorem ipsum",
    "timestamps" : {
        "postedAt" : ISODate("2016-08-26T15:09:35.000Z"),
        "monthOfYear" : 7, // 0-based
        "dayOfWeek" : 5, // 0-based
        "hourOfDay" : 16 // 0-based
    },
    "stats" : {
        "comments" : 0,
        "likes" : 8
    },
    "user" : {
        "id" : "123456",
        "username" : "johndoe",
        "fullname" : "John",
        "picture" : ""
    },
    "images" : {
        "thumbnail" : "",
        "low" : "",
        "standard" : ""
    },
    "mentions" : [
        "janedoe"
    ],
    "tags" : [ 
        "holiday", 
        "party"
    ],
    "__v" : 0
}

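I want to generate an aggregate report that counts the frequency of documents by month of year, day of week, and hour of day, along with counts of each mention and tag. The desired output looks like this: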

{
  // Each frequency is independent of the others,
  // e.g. the total count for each frequency should
  // be ~35k.
  dayFrequency: [
    { day: 0, count: 1400 }, // Monday
    { day: 1, count: 1700 }, // Tuesday
    { day: 2, count: 1800 }, // Wednesday
    { /* etc */ },
    { day: 6, count: 1200 }  // Sunday
  ],

  monthFrequency: [
    { month: 0, count: 200 }, // January
    { month: 1, count: 250 }, // February
    { month: 2, count: 300 }, // March
    { /* etc */ },
    { month: 11, count: 150 } // December
  ],

  hourFrequency: [
    { hour: 0, count: 150 }, // 0am
    { hour: 1, count: 200 }, // 1am
    { hour: 2, count: 275 }, // 2am
    { /* etc */ },
    { hour: 23, count: 150 }, // 11pm
  ],

  mentions: {
    janedoe: 12,
    johnsmith: 11,
    peter: 54,
    /* and so on */
  },

  tags: {
    holiday: 872,
    party: 1029,
    /* and so on */
  }
}

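Is this possible, and if so, how would I write it? As I understand it, since I am aggregating over all matched documents, the result is effectively a single group?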

So far my code only groups all matching records into a single group, and I am not sure how to proceed from there.

Model.aggregate([
  { $match: { sourceType: 'report', sourceRef: '583da865686e3dfbd977f059' } },
  { $group: { 
    _id: '$sourceRef'
  }}
], (err, res) => {
  console.log(err);
  console.log(res);
})

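It would also be acceptable for each frequency to be a plain array of counts (e.g. [ 1400, 1700, 1800, /* etc */ 1200 ]). That led me to look at $count and a few other operators, but again I am not clear on how to use them.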

1 answer:

Answer 0: (score: 1)

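At the time of writing, this cannot be done in a single pipeline with MongoDB 3.2. From MongoDB 3.4 onwards, however, you can use the $facet stage, which runs multiple aggregation sub-pipelines within a single stage over the same set of input documents. Each sub-pipeline gets its own field in the output document, and its results are stored there as an array of documents.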

For example, the report above can be produced by running the following aggregation pipeline:

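// Mongoose does not cast aggregation pipeline values against the schema,
// so sourceRef is passed as an ObjectId here rather than as a string
// (assumes `mongoose` is in scope).
Model.aggregate([
    { "$match": { "sourceType": "report", "sourceRef": new mongoose.Types.ObjectId("583da865686e3dfbd977f059") } },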
    {
        "$facet": {
            "dayFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.dayOfWeek",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "monthFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.monthOfYear",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "hourFrequency": [
                {
                    "$group": {
                        "_id": "$timestamps.hourOfDay",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "mentions": [
                { "$unwind": "$mentions" },
                {
                    "$group": {
                        "_id": "$mentions",
                        "count": { "$sum": 1 }
                    }
                }
            ],
            "tags": [
                { "$unwind": "$tags" },
                {
                    "$group": {
                        "_id": "$tags",
                        "count": { "$sum": 1 }
                    }
                }
            ]
        }
    }
], (err, res) => {
    console.log(err);
    console.log(res);
})
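
The pipeline above returns an array containing one document, in which every facet is an array of { _id, count } entries, whereas the question asks for keys such as day, month and hour, and for mentions and tags as plain objects. The following is a minimal sketch of one way to close that gap. The helper names (frequencyFacet, toCountMap, toCountArray) are hypothetical and not part of Mongoose or MongoDB; only the $group, $project and $sort stages are real.

```javascript
// Hypothetical helpers for illustration only; none of these names come
// from Mongoose or MongoDB.

// Build one $facet sub-pipeline that counts documents per distinct value
// of `path`, exposes that value under `name` instead of _id, and sorts
// the entries in ascending order of that value.
function frequencyFacet(path, name) {
  return [
    { $group: { _id: '$' + path, count: { $sum: 1 } } },
    { $project: { _id: 0, [name]: '$_id', count: 1 } },
    { $sort: { [name]: 1 } }
  ];
}

// Turn [{ _id: 'party', count: 3 }, ...] into { party: 3, ... },
// the shape requested for mentions and tags.
function toCountMap(rows) {
  const map = {};
  for (const row of rows) {
    map[row._id] = row.count;
  }
  return map;
}

// Turn [{ day: 0, count: 4 }, ...] into a fixed-size array of counts,
// leaving 0 for any index that has no matching documents.
function toCountArray(rows, key, size) {
  const counts = new Array(size).fill(0);
  for (const row of rows) {
    counts[row[key]] = row.count;
  }
  return counts;
}
```

Under these assumptions, a facet could be declared as "dayFrequency": frequencyFacet('timestamps.dayOfWeek', 'day'), and in the callback the report would be res[0], with toCountMap(res[0].tags) giving the tag counts and toCountArray(res[0].dayFrequency, 'day', 7) giving the plain array of counts mentioned in the question.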