使用$ group来汇总mongodb中多个子文档的字段

时间:2014-07-21 22:18:42

标签: mongodb mongodb-query aggregation-framework

鉴于以下文件:

{
        "_id" : ObjectId("53cd79bb300ccae6b3904402"),
        "name" : "test product",
        "sku" : "product-1",
        "price" : 35,
        "cost" : 12,
        "max_cpc" : 100,
        "price_in_cents" : 3500,
        "cost_in_cents" : 1200,
        "max_cpc_in_cents" : 10000,
        "clicks" : [
                {
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "clicks" : 2,
                        "channel" : "google",
                        "campaign" : "12345687",
                        "group" : "987654321"
                },
                {
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "clicks" : 3,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "clicks" : 1,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                }
        ],
        "impressions" : [
                {
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "impressions" : 15,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "impressions" : 33,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                }
        ]
}

我想将此文档的总点击次数和总展示次数相加。我无法弄清楚如何正确设置聚合设置的管道。

最终结果是

{
    ObjectId("53cd79bb300ccae6b3904402"),
    total_clicks: 6,
    total_impressions: 48
}

1 个答案:

答案 0 :(得分:4)

这是一个相对简单的聚合操作,但如果分别对每个数组使用$unwind操作,通常需要注意的是:

db.collection.aggregate([

    // Unwind the first array
    { "$unwind": "$clicks" },

    // Sum results and keep the other array per document
    { "$group": {
        "_id": "$_id",
        "total_clicks": { "$sum": "$clicks.clicks" }
        "impressions": { "$first": "$impressions" }
    }},

    // Unwind the second array
    { "$unwind": "$impressions" },

    // Group the final result keeping the first result
    { "$group": {
        "_id": "$_id",
        "total_clicks": { "$first": "$total_clicks" },
        "total_impressions": { "$sum": "$impressions.impressions" }
    }}

])

这可以为您提供所需的结果。

{
    "_id": ObjectId("53cd79bb300ccae6b3904402"),
    "total_clicks": 6,
    "total_impressions": 48
}

$first运算符可以在此处使用,因为您在分组中按文档操作。如果您希望在所有文档或其他键中使用此功能,则执行相同操作以添加阵列,然后为其他分组级别添加最终组。

请记住单独“扩展”每个数组,否则如果同时尝试$unwind,最终会将一个元素中的每个元素重复一个元素的数量。


根据您的使用模式,您可能会考虑稍微更改架构。例如,由于此数据仅因“类型”而异,因此您可以考虑将其更改为单个“事件”数组:

{
        "_id" : ObjectId("53cd79bb300ccae6b3904402"),
        "name" : "test product",
        "sku" : "product-1",
        "price" : 35,
        "cost" : 12,
        "max_cpc" : 100,
        "price_in_cents" : 3500,
        "cost_in_cents" : 1200,
        "max_cpc_in_cents" : 10000,
        "events" : [
                {
                        "type": "click",
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "number" : 2,
                        "channel" : "google",
                        "campaign" : "12345687",
                        "group" : "987654321"
                },
                {
                        "type": "click",
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "number" : 3,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "type": "click",
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "number" : 1,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "type": "impression", 
                        "date" : ISODate("2014-04-25T00:00:00Z"),
                        "number" : 15,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                },
                {
                        "type": "impression", 
                        "date" : ISODate("2014-04-24T00:00:00Z"),
                        "number" : 33,
                        "channel" : "google",
                        "campaign" : "8675309",
                        "group" : "9035768"
                }
        ]
}

此类更改的聚合结构如下所示:

db.collection.aggregate([

    // Unwind the events array
    { "$unwind": "$events" },

    // Group each "type" conditionally
    { "$group": {
        "_id": "$_id",
        "total_clicks": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$events.type", "click" ] },
                    "$events.number",
                    0
                ]
            }
        },
        "total_impressions": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$events.type", "impression" ] },
                    "$events.number",
                    0
                ]
            }
        }
    }}

使用$cond作为三元运算符,评估逻辑条件并选择要传递给$sum的值,具体取决于条件是true还是false

或者你可以单独聚合“类型”:

db.collection.aggregate([

    // Unwind the events array
    { "$unwind": "$events" },

    // Group each "type" conditionally
    { "$group": {
        "_id": { "_id": "$_id", "type": "$events.type" },
        "total": { "$sum": "$events.number" }
    }}

])

结果略有不同:

{
    "_id": {
        "_id": ObjectId("53cd79bb300ccae6b3904402"),
        "type": "clicks"
    },
    "total": 6
},
{
    "_id": {
        "_id": ObjectId("53cd79bb300ccae6b3904402"),
        "type": "impressions"
    },
    "total": 48
}

最后,如果你可以忍受这样的事情,比如当你添加或以其他方式更新数组成员时,你不需要在数组外的字段上进行原子更新,那么将你的“事件流”放在一个单独的集合中就会删除需要致电$unwind

{
    "sku_id" : ObjectId("53cd79bb300ccae6b3904402"),
    "name" : "test product",
    "sku" : "product-1",
    "type": "click",
    "date" : ISODate("2014-04-25T00:00:00Z"),
    "number" : 2,
    "channel" : "google",
    "campaign" : "12345687",
    "group" : "987654321"
},
{
    "sku_id" : ObjectId("53cd79bb300ccae6b3904402"),
    "name" : "test product",
    "sku" : "product-1",
    "type": "impression", 
    "date" : ISODate("2014-04-24T00:00:00Z"),
    "number" : 33,
    "channel" : "google",
    "campaign" : "8675309",
    "group" : "9035768"
}

声明:

db.eventstream.aggregate([
    { "$group": {
        "_id": "$sku_id",
        "total_clicks": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$type", "click" ] },
                    "$number",
                    0
                ]
            }
        },
        "total_impressions": {
            "$sum": {
                "$cond": [
                    { "$eq": [ "$type", "impression" ] },
                    "$number",
                    0
                ]
            }
        }
    }}
])