Aggregate and group on multiple fields

Time: 2015-06-12 02:28:46

Tags: mongodb aggregation-framework

Given the following data set:

{ "_id" : 1, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 25, "Q3" : 0, "Q4" : 0 }
{ "_id" : 2, "city" : "Reno", "cat": "roads", "Q1" : 30, "Q2" : 0, "Q3" : 0, "Q4" : 60 }
{ "_id" : 3, "city" : "Yuma", "cat": "parks", "Q1" : 0, "Q2" : 0, "Q3" : 45, "Q4" : 0 }
{ "_id" : 4, "city" : "Reno", "cat": "parks", "Q1" : 35, "Q2" : 0, "Q3" : 0, "Q4" : 0 }
{ "_id" : 5, "city" : "Yuma", "cat": "roads", "Q1" : 0, "Q2" : 15, "Q3" : 0, "Q4" : 20 }

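I am trying to get to the following result. It would be great to return only the totals greater than zero, and also to collapse each city, cat and Qx total into a single record.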

{
    "city" : "Yuma",
    "cat" : "roads",
    "Q2total" : 40
}, 
{
    "city" : "Reno",
    "cat" : "roads",
    "Q1total" : 30
},
{
    "city" : "Reno",
    "cat" : "roads",
    "Q4total" : 60
},
{
    "city" : "Yuma",
    "cat" : "parks",
    "Q3total" : 45
},
{
    "city" : "Reno",
    "cat" : "parks",
    "Q1total" : 35
},
{
    "city" : "Yuma",
    "cat" : "roads",
    "Q4total" : 20
}

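Is this possible?

1 answer:

Answer 0: (score: 2)

We could ask: to what end? Your documents already have a nice, consistent object structure, which is the recommended approach. Having objects with varying keys is not a great idea. Data is "data" and should not be the name of a key.

With that in mind, the aggregation framework follows the same principle and does not allow arbitrary key names to be generated from data contained in the document. But you can get a result similar to your desired output, with the quarters expressed as data points: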

db.junk.aggregate([
    // Aggregate first to reduce the pipeline documents somewhat
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat"
        },
        "Q1": { "$sum": "$Q1" },
        "Q2": { "$sum": "$Q2" },
        "Q3": { "$sum": "$Q3" },
        "Q4": { "$sum": "$Q4" }
    }},

    // Convert the "quarter" elements to array entries with the same keys
    { "$project": {
        "totals": {
            "$map": {
                "input": { "$literal": [ "Q1", "Q2", "Q3", "Q4" ] },
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [ "$$el", "Q1" ] },
                    { "quarter": "$$el", "total": "$Q1" },
                    { "$cond": [
                        { "$eq": [ "$$el", "Q2" ] },
                        { "quarter": "$$el", "total": "$Q2" },
                        { "$cond": [
                           { "$eq": [ "$$el", "Q3" ] },
                           { "quarter": "$$el", "total": "$Q3" },
                           { "quarter": "$$el", "total": "$Q4" }
                        ]}
                    ]}
                ]}
            }
        }
    }},

    // Unwind the array produced
    { "$unwind": "$totals" },

    // Filter any "0" results
    { "$match": { "totals.total": { "$ne": 0 } } },

    // Maybe project a prettier "flatter" output
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$totals.quarter",
        "total": "$totals.total"
    }}
])

This gives you results like this:

{ "city" : "Reno", "cat" : "parks", "quarter" : "Q1", "total" : 35 }
{ "city" : "Yuma", "cat" : "parks", "quarter" : "Q3", "total" : 45 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q1", "total" : 30 }
{ "city" : "Reno", "cat" : "roads", "quarter" : "Q4", "total" : 60 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q2", "total" : 40 }
{ "city" : "Yuma", "cat" : "roads", "quarter" : "Q4", "total" : 20 }

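You could alternatively use mapReduce, which allows "some" flexibility with key names. The catch is that your aggregation is still by "quarter", so you need the quarter as part of the primary key, and that key cannot be altered once emitted.

Additionally, you cannot filter out aggregated results of "0" without a second pass after the output has been written to a collection. So mapReduce is not of much use for what you want to do, unless you can live with a second mapReduce operation or a "transform" query on the output collection.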
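As a rough illustration of the mapReduce point above, here is a minimal sketch in plain JavaScript. The map and reduce functions have the shape MongoDB expects (map reads the document from this and calls emit), but runMapReduce and the module-level emit variable are hypothetical in-memory stand-ins so the example runs without a server; they are not MongoDB APIs.

```javascript
// Stand-in for the emit() MongoDB provides inside a map function.
// The in-memory driver below assigns it; it is NOT a MongoDB API.
let emit;

// Map: one emit per quarter, with the quarter as part of the key.
function mapFn() {
  for (const quarter of ["Q1", "Q2", "Q3", "Q4"]) {
    emit({ city: this.city, cat: this.cat, quarter: quarter }, this[quarter]);
  }
}

// Reduce: sum every value emitted for one key.
function reduceFn(key, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical driver: groups emitted values by key, then reduces each group.
// (A real server skips reduce for single-value keys; the sums are the same.)
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key: key, values: [] });
    groups.get(id).values.push(value);
  };
  docs.forEach((doc) => map.call(doc));
  return [...groups.values()].map((g) => ({
    _id: g.key,
    value: reduce(g.key, g.values),
  }));
}

// The data set from the question.
const docs = [
  { _id: 1, city: "Yuma", cat: "roads", Q1: 0, Q2: 25, Q3: 0, Q4: 0 },
  { _id: 2, city: "Reno", cat: "roads", Q1: 30, Q2: 0, Q3: 0, Q4: 60 },
  { _id: 3, city: "Yuma", cat: "parks", Q1: 0, Q2: 0, Q3: 45, Q4: 0 },
  { _id: 4, city: "Reno", cat: "parks", Q1: 35, Q2: 0, Q3: 0, Q4: 0 },
  { _id: 5, city: "Yuma", cat: "roads", Q1: 0, Q2: 15, Q3: 0, Q4: 20 },
];

const reduced = runMapReduce(docs, mapFn, reduceFn);

// The reduced output still contains the zero totals.
// Dropping them is the separate "second pass" over the output.
const nonZero = reduced.filter((r) => r.value !== 0);
console.log(nonZero);
```

Note that the quarter has to live inside the emitted key, and the zero totals only disappear in the extra filter step at the end, which is the overhead described above.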

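It is worth noting that if you look at what the "second" pipeline stage is doing here with $project and $map, you will see that the document structure is essentially being changed into something you could have used for your documents in the first place, like this: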

{
    "city" : "Reno", 
    "cat" : "parks",
    "totals" : [ 
        { "quarter" : "Q1", "total" : 35 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 0 }, 
        { "quarter" : "Q4", "total" : 0 }
    ]
},
{ 
    "city" : "Yuma", 
    "cat" : "parks",
    "totals" : [ 
        { "quarter" : "Q1", "total" : 0 }, 
        { "quarter" : "Q2", "total" : 0 }, 
        { "quarter" : "Q3", "total" : 45 }, 
        { "quarter" : "Q4", "total" : 0 } 
    ]
}

With documents in that shape, the aggregation operation becomes simple and produces the same results as shown above:

db.collection.aggregate([
    { "$unwind": "$totals" },
    { "$group": {
        "_id": {
            "city": "$city",
            "cat": "$cat",
            "quarter": "$totals.quarter"
        },
        "ttotal": { "$sum": "$totals.total" }
    }},
    { "$match": { "ttotal": { "$ne": 0 } } },
    { "$project": {
        "_id": 0,
        "city": "$_id.city",
        "cat": "$_id.cat",
        "quarter": "$_id.quarter",
        "total": "$ttotal"
    }}
])

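So it might make more sense to structure your documents that way to begin with, and avoid the overhead of transforming them in the pipeline.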
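If you did want to move existing documents to that shape, the reshaping itself is small. The following is only a sketch in plain JavaScript: toTotalsShape is a hypothetical helper name, and treating a missing quarter field as 0 is an assumption of this example, not something stated in the question.

```javascript
// Hypothetical helper: reshape the Q1..Q4 fields into a "totals" array
// of { quarter, total } entries with consistent key names.
function toTotalsShape(doc) {
  const quarters = ["Q1", "Q2", "Q3", "Q4"];
  return {
    _id: doc._id,
    city: doc.city,
    cat: doc.cat,
    // Assumption: a quarter that is absent from the document counts as 0.
    totals: quarters.map((quarter) => ({
      quarter: quarter,
      total: doc[quarter] || 0,
    })),
  };
}

// One of the documents from the question.
const original = { _id: 1, city: "Yuma", cat: "roads", Q1: 0, Q2: 25, Q3: 0, Q4: 0 };
const reshaped = toTotalsShape(original);
console.log(JSON.stringify(reshaped, null, 2));
```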

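I think you will find that consistent key names make a far better object model to program against: you should read the data point from the key's value, not from the key's name. If you really need the other form, it is a simple matter to read the data from each object and transform the keys of the already aggregated results in post-processing.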
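That post-processing step could look something like the sketch below, in plain JavaScript on the client side. toKeyedTotal is a hypothetical helper name, and the sample input simply repeats two of the result documents shown earlier.

```javascript
// Hypothetical helper: turn { quarter: "Q2", total: 40 } into { Q2total: 40 },
// keeping every other field (city, cat) as it is.
function toKeyedTotal(result) {
  const { quarter, total, ...rest } = result;
  return { ...rest, [quarter + "total"]: total };
}

// Two of the aggregated results shown above.
const aggregated = [
  { city: "Reno", cat: "parks", quarter: "Q1", total: 35 },
  { city: "Yuma", cat: "roads", quarter: "Q2", total: 40 },
];

const keyed = aggregated.map(toKeyedTotal);
console.log(keyed);
```

Doing this after aggregation keeps the stored documents and the pipeline on consistent key names, and confines the data-derived key names to the presentation layer.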