MongoDB - Group documents, then sum nested subdocuments

Date: 2019-01-30 21:54:36

Tags: mongodb mongodb-query aggregation-framework

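I have a Mongo collection whose documents contain subdocuments, and I want to sum the fields inside those subdocuments. Below is an example of what I'm hoping to achieve.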

The general structure of each document is:

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 311,
            "b" : 1481,
            ...
            "x" : {"a" : 311, "b" : 19.965999999999998},
            "y" : {"a" : 200, "b" : 14.174000000000003}
        },
        "2": {
            "a" : 500,
            "b" : 100,
            ...
            "x" : {"a" : 123, "b" : 198},
            "y" : {"a" : 200, "b" : 13.7}
        },
        ... // May not all be present
        "12": {...}
    }
}

The reason the months are stored as an object rather than an array is that some months may not be present.

Take three documents as an example:

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 10,
            "b" : 20,
            ...
            "x" : {"a" : 15, "b" : 30}
        },
        "2": {
            "a" : 500,
            "b" : 100,
            ...
            "x" : {"a" : 40, "b" : 50},
        },
        "7": {
            "a": 300,
            "b": 90,
            ...
            "x": {"a": 4, "b": 5}
        }
    }
}

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 15,
            "b" : 25,
            ...
            "x" : {"a" : 20, "b" : 35},
        },
        "2": {
            "a" : 250,
            "b" : 200,
            ...
            "x" : {"a" : 60, "b" : 80},
        }
    }
}


{
    "pool" : "Bar",
    "monthly-figures" : {
        "1": {
            "a" : 300,
            "b" : 400,
            ...
            "x" : {"a" : 51, "b" : 3}
        },
        "6": {
            "a" : 75,
            "b" : 135,
            ...
            "x" : {"a" : 12.5, "b" : 16},
        }
    }
}

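What I'd like to achieve with the aggregation is to group on the pool field and then sum the values contained in monthly-figures, so that the two resulting documents look like: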

{
    "pool" : "Foo",
    "monthly-figures" : {
        "1": {
            "a" : 25,
            "b" : 45,
            ...
            "x" : {"a" : 35, "b" : 65},
        },
        "2": {
            "a" : 750,
            "b" : 300,
            ...
            "x" : {"a" : 100, "b" : 130},
        },
        "7": {
            "a": 300,
            "b": 90,
            ...
            "x": {"a": 4, "b": 5}
        }
    }
}

(The document for pool Bar would be unchanged, since only one document has that pool.)

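It doesn't matter if a month comes out as all 0s after the summation (say, if that month isn't present in any of the grouped documents), but ideally it would be left out.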

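I came up with this query, which works, but I feel it isn't the best way to do it because there is a lot of repetition. How can I improve it?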

{$group: {
        // Group to pool
        _id: "$pool",

        // Sum grouped documents
        "1a": {$sum: "$monthly-figures.1.a"},
        "1b": {$sum: "$monthly-figures.1.b"},
        ...
        "1xa": {$sum: "$monthly-figures.1.x.a"},
        "1xb": {$sum: "$monthly-figures.1.x.b"},


        "2a": {$sum: "$monthly-figures.2.a"},


        ... Continue all the way down to 12
    }
},
{$project: {
        "_id": 0,
        "pool": "$_id",

        "monthly-figures": {

            "1": {
                "a": "$1a",
                "b": "$1b",
                ...
                "x": {
                    "a": "$1xa",
                    "b": "$1xb"
                }
            },
            "2": {
                "a": "$2a",
                ...
            }

            ... Continue all the way down to 12
        }
    }
}

Any ideas for a cleaner pipeline? Cheers!

1 Answer:

Answer 0 (score: 0)

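One way to eliminate the month-by-month listing is to turn the monthly-figures object into an array. You said:

"The reason the months are stored as an object rather than an array is that some months may not be present."

But an array would still work, because you could have: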

[{month: 1, figures: {...}}, {month: 6, figures: {...}}]
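
As a rough illustration of that equivalence, the month-keyed object and the array form carry the same information, including the gaps. The plain JavaScript sketch below converts between the two. The helper names toMonthArray and toMonthObject are made up for this example, and the code runs outside MongoDB; it only mirrors the idea.

```javascript
// Hypothetical helpers, not part of MongoDB: convert between the
// month-keyed object from the question and the array form shown above.
function toMonthArray(monthlyFigures) {
    return Object.entries(monthlyFigures).map(([month, figures]) => ({
        month: Number(month),
        figures: figures
    }));
}

function toMonthObject(monthArray) {
    return Object.fromEntries(
        monthArray.map(({ month, figures }) => [String(month), figures])
    );
}

// A sparse example: only months 1 and 6 are present.
const sparse = { "1": { a: 300, b: 400 }, "6": { a: 75, b: 135 } };
const asArray = toMonthArray(sparse);
console.log(JSON.stringify(asArray));
console.log(JSON.stringify(toMonthObject(asArray)));
```

Missing months are simply absent from the array, so no placeholder entries are needed.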

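With the monthly figures as an array, we can $unwind it so that each element ends up in its own document. A $group on pool and month then produces the sums. To collect the months for each pool, a second $group on pool uses $push to gather specially shaped objects (holding the month and its figures) into an array named monthly-figures. These objects are special in that their k and v keys are recognized by the $arrayToObject operator, which the final stage uses to restore the original format. In the query below the array is produced from the stored object with $objectToArray, so the stored documents do not have to change. Here is the query: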

db.colx.aggregate([{
    "$project": {
        "pool": 1,
        "monthly-figures": {"$objectToArray": "$monthly-figures"}
    }
}, {
    "$unwind": "$monthly-figures"
}, {
    "$group": {
        "_id": {
            "pool": "$pool",
            "month": "$monthly-figures.k",      
        },
        "a": {"$sum": "$monthly-figures.v.a"},
        "b": {"$sum": "$monthly-figures.v.b"},
        "x_a": {"$sum": "$monthly-figures.v.x.a"},
        "x_b": {"$sum": "$monthly-figures.v.x.b"}
    }
}, {
    "$group": {
        "_id": "$_id.pool",
        "monthly-figures": {
            "$push": {
                "k": "$_id.month",
                "v": {
                    "a": "$a",
                    "b": "$b",
                    "x": {
                        "a": "$x_a",
                        "b": "$x_b"
                    }
                }
            }
        }
    }
}, {
    "$project": {
        "_id": 0,
        "pool": "$_id",
        "monthly-figures": {"$arrayToObject": "$monthly-figures"}
    }
}])
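
The two $group stages above still list every summed field by hand, and the question elides further fields with "...". One possible way to cut that down, sketched here under the assumption that the set of leaf fields is known up front, is to build the stages in JavaScript before calling aggregate. The list leafPaths and the helper buildPipeline are hypothetical names for this sketch, and only the four fields visible in the sample documents are included.

```javascript
// Assumption: the numeric leaf fields to sum are known in advance.
// Only the four fields visible in the samples are listed here.
const leafPaths = ["a", "b", "x.a", "x.b"];

// Hypothetical helper: builds the same five stages as the pipeline above.
function buildPipeline(paths) {
    const sums = {};      // accumulators for the first $group
    const rebuilt = {};   // nested "v" object for the second $group

    for (const path of paths) {
        const flat = path.replace(/\./g, "_");   // "x.a" -> "x_a"
        sums[flat] = { $sum: "$monthly-figures.v." + path };

        // Create the nested objects as needed and point the leaf at "$x_a" etc.
        const parts = path.split(".");
        let node = rebuilt;
        for (const part of parts.slice(0, -1)) {
            node = node[part] = node[part] || {};
        }
        node[parts[parts.length - 1]] = "$" + flat;
    }

    return [
        { $project: { pool: 1, "monthly-figures": { $objectToArray: "$monthly-figures" } } },
        { $unwind: "$monthly-figures" },
        { $group: Object.assign({ _id: { pool: "$pool", month: "$monthly-figures.k" } }, sums) },
        { $group: { _id: "$_id.pool", "monthly-figures": { $push: { k: "$_id.month", v: rebuilt } } } },
        { $project: { _id: 0, pool: "$_id", "monthly-figures": { $arrayToObject: "$monthly-figures" } } }
    ];
}

const pipeline = buildPipeline(leafPaths);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell this could then be run as: db.colx.aggregate(pipeline)
```

The flattened names such as x_a are temporary and only need to be unique; a document that really contained a field called x_a next to x.a would need a different separator.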

Here is the output of the query:

{
    "pool" : "Bar",
    "monthly-figures" : {
        "6" : {
            "a" : 75,
            "b" : 135,
            "x" : {
                "a" : 12.5,
                "b" : 16
            }
        },
        "1" : {
            "a" : 300,
            "b" : 400,
            "x" : {
                "a" : 51,
                "b" : 3
            }
        }
    }
}
{
    "pool" : "Foo",
    "monthly-figures" : {
        "1" : {
            "a" : 25,
            "b" : 45,
            "x" : {
                "a" : 35,
                "b" : 65
            }
        },
        "2" : {
            "a" : 750,
            "b" : 300,
            "x" : {
                "a" : 100,
                "b" : 130
            }
        },
        "7" : {
            "a" : 300,
            "b" : 90,
            "x" : {
                "a" : 4,
                "b" : 5
            }
        }
    }
}
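
To sanity-check totals like these without a database, the same grouping can be reproduced in plain JavaScript. This is only a sketch run over the three sample documents, with the fields elided by "..." left out. The names addInto and sumByPool are invented for this example, and this is not how the aggregation executes on the server.

```javascript
// The three sample documents from the question, minus the elided fields.
const docs = [
    { pool: "Foo", "monthly-figures": {
        "1": { a: 10, b: 20, x: { a: 15, b: 30 } },
        "2": { a: 500, b: 100, x: { a: 40, b: 50 } },
        "7": { a: 300, b: 90, x: { a: 4, b: 5 } } } },
    { pool: "Foo", "monthly-figures": {
        "1": { a: 15, b: 25, x: { a: 20, b: 35 } },
        "2": { a: 250, b: 200, x: { a: 60, b: 80 } } } },
    { pool: "Bar", "monthly-figures": {
        "1": { a: 300, b: 400, x: { a: 51, b: 3 } },
        "6": { a: 75, b: 135, x: { a: 12.5, b: 16 } } } }
];

// Recursively add every numeric leaf of `source` into `target`.
function addInto(target, source) {
    for (const [key, value] of Object.entries(source)) {
        if (typeof value === "number") {
            target[key] = (target[key] || 0) + value;
        } else {
            target[key] = addInto(target[key] || {}, value);
        }
    }
    return target;
}

// Group by pool, then sum the nested monthly figures.
function sumByPool(documents) {
    const byPool = {};
    for (const doc of documents) {
        byPool[doc.pool] = addInto(byPool[doc.pool] || {}, doc["monthly-figures"]);
    }
    return Object.entries(byPool).map(([pool, figures]) => ({
        pool: pool,
        "monthly-figures": figures
    }));
}

const totals = sumByPool(docs);
console.log(JSON.stringify(totals, null, 2));
```

Because addInto only creates keys it actually encounters, months that appear in none of a pool's documents are absent from the result rather than reported as zeros.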

Links to the MongoDB documentation: