I have Mongo documents that contain arrays of numeric values in order (one value per day), and I want to sum the values at each array position across multiple documents, grouped by a field outside of the array.
{"_id" : "1",
"group" : "A",
"value_list" : [1,2,3,4,5,6,7]
},
{"_id" : "2",
"group" : "B",
"value_list" : [10,20,30,40,50,60,70]
},
{"_id" : "3",
"group" : "A",
"value_list" : [1,2,3,4,5,6,7]
},
{"_id" : "4",
"group" : "B",
"value_list" : [10,20,30,40,50,60,70]
}
So the results I'm after are listed below.
There are two group A documents above. At position 1 of the value_list array, both documents have the value 1, so 1+1=2. At position 2 the value is 2 in both documents, so 2+2=4, etc.
There are two group B documents above. At position 1 of the value_list array, both documents have the value 10, so 10+10=20. At position 2 the value is 20 in both documents, so 20+20=40, etc.
{"_id" : "30",
"group" : "A",
"value_list" : [2,4,6,8,10,12,14]
},
{"_id" : "30",
"group" : "B",
"value_list" : [20,40,60,80,100,120,140]
}
How would I do this using Mongo Script? Thanks, Matt
Answer 0 (score: 1)
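Of course, the most "scalable" way is to use the includeArrayIndex option of $unwind to keep track of each position. You then $sum over the "unwound" combinations and add the results back into array format: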
db.getCollection('test').aggregate([
{ "$unwind": { "path": "$value_list", "includeArrayIndex": "index" } },
{ "$group": {
"_id": {
"group": "$group",
"index": "$index"
},
"value_list": { "$sum": "$value_list" }
}},
{ "$sort": { "_id": 1 } },
{ "$group": {
"_id": "$_id.group",
"value_list": { "$push": "$value_list" }
}},
{ "$sort": { "_id": 1 } }
])
Note that the $sort after the first $group is needed in order to maintain the array positions.
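To make the stage-by-stage behavior concrete, here is a minimal plain JavaScript sketch (Node.js, no MongoDB involved). It mimics what each stage of the pipeline above does to the sample documents. It is an in-memory emulation under that assumption, not how MongoDB actually executes the pipeline, and the variable names (`unwound`, `sums`, `sorted`, `grouped`, `result`) are hypothetical.

```javascript
// In-memory emulation of the $unwind/$group/$sort/$group pipeline (not MongoDB code).
const docs = [
  { _id: "1", group: "A", value_list: [1, 2, 3, 4, 5, 6, 7] },
  { _id: "2", group: "B", value_list: [10, 20, 30, 40, 50, 60, 70] },
  { _id: "3", group: "A", value_list: [1, 2, 3, 4, 5, 6, 7] },
  { _id: "4", group: "B", value_list: [10, 20, 30, 40, 50, 60, 70] },
];

// Stage 1 ($unwind + includeArrayIndex): one row per element, tagged with its index.
const unwound = docs.flatMap((d) =>
  d.value_list.map((v, index) => ({ group: d.group, index, value_list: v }))
);

// Stage 2 ($group on { group, index } with $sum): add up values sharing a position.
const sums = new Map();
for (const row of unwound) {
  const key = `${row.group}|${row.index}`;
  const acc = sums.get(key) || { _id: { group: row.group, index: row.index }, value_list: 0 };
  acc.value_list += row.value_list;
  sums.set(key, acc);
}

// Stage 3 ($sort on _id): order by group, then by index, so positions line up.
const sorted = [...sums.values()].sort((a, b) =>
  a._id.group < b._id.group ? -1
    : a._id.group > b._id.group ? 1
    : a._id.index - b._id.index
);

// Stage 4 ($group on group with $push): collect the sums in sorted order.
const grouped = new Map();
for (const row of sorted) {
  if (!grouped.has(row._id.group)) grouped.set(row._id.group, []);
  grouped.get(row._id.group).push(row.value_list);
}
const result = [...grouped].map(([_id, value_list]) => ({ _id, value_list }));

console.log(JSON.stringify(result));
```

In this emulation the JavaScript `Map` happens to keep insertion order, so the output would be correct even without stage 3. MongoDB's `$group`, however, does not promise any output order, which is why the real pipeline sorts before the `$push`.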
If it is available to you (MongoDB 3.4 or later), you can also apply $reduce across all of the arrays:
db.getCollection('test').aggregate([
{ "$group": {
"_id": "$group",
"value_list": { "$push": "$value_list" }
}},
{ "$addFields": {
"value_list": {
"$reduce": {
"input": "$value_list",
"initialValue": [],
"in": {
"$map": {
"input": {
"$zip": {
"inputs": ["$$this", "$$value"],
"useLongestLength": true
}
},
"in": { "$sum": "$$this"}
}
}
}
}
}},
{ "$sort": { "_id": 1 } }
])
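Basically, the initial $push creates an "array of arrays", which is then processed with $reduce. $zip does a "pairwise" assignment of each element, and $map then adds each pair together with $sum at every position.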
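The following is a minimal plain JavaScript sketch of what the `$reduce`/`$zip`/`$map`/`$sum` expression computes for one group. It assumes two documented behaviors. First, `$zip` with `useLongestLength: true` pads missing positions with `null`. Second, `$sum` over an array ignores non-numeric entries. The helper names `zipLongest`, `sumPair`, and `sumPositions` are hypothetical and are not MongoDB operators.

```javascript
// Emulates $zip with useLongestLength: true; missing positions become null.
function zipLongest(a, b) {
  const len = Math.max(a.length, b.length);
  const out = [];
  for (let i = 0; i < len; i++) {
    out.push([i < a.length ? a[i] : null, i < b.length ? b[i] : null]);
  }
  return out;
}

// Emulates { "$sum": "$$this" } on a pair: non-numeric entries such as null are skipped.
function sumPair(pair) {
  return pair.reduce((acc, v) => (typeof v === "number" ? acc + v : acc), 0);
}

// Emulates $reduce with initialValue [] whose "in" is a $map over the $zip output.
function sumPositions(arrays) {
  return arrays.reduce((value, current) => zipLongest(current, value).map(sumPair), []);
}

// The "array of arrays" the initial $group/$push would build for each group.
const pushed = {
  A: [[1, 2, 3, 4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7]],
  B: [[10, 20, 30, 40, 50, 60, 70], [10, 20, 30, 40, 50, 60, 70]],
};

for (const [group, arrays] of Object.entries(pushed)) {
  console.log(group, JSON.stringify(sumPositions(arrays)));
}
```

Starting from an empty `initialValue` works only because of the longest-length padding. On the first step every element is paired with `null`, which `$sum` ignores. The same padding also lets arrays of different lengths be combined, since trailing positions are simply carried through.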
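While this is slightly more efficient, it is not really practical for large data. Pushing all of the grouped arrays into a single array per group before "reducing" it can exceed the 16 MB BSON document size limit.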
Either approach produces the same result:
/* 1 */
{
"_id" : "A",
"value_list" : [
2.0,
4.0,
6.0,
8.0,
10.0,
12.0,
14.0
]
}
/* 2 */
{
"_id" : "B",
"value_list" : [
20.0,
40.0,
60.0,
80.0,
100.0,
120.0,
140.0
]
}