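Is it possible in MongoDB to aggregate over a predefined number of rows instead of grouping by a field? For example, I want to calculate the average of every 1000 rows rather than group by some column. As a smaller example, I want to calculate the average rating of every 4 consecutive rows:
So my result should look something like this:
Here is the input data in JSON: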
[{"ItemName":"Item1","Rating":4},
{"ItemName":"Item2","Rating":4},
{"ItemName":"Item2","Rating":4},
{"ItemName":"Item3","Rating":2},
{"ItemName":"Item4","Rating":5},
{"ItemName":"Item5","Rating":4},
{"ItemName":"Item6","Rating":2},
{"ItemName":"Item7","Rating":4},
{"ItemName":"Item8","Rating":1},
{"ItemName":"Item9","Rating":4},
{"ItemName":"Item10","Rating":3},
{"ItemName":"Item11","Rating":2},
{"ItemName":"Item12","Rating":2}]
Answer 0 (score: 0)
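There is no simple way to do it. You need to group the whole collection into a single array, which may require allowDiskUse for large datasets and comes with a huge performance impact.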
db.collection.aggregate([
    // collect all documents into one array and count them
    { $group: {
        _id: null,
        cnt: { $sum: 1 },
        docs: { $push: "$$ROOT" }
    } },
    // add a _batch field to group documents by
    { $project: {
        _id: 0,
        docs: { $map: {
            // pair each document with a sequential, zero-based index
            input: { $zip: {
                inputs: [ "$docs", { $range: [ 0, "$cnt" ] } ]
            } },
            as: "doc",
            in: { $mergeObjects: [
                { $arrayElemAt: [ "$$doc", 0 ] },
                // split into batches of 4: indexes 0-3 -> batch 1, 4-7 -> batch 2, ...
                { _batch: { $add: [
                    1,
                    { $floor: { $divide: [ { $arrayElemAt: [ "$$doc", 1 ] }, 4 ] } }
                ] } }
            ] }
        } }
    } },
    { $unwind: "$docs" },
    { $replaceRoot: { newRoot: "$docs" } },
    // ensure original order (assumes _id reflects insertion order),
    // only if you need ItemRange as a string
    { $sort: { _id: 1 } },
    // calculate averages per batch
    { $group: {
        _id: "$_batch",
        start: { $first: "$ItemName" }, // only if you need ItemRange as a string
        end: { $last: "$ItemName" },    // only if you need ItemRange as a string
        RatingAvg: { $avg: "$Rating" }
    } },
    // only if you need the batches in order
    { $sort: { _id: 1 } },
    // build ItemRange, only if you need ItemRange as a string
    { $project: {
        _id: 0,
        ItemRange: { $concat: [ "$start", "-", "$end" ] },
        RatingAvg: 1
    } }
])
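To check what batches of 4 should produce for the sample data without a database, the same batching can be sketched in plain JavaScript. This is only an illustration: batchAverages is a hypothetical helper, not a MongoDB API, and it assumes the documents arrive in the order shown in the question.

```javascript
// Sample data from the question, in its original order.
const docs = [
  { ItemName: "Item1", Rating: 4 },
  { ItemName: "Item2", Rating: 4 },
  { ItemName: "Item2", Rating: 4 },
  { ItemName: "Item3", Rating: 2 },
  { ItemName: "Item4", Rating: 5 },
  { ItemName: "Item5", Rating: 4 },
  { ItemName: "Item6", Rating: 2 },
  { ItemName: "Item7", Rating: 4 },
  { ItemName: "Item8", Rating: 1 },
  { ItemName: "Item9", Rating: 4 },
  { ItemName: "Item10", Rating: 3 },
  { ItemName: "Item11", Rating: 2 },
  { ItemName: "Item12", Rating: 2 }
];

// Hypothetical helper: average Rating over consecutive batches of `size` items.
function batchAverages(items, size) {
  const result = [];
  for (let start = 0; start < items.length; start += size) {
    const batch = items.slice(start, start + size); // last batch may be shorter
    const sum = batch.reduce((total, item) => total + item.Rating, 0);
    result.push({
      ItemRange: `${batch[0].ItemName}-${batch[batch.length - 1].ItemName}`,
      RatingAvg: sum / batch.length
    });
  }
  return result;
}

console.log(batchAverages(docs, 4));
```

For the 13 sample documents this yields four batches with averages 3.5, 3.75, 2.5 and 2; the last batch contains only Item12.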
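I am not sure what the actual use case is, because the batches shift, and with them every average, as soon as you delete, for example, the first document.
Anyway, if you don't need ItemRange in the "FirstName-LastName" format and can live with the batch number instead, you can skip the 2 in-memory $sort stages to improve performance.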
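A minimal sketch of that shorter variant, assuming the batch number alone is enough as the result key. buildBatchPipeline is a hypothetical helper that only builds the stage array; the batch number is computed as floor(index / batchSize) + 1, so indexes 0-3 fall into batch 1.

```javascript
// Hypothetical helper: builds the pipeline without the $sort stages
// and without the ItemRange projection.
function buildBatchPipeline(batchSize) {
  // zero-based position of the document inside the zipped [doc, index] pair
  const index = { $arrayElemAt: ["$$doc", 1] };
  return [
    // collect all documents into one array and count them
    { $group: { _id: null, cnt: { $sum: 1 }, docs: { $push: "$$ROOT" } } },
    // attach a 1-based _batch number to every document
    { $project: {
      _id: 0,
      docs: { $map: {
        input: { $zip: { inputs: ["$docs", { $range: [0, "$cnt"] }] } },
        as: "doc",
        in: { $mergeObjects: [
          { $arrayElemAt: ["$$doc", 0] },
          { _batch: { $add: [1, { $floor: { $divide: [index, batchSize] } }] } }
        ] }
      } }
    } },
    { $unwind: "$docs" },
    { $replaceRoot: { newRoot: "$docs" } },
    // average per batch; output order is not guaranteed without a $sort
    { $group: { _id: "$_batch", RatingAvg: { $avg: "$Rating" } } }
  ];
}

// Usage in the mongo shell (assumes a collection named `collection`):
// db.collection.aggregate(buildBatchPipeline(4), { allowDiskUse: true })
```

The result documents then have the shape { _id: <batch number>, RatingAvg: <average> }.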