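I'm not sure how to phrase this, but basically I want to group documents by a field in a subarray, and then group by a field in the parent (root) document while preserving the first grouping.
Hopefully an example will help.
Say I have these documents, in which the information for several custItemNum values is already mostly grouped by originalFile: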
[
{
"items" : [
{
"recType" : "I2",
"qty" : 2.0,
"custItemNum" : 10.0
},
{
"recType" : "I2",
"qty" : 200.0,
"custItemNum" : 20.0
},
{
"recType" : "I2",
"qty" : 50.0,
"custItemNum" : 30.0
},
{
"recType" : "D9",
"custItemNum" : 10.0
},
{
"recType" : "D9",
"custItemNum" : 20.0
},
{
"recType" : "D9",
"custItemNum" : 30.0
}
],
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"items" : [
{
"recType" : "I2",
"qty" : 180.0,
"custItemNum" : 20.0
}
],
"originalFile" : "727557371.txt",
"docId" : "278791399"
},
{
"items" : [
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
}
],
"originalFile" : "727557371.txt",
"docId" : "278791399"
},
{
"items" : [
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
}
],
"originalFile" : "727557371.txt",
"docId" : "278791399"
}
]
I'd like to end up with a collection like this, where the first grouping is by custItemNumber and the second is by originalFile:
[
{
"custItemNumber" : 10.0,
"count" : 2.0,
"itemInfo" : [
{
"originalFile" : "727451921.txt",
"item" : [
{
"recType" : "I2",
"qty" : 2.0,
"custItemNum" : 10.0
},
{
"recType" : "D9",
"custItemNum" : 10.0
}
]
}
]
},
{
"custItemNumber" : 20.0,
"count" : 3.0,
"itemInfo" : [
{
"originalFile" : "727451921.txt",
"item" : [
{
"recType" : "I2",
"qty" : 200.0,
"custItemNum" : 20.0
},
{
"recType" : "D9",
"custItemNum" : 20.0
}
]
},
{
"originalFile" : "727557371.txt",
"item" : [
{
"recType" : "I2",
"qty" : 180.0,
"custItemNum" : 20.0
}
]
}
]
},
{
"custItemNumber" : 30.0,
"count" : 4.0,
"itemInfo" : [
{
"originalFile" : "727451921.txt",
"item" : [
{
"recType" : "I2",
"qty" : 50.0,
"custItemNum" : 30.0
},
{
"recType" : "D9",
"custItemNum" : 30.0
}
]
},
{
"originalFile" : "727557371.txt",
"item" : [
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
},
{
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
}
]
}
]
}
]
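Keep in mind that these documents already come out of several aggregation stages, so there is no _id field available.
So far I have come up with these aggregation stages (I hand-edited their output to get the result above):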
{$unwind: "$items"},
{$bucket: {
    groupBy: "$items.custItemNum",
    boundaries: [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    output: {
        count: {$sum: 1},
        itemInfo: {$push: "$$ROOT"}
    }
}}
which produces this result:
[
{
"_id" : 10.0,
"count" : 2.0,
"itemInfo" : [
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "I2",
"qty" : 2.0,
"custItemNum" : 10.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "D9",
"custItemNum" : 10.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
}
]
},
{
"_id" : 20.0,
"count" : 3.0,
"itemInfo" : [
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "I2",
"qty" : 200.0,
"custItemNum" : 20.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "D9",
"custItemNum" : 20.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae5290"),
"items" : {
"recType" : "I2",
"qty" : 180.0,
"custItemNum" : 20.0
},
"originalFile" : "727557371.txt",
"docId" : "278791399"
}
]
},
{
"_id" : 30.0,
"count" : 4.0,
"itemInfo" : [
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "I2",
"qty" : 50.0,
"custItemNum" : 30.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae528f"),
"items" : {
"recType" : "D9",
"custItemNum" : 30.0
},
"originalFile" : "727451921.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae5291"),
"items" : {
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
},
"originalFile" : "727557371.txt",
"docId" : "278791399"
},
{
"_id" : ObjectId("5a7336ebb4b169272dae5292"),
"items" : {
"recType" : "I2",
"qty" : 10.0,
"custItemNum" : 30.0
},
"originalFile" : "727557371.txt",
"docId" : "278791399"
}
]
}
]
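I'm stuck here: any further stage I can think of (e.g. $replaceRoot: { newRoot: "$itemInfo" }) destroys the outer grouping.
Also, the custItemNum values are dynamic, but AFAICT the boundaries field of the $bucket stage takes a constant array, so if there is a way to pass a computed array there, I'd like to know how.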
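For reference, here is a minimal sketch of one direction that might avoid $bucket and its fixed boundaries altogether: two consecutive $group stages, the first keyed on the (custItemNum, originalFile) pair and the second on custItemNum alone. It assumes the documents have exactly the shape shown above; the variable name pipeline and the collection name in the usage comment are hypothetical, and I have not run it against the real data.

```javascript
// Sketch only: nested grouping with two $group stages instead of $bucket,
// so no boundaries array is needed for the dynamic custItemNum values.
const pipeline = [
  // One document per element of the items array.
  { $unwind: "$items" },

  // Inner grouping: one group per (custItemNum, originalFile) pair.
  {
    $group: {
      _id: { custItemNum: "$items.custItemNum", originalFile: "$originalFile" },
      item: { $push: "$items" },
      count: { $sum: 1 }
    }
  },

  // Outer grouping: fold the per-file groups into one document per custItemNum.
  {
    $group: {
      _id: "$_id.custItemNum",
      count: { $sum: "$count" },
      itemInfo: { $push: { originalFile: "$_id.originalFile", item: "$item" } }
    }
  },

  // Rename _id to the field name used in the desired output, then sort.
  { $project: { _id: 0, custItemNumber: "$_id", count: 1, itemInfo: 1 } },
  { $sort: { custItemNumber: 1 } }
];

// Usage in the mongo shell (collection name is hypothetical):
// db.myCollection.aggregate(pipeline)
```

With the sample documents above, the inner stage should yield five (custItemNum, originalFile) groups, which the outer stage folds into three documents with counts 2, 3 and 4. The order of the entries inside itemInfo is not guaranteed unless a $sort is added before the second $group.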