I'm trying to identify bad data in a mongo database and am struggling to get the aggregation right. The documents look like this:
{
clientCode: 'abc',
categoryId: 123,
externalCategoryId: 'foo',
...
}
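externalCategoryId is supplied by the client and should be unique per clientCode, but it may be duplicated across different clientCodes.
The bad data I'm trying to identify is the case where, for a given clientCode, two different categoryIds share the same externalCategoryId.
This doesn't need to be efficient or run inside the application; for now it's just a one-off query to check data integrity.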
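I've tried the documentation, as well as other suggestions about aggregation/summing, but haven't managed to get it working. I've been heading towards output like this:
{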
clientCode: 'abc',
externalCategoryId: 'foo',
numCategoryIds: 2
}
but I'm open to other suggestions.
I was struggling to get anything working, so I didn't include an attempt before. Here's the current form of the query:
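db.getCollection('funds').aggregate([
  // Stage 1: one document per distinct (clientCode, externalCategoryId, categoryId)
  { $group: {
      _id: { clientCode: '$clientCode', externalCategoryId: '$externalCategoryId', categoryId: '$categoryId' }
  }},
  // Stage 2: count the distinct categoryIds per (clientCode, externalCategoryId)
  { $group: {
      _id: { clientCode: '$_id.clientCode', externalCategoryId: '$_id.externalCategoryId' },
      categoryIds: { $sum: 1 }
  }}
])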
Sample documents:
{ clientCode: "abc", categoryId: 1, externalCategoryId: "foo" }
{ clientCode: "xyz", categoryId: 2, externalCategoryId: "foo" }
{ clientCode: "abc", categoryId: 3, externalCategoryId: "bar" }
{ clientCode: "abc", categoryId: 4, externalCategoryId: "foo" }
The expected aggregation output would be:
{ clientCode: "abc", externalCategoryId: "foo", numberCategoryIds: 2 }
{ clientCode: "abc", externalCategoryId: "bar", numberCategoryIds: 1 }
{ clientCode: "xyz", externalCategoryId: "foo", numberCategoryIds: 1 }
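To make the expected result concrete, here is a minimal plain-JavaScript sketch (no MongoDB involved) that computes the same counts from the sample documents above. `countCategoryIds` is a hypothetical helper name. It counts distinct categoryIds per (clientCode, externalCategoryId), so any group whose count is greater than 1 is exactly the bad data described above.

```javascript
// Hypothetical helper: reproduces the expected aggregation in memory,
// counting DISTINCT categoryIds per (clientCode, externalCategoryId).
function countCategoryIds(docs) {
  const groups = new Map(); // composite key -> { clientCode, externalCategoryId, ids }
  for (const { clientCode, categoryId, externalCategoryId } of docs) {
    // JSON.stringify of the pair gives an unambiguous composite key
    const key = JSON.stringify([clientCode, externalCategoryId]);
    if (!groups.has(key)) {
      groups.set(key, { clientCode, externalCategoryId, ids: new Set() });
    }
    groups.get(key).ids.add(categoryId); // a Set ignores repeated categoryIds
  }
  // Map preserves insertion order, so groups come out in first-seen order
  return [...groups.values()].map(({ clientCode, externalCategoryId, ids }) => ({
    clientCode,
    externalCategoryId,
    numberCategoryIds: ids.size,
  }));
}

const sample = [
  { clientCode: "abc", categoryId: 1, externalCategoryId: "foo" },
  { clientCode: "xyz", categoryId: 2, externalCategoryId: "foo" },
  { clientCode: "abc", categoryId: 3, externalCategoryId: "bar" },
  { clientCode: "abc", categoryId: 4, externalCategoryId: "foo" },
];

const counts = countCategoryIds(sample);
// Bad data: more than one distinct categoryId for the same pair
const conflicts = counts.filter((g) => g.numberCategoryIds > 1);
console.log(counts);
console.log(conflicts);
```

With the sample data, `conflicts` contains only the ("abc", "foo") group.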
Answer 0 (score: 1)
You can try running the following aggregation pipeline:
db.funds.aggregate([
  {
    // Group by (clientCode, externalCategoryId), collecting every categoryId
    "$group": {
      "_id": {
        "clientCode": "$clientCode",
        "externalCategoryId": "$externalCategoryId"
      },
      "categoryIds": { "$push": "$categoryId" }
    }
  },
  {
    // Flatten the group key and count the collected categoryIds
    "$project": {
      "_id": 0,
      "clientCode": "$_id.clientCode",
      "externalCategoryId": "$_id.externalCategoryId",
      "numberCategoryIds": { "$size": "$categoryIds" }
    }
  }
])
Sample output
/* 1 */
{
"clientCode" : "abc",
"externalCategoryId" : "foo",
"numberCategoryIds" : 2
}
/* 2 */
{
"clientCode" : "xyz",
"externalCategoryId" : "foo",
"numberCategoryIds" : 1
}
/* 3 */
{
"clientCode" : "abc",
"externalCategoryId" : "bar",
"numberCategoryIds" : 1
}
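A possible refinement, offered as a sketch and not verified against a live server: `$push` collects every categoryId, so two documents that happen to share the same categoryId would be counted twice. Using `$addToSet` instead counts distinct values, and a trailing `$match` keeps only the (clientCode, externalCategoryId) pairs with more than one categoryId, i.e. only the bad data. `findDuplicateExternalIds` is a hypothetical variable name.

```javascript
// Sketch of a pipeline that returns ONLY the inconsistent groups.
const findDuplicateExternalIds = [ // hypothetical variable name
  {
    $group: {
      _id: { clientCode: "$clientCode", externalCategoryId: "$externalCategoryId" },
      // $addToSet keeps each distinct categoryId once
      categoryIds: { $addToSet: "$categoryId" },
    },
  },
  {
    $project: {
      _id: 0,
      clientCode: "$_id.clientCode",
      externalCategoryId: "$_id.externalCategoryId",
      categoryIds: 1, // keep the offending ids for inspection
      numberCategoryIds: { $size: "$categoryIds" },
    },
  },
  // Keep only pairs that map to more than one distinct categoryId
  { $match: { numberCategoryIds: { $gt: 1 } } },
];

// In the mongo shell / mongosh:
// db.funds.aggregate(findDuplicateExternalIds)
```

Note that `$group` does not guarantee the order of its output documents, and the order of values inside a `$addToSet` array is unspecified; add a `$sort` stage if a stable order matters.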