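I need to get 2 sum results from 2 separate nested arrays, and possibly for 10 values (product totals for 10 countries). I know I need to use the aggregation framework, but I don't know how.
I tried $facet, but on 4.5 million documents (with nested array data) it took about 30-40 seconds to return the result. (Imagine having to loop over that ten times.)
I tried the following solutions, but they did not work: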
How to group query with multiple $cond?
Multiple Counts with single query in mongodb
Collection structure:
{
_id,
sku: 'p1',
someField,
someField2,
...
products: [
{
productid:132,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
},
{
productid:451,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
},
{
productid:218,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
}
],
sellers: [
{
sellerid: 101001,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
},
{
sellerid: 104201,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
},
{
sellerid: 205401,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
}
]
},
{
_id,
sku: 'x2',
someField,
someField2,
...
products: [
{
productid:142,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
},
{
productid:71,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
},
{
productid:28,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
}
],
sellers: [
{
sellerid: 1001,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
},
{
sellerid: 1421,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
},
{
sellerid: 20501,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
}
]
},
{
_id,
sku: 'p3',
someField,
someField2,
...
products: [
{
productid:543,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
},
{
productid:52,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
},
{
productid:32,
someproductfield,
someproductfield2,
...
countryId: double <- the field used in the sum condition
}
...
],
sellers: [
{
sellerid: 5201,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
},
{
sellerid: 1231,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
},
{
sellerid: 12565461,
somesellerfield,
somesellerfield2,
...
countryId: double <- the field used in the sum condition
}
]
}
I need a result like this:
{
  countryId: 5,
  productsOnCountryCount: 10102,
  /* something like: count only the products that have the countryId =>
     $sum: { $cond: [{$eq: ['$products.countryId',2]},1,0] }
  */
  unavailableProductsCount: 3560
  /* something like: products that sellers have but that are not available to sell or list for some
     reason =>
     $sum: {$cond: [{$and:[{$eq: ['$sellers.countryId',2]},{$ne:
     ['$products.countryId',2]}]},1,0]}
  */
}
Update on methods and response times:
var cid = 2; // assume countryId of USA
// target data total: about 20 million entries (including nested arrays)
Method 1 (@KevinSmith), response time: 48-50 seconds
db.test.aggregate([
  { "$facet": {
    "productsOnCountryCount": [
      { "$unwind" : "$products" },
      { "$match" : { "products.countryId": cid}},
      { "$count": "productsOnCountryCount" },
    ],
    "unavailableProductsCount": [
      { "$match" : {"sellers.countryId": cid, "products.countryId" : { $ne: cid } } },
      { "$count": "unavailableProductsCount" }
    ]
  }},
  { "$project": {
    "productsOnCountryCount": { "$arrayElemAt": ["$productsOnCountryCount.productsOnCountryCount", 0] },
    "unavailableProductsCount": { "$arrayElemAt": ["$unavailableProductsCount.unavailableProductsCount", 0] }
  }}
]);
Method 2, response time: 36-38 seconds
db.test.aggregate([
  { "$facet": {
    "count1": [
      { "$match" : {'products.countryId': cid }},
      { "$count": "Count" }
    ],
    "count2": [
      { "$match" : {'sellers.countryId': cid,'products.countryId':{$ne: cid} }},
      { "$count": "Count" }
    ]
  }}
])
Method 3, response time: 20-21 seconds
Based on these results, I think I will go with method 3. Thanks to everyone who took an interest.
Answer 0 (score: 2)
So, let's start by simplifying the dataset. We'll insert a list of items into the test collection:
var items = [{
_id : 1,
products: [
{
countryId: 1
},
{
countryId: 1
},
{
countryId: 2
},
{
countryId: 4
},
],
sellers: [
{
countryId: 2
},
{
countryId: 2
},
{
countryId: 1
}
]
},
{
_id : 2,
products: [
{
countryId: 2
},
{
countryId: 2
},
{
countryId: 3
}
],
sellers: [
{
countryId: 3
},
{
countryId: 3
},
{
countryId: 2
},
{
countryId: 4
}
]
}];
db.test.insertMany(items);
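We can then use the $facet aggregation stage to run multiple aggregation pipelines, so let's start by working out the pipeline for productsOnCountryCount.
First, we need to unwind all the products in the array and then match on the given countryId: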
var countryId = 4;
db.test.aggregate([
  { "$unwind" : "$products" },
  { "$match" : { "products.countryId": countryId } }
]).pretty()
{
"_id" : 1,
"products" : {
"countryId" : 4
},
"sellers" : [
{
"countryId" : 2
},
{
"countryId" : 2
},
{
"countryId" : 1
}
]
}
We can now simply add a $count at the end to get the number of matching products:
db.test.aggregate([
  { "$unwind" : "$products" },
  { "$match" : { "products.countryId": countryId}},
  { "$count": "productsOnCountryCount" }
])
{ "productsOnCountryCount" : 1 }
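To see why the count comes out as 1 for countryId 4, the three stages can be mimicked in plain JavaScript on the sample data. This is only an in-memory sketch of what $unwind, $match and $count do, not how the server executes them, and the helper names unwindProducts and countProductsOnCountry are made up for illustration.

```javascript
// Same sample data as inserted into the test collection (only the relevant fields).
const items = [
  { _id: 1,
    products: [{ countryId: 1 }, { countryId: 1 }, { countryId: 2 }, { countryId: 4 }],
    sellers: [{ countryId: 2 }, { countryId: 2 }, { countryId: 1 }] },
  { _id: 2,
    products: [{ countryId: 2 }, { countryId: 2 }, { countryId: 3 }],
    sellers: [{ countryId: 3 }, { countryId: 3 }, { countryId: 2 }, { countryId: 4 }] }
];

// Rough equivalent of { $unwind: "$products" }: one output row per array element,
// with the products field replaced by that single element.
function unwindProducts(docs) {
  return docs.flatMap(doc => doc.products.map(p => ({ ...doc, products: p })));
}

// Rough equivalent of $unwind + $match + $count (hypothetical helper).
function countProductsOnCountry(docs, countryId) {
  return unwindProducts(docs)
    .filter(row => row.products.countryId === countryId)
    .length;
}

console.log(countProductsOnCountry(items, 4)); // 1
console.log(countProductsOnCountry(items, 2)); // 3
```

Because every array element becomes its own row, this pipeline counts products rather than documents.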
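That's our first pipeline sorted; now let's look at unavailableProductsCount:
All we need to do is match the documents where the countryId is in the sellers array but not in the products array. This can be done with a simple $match stage, and then we can just do a count on top:
db.test.aggregate([
  { "$match" : {"sellers.countryId": countryId, "products.countryId" : { $ne: countryId } } },
  { "$count": "unavailableProductsCount" }
])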
{ "unavailableProductsCount" : 1 }
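Note that on array fields this $match works at the document level: "sellers.countryId": countryId is satisfied when any seller has that country, while "products.countryId": { $ne: countryId } is satisfied only when no product has it. So this pipeline counts documents, not array elements. A plain JavaScript sketch of the same predicate on the sample data (the function name countUnavailable is hypothetical):

```javascript
// Same sample data as inserted into the test collection (only the relevant fields).
const items = [
  { _id: 1,
    products: [{ countryId: 1 }, { countryId: 1 }, { countryId: 2 }, { countryId: 4 }],
    sellers: [{ countryId: 2 }, { countryId: 2 }, { countryId: 1 }] },
  { _id: 2,
    products: [{ countryId: 2 }, { countryId: 2 }, { countryId: 3 }],
    sellers: [{ countryId: 3 }, { countryId: 3 }, { countryId: 2 }, { countryId: 4 }] }
];

// Counts documents where at least one seller is in the country
// and none of the products are.
function countUnavailable(docs, countryId) {
  return docs.filter(doc =>
    doc.sellers.some(s => s.countryId === countryId) &&
    doc.products.every(p => p.countryId !== countryId)
  ).length;
}

console.log(countUnavailable(items, 4)); // 1 (only the document with _id 2)
console.log(countUnavailable(items, 2)); // 0 (both documents have a product in country 2)
```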
Now that we have both pipelines, we can join them together with the $facet stage and then project the result into a nicer shape:
db.test.aggregate([
  { "$facet": {
    "productsOnCountryCount": [
      { "$unwind" : "$products" },
      { "$match" : { "products.countryId": countryId}},
      { "$count": "productsOnCountryCount" },
    ],
    "unavailableProductsCount": [
      { "$match" : {"sellers.countryId": countryId, "products.countryId" : { $ne: countryId } } },
      { "$count": "unavailableProductsCount" }
    ]
  }},
  { "$project": {
    "productsOnCountryCount": { "$arrayElemAt": ["$productsOnCountryCount.productsOnCountryCount", 0] },
    "unavailableProductsCount": { "$arrayElemAt": ["$unavailableProductsCount.unavailableProductsCount", 0] }
  }}
]);
{ "productsOnCountryCount" : 1, "unavailableProductsCount" : 1 }
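The $project stage is needed because $facet returns each sub-pipeline's output as an array, so every count arrives wrapped in a one-element array. One thing to watch: when a sub-pipeline matches nothing, $count emits no document, the array is empty, and $arrayElemAt leaves the field out of the result. The sketch below models that in plain JavaScript; the flatten helper and its fallback to 0 are assumptions added for illustration and are not part of the pipeline above (in MongoDB a similar fallback could be written by wrapping $arrayElemAt in $ifNull).

```javascript
// Shape of the single document $facet emits for countryId 4 on the sample data.
const facetResult = {
  productsOnCountryCount: [{ productsOnCountryCount: 1 }],
  unavailableProductsCount: [{ unavailableProductsCount: 1 }]
};

// Shape when neither sub-pipeline matches anything: both arrays are empty.
const emptyFacetResult = {
  productsOnCountryCount: [],
  unavailableProductsCount: []
};

// Hypothetical helper: take the first element of each facet array,
// falling back to 0 when the array is empty.
function flatten(facet) {
  const first = (arr, key) => (arr.length > 0 ? arr[0][key] : 0);
  return {
    productsOnCountryCount: first(facet.productsOnCountryCount, "productsOnCountryCount"),
    unavailableProductsCount: first(facet.unavailableProductsCount, "unavailableProductsCount")
  };
}

console.log(flatten(facetResult));
console.log(flatten(emptyFacetResult));
```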
I find the best way to work with $facet is to break it down into smaller pipelines first and then combine them at the end.
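Since the question mentions needing these figures for ten countries, it may be worth tallying every country in one pass instead of running the pipeline once per countryId. The following is a plain JavaScript sketch of that idea on the sample data; tallyByCountry is a hypothetical name, and nothing here has been measured against the large collection. In MongoDB the products half could presumably be expressed with $unwind followed by $group on products.countryId with $sum: 1, but that is an untested assumption.

```javascript
// Same sample data as inserted into the test collection (only the relevant fields).
const items = [
  { _id: 1,
    products: [{ countryId: 1 }, { countryId: 1 }, { countryId: 2 }, { countryId: 4 }],
    sellers: [{ countryId: 2 }, { countryId: 2 }, { countryId: 1 }] },
  { _id: 2,
    products: [{ countryId: 2 }, { countryId: 2 }, { countryId: 3 }],
    sellers: [{ countryId: 3 }, { countryId: 3 }, { countryId: 2 }, { countryId: 4 }] }
];

// Builds a Map from countryId to both counts in a single pass over the documents.
function tallyByCountry(docs) {
  const tally = new Map();
  const entry = id => {
    if (!tally.has(id)) {
      tally.set(id, { productsOnCountryCount: 0, unavailableProductsCount: 0 });
    }
    return tally.get(id);
  };
  for (const doc of docs) {
    const productCountries = new Set();
    for (const p of doc.products) {
      entry(p.countryId).productsOnCountryCount += 1; // counts array elements
      productCountries.add(p.countryId);
    }
    // A document counts once per seller country that has no product in it.
    const sellerCountries = new Set(doc.sellers.map(s => s.countryId));
    for (const id of sellerCountries) {
      if (!productCountries.has(id)) entry(id).unavailableProductsCount += 1;
    }
  }
  return tally;
}

const tally = tallyByCountry(items);
console.log(tally.get(4)); // { productsOnCountryCount: 1, unavailableProductsCount: 1 }
console.log(tally.get(2)); // { productsOnCountryCount: 3, unavailableProductsCount: 0 }
```

For countryId 4 this agrees with the $facet output above.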