Question

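I need to get 2 sum results from 2 separate nested arrays, and I need them for 10 values (product totals for 10 countries). I know I need to use aggregation, but I don't know how.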

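I tried $facet, but it took about 30-40 seconds to return a result over 4.5 million documents (with nested array data). (Imagine that I need to loop over this ten times.)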

I tried the following solution, but it failed:

Multiple Counts with single query in mongodb

Collection structure:

{
   _id,
   sku: 'p1',
   someField,
   someField2,
   ...
   products: [
    {
    productid:132,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
     productid:451,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
     {
     productid:218,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
   ],
   sellers: [
    {
      sellerid: 101001,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 104201,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
{
      sellerid: 205401,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
   ]
},
{
   _id,
   sku: 'x2',
   someField,
   someField2,
   ...
   products: [
    {
    productid:142,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
     productid:71,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
     {
     productid:28,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
   ],
   sellers: [
    {
      sellerid: 1001,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 1421,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
{
      sellerid: 20501,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
   ]
},
{
   _id,
   sku: 'p3',
   someField,
   someField2,
   ...
   products: [
    {
    productid:543,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
     productid:52,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
     {
     productid:32,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
    ...
   ],
   sellers: [
    {
      sellerid: 5201,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 1231,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
{
      sellerid: 12565461,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
   ]
}

I need a result like this:

{
 countryId:5,
 productsOnCountryCount: 10102,

/* something like: count only the products that have the countryId =>
$sum: { $cond: [{$eq: ['$products.countryId',2]},1,0] }
 */
 unavailableProductsCount: 3560
/* something like: sellers have it, but it is not available to sell or list for some
reason =>
$sum: {$cond: [{$and:[{$eq: ['$sellers.countryId',2]},{$ne:
['$products.countryId',2]}]},1,0]}
*/
}

Update: approaches and response times

var cid = 2; // assume countryId of USA
// total target documents: about 20 million (including nested arrays)

Approach 1 (@KevinSmith) response time: 48-50 seconds

db.test.aggregate([
  { "$facet": {
    "productsOnCountryCount": [
      { "$unwind" : "$products" },
      { "$match" : { "products.countryId": cid}},
      { "$count": "productsOnCountryCount" },
    ],
    "unavailableProductsCount": [
      { "$match" : {"sellers.countryId": cid, "products.countryId" : { $ne: cid } } },
      { "$count": "unavailableProductsCount" }
    ]
  }},
  { "$project": {
    "productsOnCountryCount": { "$arrayElemAt": ["$productsOnCountryCount.productsOnCountryCount", 0] },
    "unavailableProductsCount": { "$arrayElemAt": ["$unavailableProductsCount.unavailableProductsCount", 0] }
  }}
]);

Approach 2 response time: 36-38 seconds

db.test.aggregate([
        { "$facet": {
          "count1": [
            { "$match" : {'products.countryId': cid }},
            { "$count": "Count" }
          ],
          "count2": [
            { "$match" : {'sellers.countryId': cid,'products.countryId':{$ne: cid} }},
            { "$count": "Count" }
          ]
        }}
      ])

Approach 3 response time: 20-21 seconds

Based on these results, I think I will go with approach 3. Thanks to everyone who took an interest.

Answer 1

So, let's start with a simplified dataset; we'll insert a list of items into the test collection:

var items = [{
  _id : 1,
  products: [
    {
      countryId: 1 
    },
    {
      countryId: 1
    },
    {
      countryId: 2
    },
    {
      countryId: 4
    },
  ],
  sellers: [
    {
      countryId: 2
    },
    {
      countryId: 2
    },
    {
      countryId: 1
    }
  ]  
},
{
  _id : 2,
  products: [
  {
    countryId: 2
  },
  {
    countryId: 2
  },
  {
    countryId: 3
  }
  ],
  sellers: [
  {
    countryId: 3
  },
  {
    countryId: 3
  },
  {
    countryId: 2
  },
  {
    countryId: 4
  }
  ]
}];

db.test.insertMany(items);

We can then use the $facet aggregation stage to run multiple aggregation pipelines, so let's start by building the pipeline for productsOnCountryCount.

First, we need to unwind all the products in the array, and then match on the given countryId:

var countryId = 4;

db.test.aggregate([
  { "$unwind" : "$products" },
  { "$match" : { "products.countryId": countryId } }
]).pretty()
{
        "_id" : 1,
        "products" : {
                "countryId" : 4
        },
        "sellers" : [
                {
                        "countryId" : 2
                },
                {
                        "countryId" : 2
                },
                {
                        "countryId" : 1
                }
        ]
}

Now we only need to add a count at the end to get the number of matching products:

db.test.aggregate([
  { "$unwind" : "$products" },
  { "$match" : { "products.countryId": countryId}},
  { "$count": "productsOnCountryCount" }])
{ "productsOnCountryCount" : 1 }
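
To make the three stages easier to follow, here is a rough plain-JavaScript sketch of what they compute on the sample data. It only illustrates the logic and is not how the server executes the pipeline; the function name and the compact way the sample data is built are assumptions made for this example.

```javascript
// Same sample data as above, built compactly (illustrative only).
const toSubdocs = ids => ids.map(countryId => ({ countryId }));
const items = [
  { _id: 1, products: toSubdocs([1, 1, 2, 4]), sellers: toSubdocs([2, 2, 1]) },
  { _id: 2, products: toSubdocs([2, 2, 3]), sellers: toSubdocs([3, 3, 2, 4]) },
];

// Hypothetical helper mirroring $unwind -> $match -> $count.
function productsOnCountryCount(docs, countryId) {
  return docs
    // $unwind: one output row per element of the products array
    .flatMap(doc => doc.products.map(p => ({ ...doc, products: p })))
    // $match: keep rows whose single product has the requested countryId
    .filter(row => row.products.countryId === countryId)
    // $count: number of remaining rows
    .length;
}

console.log(productsOnCountryCount(items, 4)); // 1
console.log(productsOnCountryCount(items, 2)); // 3
```

One difference worth keeping in mind: when nothing matches, this sketch returns 0, whereas the $count stage emits no document at all.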

That's the first pipeline sorted; now let's look at unavailableProductsCount:

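All we need to do is match the documents where the countryId appears in the sellers array but not in the products array. A simple $match stage does this, and then we just count the results: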

db.test.aggregate([
    { "$match" : {"sellers.countryId": countryId, "products.countryId" : { $ne: countryId } } },
    { "$count": "unavailableProductsCount" }])
{ "unavailableProductsCount" : 1 }
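
The behaviour of this $match on array fields can be sketched in plain JavaScript as follows. An equality condition on "sellers.countryId" matches when at least one array element has that value, while $ne on "products.countryId" matches only when no element has it (or the field is missing). The helper names are assumptions for illustration.

```javascript
// Same sample data as above, built compactly (illustrative only).
const toSubdocs = ids => ids.map(countryId => ({ countryId }));
const items = [
  { _id: 1, products: toSubdocs([1, 1, 2, 4]), sellers: toSubdocs([2, 2, 1]) },
  { _id: 2, products: toSubdocs([2, 2, 3]), sellers: toSubdocs([3, 3, 2, 4]) },
];

// True when at least one element of the array has the given countryId.
const hasCountry = (arr, countryId) =>
  (arr || []).some(e => e.countryId === countryId);

// Hypothetical helper mirroring the $match + $count pipeline.
function unavailableProductsCount(docs, countryId) {
  return docs.filter(doc =>
    hasCountry(doc.sellers, countryId) &&    // "sellers.countryId": countryId
    !hasCountry(doc.products, countryId)     // "products.countryId": { $ne: countryId }
  ).length;
}

console.log(unavailableProductsCount(items, 4)); // 1 (only _id 2 qualifies)
console.log(unavailableProductsCount(items, 2)); // 0
```

Note that this pipeline counts documents, not individual array elements, unlike the $unwind-based pipeline for productsOnCountryCount.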

Now that we have the two pipelines, we can join them together with the $facet stage and then project them into a nicer shape:

db.test.aggregate([
  { "$facet": {
    "productsOnCountryCount": [
      { "$unwind" : "$products" },
      { "$match" : { "products.countryId": countryId}},
      { "$count": "productsOnCountryCount" },
    ],
    "unavailableProductsCount": [
      { "$match" : {"sellers.countryId": countryId, "products.countryId" : { $ne: countryId } } },
      { "$count": "unavailableProductsCount" }
    ]
  }},
  { "$project": {
    "productsOnCountryCount": { "$arrayElemAt": ["$productsOnCountryCount.productsOnCountryCount", 0] },
    "unavailableProductsCount": { "$arrayElemAt": ["$unavailableProductsCount.unavailableProductsCount", 0] }
  }}
]);

{ "productsOnCountryCount" : 1, "unavailableProductsCount" : 1 }
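
If the $unwind turns out to be expensive on a large collection, one possible alternative is to compute both numbers in a single $group, using $filter and $size for the product count and $in for the seller check. This is only a sketch under the assumption that products and sellers are arrays of sub-documents; buildSinglePassPipeline is a hypothetical helper name, and the approach has not been timed against the data set in the question.

```javascript
// Hypothetical helper: builds a pipeline without $facet or $unwind.
function buildSinglePassPipeline(countryId) {
  // Fall back to empty arrays so $size and $in do not fail on missing fields.
  const products = { $ifNull: ["$products", []] };
  const productCountries = { $ifNull: ["$products.countryId", []] };
  const sellerCountries = { $ifNull: ["$sellers.countryId", []] };

  return [
    // Drop documents that cannot contribute to either number.
    { $match: { $or: [
      { "products.countryId": countryId },
      { "sellers.countryId": countryId }
    ] } },
    { $group: {
      _id: null,
      // Number of product elements with this countryId, summed over documents.
      productsOnCountryCount: { $sum: { $size: { $filter: {
        input: products,
        as: "p",
        cond: { $eq: ["$$p.countryId", countryId] }
      } } } },
      // Number of documents with a seller, but no product, in this country.
      unavailableProductsCount: { $sum: { $cond: [
        { $and: [
          { $in: [countryId, sellerCountries] },
          { $not: [{ $in: [countryId, productCountries] }] }
        ] },
        1,
        0
      ] } }
    } }
  ];
}

// In the mongo shell: db.test.aggregate(buildSinglePassPipeline(4))
console.log(JSON.stringify(buildSinglePassPipeline(4), null, 2));
```

As with $count, the $group stage returns no document when the $match finds nothing, so the caller should treat an empty result as zero for both numbers.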

I find that the best way to work with $facet is to break it down into smaller pipelines first and then combine them at the end.
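
The same idea can be pushed a bit further for the case of several countries: keep each small pipeline in its own function and generate the $facet stage from a list of IDs. This is a sketch only; the function names and the output field naming scheme are assumptions, it assumes integer country IDs (a "." in a facet field name would be invalid), and it has not been measured on a large collection.

```javascript
// Small pipelines, one per result, parameterised by countryId.
const productsPipeline = cid => [
  { $unwind: "$products" },
  { $match: { "products.countryId": cid } },
  { $count: "count" }
];

const unavailablePipeline = cid => [
  { $match: { "sellers.countryId": cid, "products.countryId": { $ne: cid } } },
  { $count: "count" }
];

// Hypothetical helper: one $facet stage with two sub-pipelines per country.
function buildFacetStage(countryIds) {
  const facet = {};
  for (const cid of countryIds) {
    facet[`productsOnCountryCount_${cid}`] = productsPipeline(cid);
    facet[`unavailableProductsCount_${cid}`] = unavailablePipeline(cid);
  }
  return { $facet: facet };
}

// In the mongo shell: db.test.aggregate([buildFacetStage([1, 2, 4])])
console.log(Object.keys(buildFacetStage([1, 2, 4]).$facet));
```

Each facet output is an array that holds one { count: n } document, or is empty when nothing matched, so a final $project with $arrayElemAt (as in the pipeline above) or client-side code still has to unwrap it.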

How to get multiple conditional sum results on multiple nested arrays with a single query in MongoDB?
