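How to get multiple conditional sum results over multiple nested arrays with a single query in MongoDB?

Time: 2019-01-17 08:24:48

Tags: mongodb count

I need two sum results from two separate nested arrays, and I need them for 10 values (product totals for 10 countries). I know I need the aggregation framework, but I cannot work out how.

I tried $facet, but on 4.5 million documents (with nested array data) it took about 30-40 seconds to return a result. (And I would have to run that ten times, once per country.)

I tried the following solutions, without success: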

How to group query with multiple $cond?

Multiple Counts with single query in mongodb

Collection structure:

{
  _id,
  sku: 'p1',
  someField,
  someField2,
  ...
  products: [
    {
      productid: 132,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      productid: 451,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      productid: 218,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
  ],
  sellers: [
    {
      sellerid: 101001,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 104201,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 205401,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
  ]
},
{
  _id,
  sku: 'x2',
  someField,
  someField2,
  ...
  products: [
    {
      productid: 142,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      productid: 71,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      productid: 28,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
  ],
  sellers: [
    {
      sellerid: 1001,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 1421,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 20501,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
  ]
},
{
  _id,
  sku: 'p3',
  someField,
  someField2,
  ...
  products: [
    {
      productid: 543,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      productid: 52,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      productid: 32,
      someproductfield,
      someproductfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
    ...
  ],
  sellers: [
    {
      sellerid: 5201,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 1231,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    },
    {
      sellerid: 12565461,
      somesellerfield,
      somesellerfield2,
      ...
      countryId: double  <- the field used in the sum condition
    }
  ]
}

I need a result like this:

{
  countryId: 5,
  productsOnCountryCount: 10102,
  /* count only the products that have the countryId, something like:
     $sum: { $cond: [{ $eq: ['$products.countryId', 2] }, 1, 0] }
  */
  unavailableProductsCount: 3560
  /* products that sellers have but that are not available to sell or list
     for some reason, something like:
     $sum: { $cond: [{ $and: [{ $eq: ['$sellers.countryId', 2] },
                              { $ne: ['$products.countryId', 2] }] }, 1, 0] }
  */
}

Update on methods and response times

var cid = 2; // assume countryId of USA
// target document total = about 20 million data items (including nested arrays)

Method 1 (@KevinSmith) response time: 48-50 seconds

db.test.aggregate([
  { "$facet": {
    "productsOnCountryCount": [
      { "$unwind" : "$products" },
      { "$match" : { "products.countryId": cid}},
      { "$count": "productsOnCountryCount" },
    ],
    "unavailableProductsCount": [
      { "$match" : {"sellers.countryId": cid, "products.countryId" : { $ne: cid } } },
      { "$count": "unavailableProductsCount" }
    ]
  }},
  { "$project": {
    "productsOnCountryCount": { "$arrayElemAt": ["$productsOnCountryCount.productsOnCountryCount", 0] },
    "unavailableProductsCount": { "$arrayElemAt": ["$unavailableProductsCount.unavailableProductsCount", 0] }
  }}
]);

Method 2 response time: 36-38 seconds

db.test.aggregate([
  { "$facet": {
    "count1": [
      { "$match" : { "products.countryId": cid } },
      { "$count": "Count" }
    ],
    "count2": [
      { "$match" : { "sellers.countryId": cid, "products.countryId": { $ne: cid } } },
      { "$count": "Count" }
    ]
  }}
])

Method 3 response time: 20-21 seconds

Based on these results, I think I will go with Method 3. Thanks to everyone who took an interest.

1 answer:

Answer 0: (score: 2)

So, let's start with a simplified dataset. We will insert a list of items into the test collection:

var items = [
  {
    _id: 1,
    products: [
      { countryId: 1 },
      { countryId: 1 },
      { countryId: 2 },
      { countryId: 4 }
    ],
    sellers: [
      { countryId: 2 },
      { countryId: 2 },
      { countryId: 1 }
    ]
  },
  {
    _id: 2,
    products: [
      { countryId: 2 },
      { countryId: 2 },
      { countryId: 3 }
    ],
    sellers: [
      { countryId: 3 },
      { countryId: 3 },
      { countryId: 2 },
      { countryId: 4 }
    ]
  }
];

db.test.insertMany(items);

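We can then use the $facet aggregation stage to run multiple aggregation pipelines over the same input, so let's start by building the pipeline for productsOnCountryCount.

First, we need to unwind the products array so that each product becomes its own document, then match on the given countryId: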

var countryId = 4;

db.test.aggregate([
  { "$unwind" : "$products" },
  { "$match" : { "products.countryId": countryId } }
]).pretty()
{
        "_id" : 1,
        "products" : {
                "countryId" : 4
        },
        "sellers" : [
                {
                        "countryId" : 2
                },
                {
                        "countryId" : 2
                },
                {
                        "countryId" : 1
                }
        ]
}

Now we only need a $count stage at the end to get the number of matching products:

db.test.aggregate([
  { "$unwind" : "$products" },
  { "$match" : { "products.countryId": countryId}},
  { "$count": "productsOnCountryCount" }])
{ "productsOnCountryCount" : 1 }
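
To make the counting semantics concrete, here is a minimal plain-JavaScript sketch that mimics the pipeline above on the same sample data, without a MongoDB server. The helper names countProducts and countDocsWithProduct are made up for this illustration. It also shows why $unwind matters: a $match on products.countryId without $unwind (as in the count1 facet from the question) counts documents, not products, so the two numbers differ as soon as one document holds several products for the same country.

```javascript
// Same sample data as the items inserted into the test collection above.
const items = [
  { _id: 1,
    products: [{ countryId: 1 }, { countryId: 1 }, { countryId: 2 }, { countryId: 4 }],
    sellers: [{ countryId: 2 }, { countryId: 2 }, { countryId: 1 }] },
  { _id: 2,
    products: [{ countryId: 2 }, { countryId: 2 }, { countryId: 3 }],
    sellers: [{ countryId: 3 }, { countryId: 3 }, { countryId: 2 }, { countryId: 4 }] }
];

// Mimics $unwind + $match + $count: one unit per matching array element.
function countProducts(docs, countryId) {
  return docs
    .flatMap(doc => doc.products)
    .filter(product => product.countryId === countryId)
    .length;
}

// Mimics $match + $count without $unwind: one unit per matching document.
function countDocsWithProduct(docs, countryId) {
  return docs
    .filter(doc => doc.products.some(p => p.countryId === countryId))
    .length;
}

console.log(countProducts(items, 4), countDocsWithProduct(items, 4)); // 1 1
console.log(countProducts(items, 1), countDocsWithProduct(items, 1)); // 2 1
```

For countryId 1 the first document contains two matching products, so the per-product count is 2 while the per-document count is 1.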

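That takes care of the first pipeline. Now let's look at unavailableProductsCount.

All we need to do is match the documents where the countryId appears in the sellers array but not in the products array. A single $match stage does this, and then we count the matching documents: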

db.test.aggregate([
    { "$match" : {"sellers.countryId": countryId, "products.countryId" : { $ne: countryId } } },
    { "$count": "unavailableProductsCount" }])
{ "unavailableProductsCount" : 1 }
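
The way query operators behave on array fields is doing the work here: "sellers.countryId": countryId matches when at least one seller has that country, while "products.countryId": { $ne: countryId } matches only when no product has it. A plain-JavaScript sketch of the same predicate (the function name countUnavailable is invented for illustration) makes this explicit. Like the $match above, it counts documents, not individual sellers or products.

```javascript
// Same sample data as the items inserted into the test collection above.
const items = [
  { _id: 1,
    products: [{ countryId: 1 }, { countryId: 1 }, { countryId: 2 }, { countryId: 4 }],
    sellers: [{ countryId: 2 }, { countryId: 2 }, { countryId: 1 }] },
  { _id: 2,
    products: [{ countryId: 2 }, { countryId: 2 }, { countryId: 3 }],
    sellers: [{ countryId: 3 }, { countryId: 3 }, { countryId: 2 }, { countryId: 4 }] }
];

// Counts documents where some seller is in the country but no product is.
function countUnavailable(docs, countryId) {
  return docs.filter(doc =>
    doc.sellers.some(s => s.countryId === countryId) &&
    doc.products.every(p => p.countryId !== countryId)
  ).length;
}

console.log(countUnavailable(items, 4)); // 1 (only _id 2 qualifies)
console.log(countUnavailable(items, 2)); // 0 (both documents have a product in country 2)
```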

Now that we have both pipelines, we can combine them with the $facet stage and then project the output into a nicer shape:

db.test.aggregate([
  { "$facet": {
    "productsOnCountryCount": [
      { "$unwind" : "$products" },
      { "$match" : { "products.countryId": countryId}},
      { "$count": "productsOnCountryCount" },
    ],
    "unavailableProductsCount": [
      { "$match" : {"sellers.countryId": countryId, "products.countryId" : { $ne: countryId } } },
      { "$count": "unavailableProductsCount" }
    ]
  }},
  { "$project": {
    "productsOnCountryCount": { "$arrayElemAt": ["$productsOnCountryCount.productsOnCountryCount", 0] },
    "unavailableProductsCount": { "$arrayElemAt": ["$unavailableProductsCount.unavailableProductsCount", 0] }
  }}
]);

{ "productsOnCountryCount" : 1, "unavailableProductsCount" : 1 }
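
One caveat worth checking: $count emits no document at all when nothing reaches it, so for a country with no matches the facet array is empty, $arrayElemAt finds nothing at index 0, and the field is left out of the projected result. A possible guard is to wrap each projection in $ifNull with a default of 0. The sketch below only builds the pipeline as a plain object; the function name buildCountsPipeline and the helper firstOrZero are assumptions of this example, and it has not been timed against the original data.

```javascript
// Builds the combined pipeline from above, with zero defaults for empty facets.
function buildCountsPipeline(countryId) {
  // Yields 0 instead of omitting the field when the facet array is empty.
  const firstOrZero = (facet) => ({
    $ifNull: [{ $arrayElemAt: ["$" + facet + "." + facet, 0] }, 0]
  });

  return [
    { $facet: {
      productsOnCountryCount: [
        { $unwind: "$products" },
        { $match: { "products.countryId": countryId } },
        { $count: "productsOnCountryCount" }
      ],
      unavailableProductsCount: [
        { $match: { "sellers.countryId": countryId,
                    "products.countryId": { $ne: countryId } } },
        { $count: "unavailableProductsCount" }
      ]
    } },
    { $project: {
      productsOnCountryCount: firstOrZero("productsOnCountryCount"),
      unavailableProductsCount: firstOrZero("unavailableProductsCount")
    } }
  ];
}

// Usage in the mongo shell: db.test.aggregate(buildCountsPipeline(4))
console.log(JSON.stringify(buildCountsPipeline(4)[1], null, 2));
```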

I find the best way to work with $facet is to break the problem into smaller pipelines first and then combine them at the end.
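
Following the same build-small-then-combine idea, the ten-country case from the question could be covered by generating one pair of sub-pipelines per country and sending a single $facet. This is only a sketch: the function name buildMultiCountryFacet and the products_<id> / unavailable_<id> output keys are invented here, and whether one large $facet is faster than ten separate calls is something to measure, since every sub-pipeline still receives the full set of input documents.

```javascript
// Generates one $facet stage holding two sub-pipelines per country id.
function buildMultiCountryFacet(countryIds) {
  const facets = {};
  for (const id of countryIds) {
    // Per-product count for this country.
    facets["products_" + id] = [
      { $unwind: "$products" },
      { $match: { "products.countryId": id } },
      { $count: "count" }
    ];
    // Documents with a seller in this country but no product in it.
    facets["unavailable_" + id] = [
      { $match: { "sellers.countryId": id, "products.countryId": { $ne: id } } },
      { $count: "count" }
    ];
  }
  return [{ $facet: facets }];
}

const pipeline = buildMultiCountryFacet([1, 2, 3, 4]);
console.log(Object.keys(pipeline[0].$facet));
// Usage in the mongo shell: db.test.aggregate(pipeline)
```

Each output key holds an array that is either empty (no matches) or contains one { count: n } document, so the caller still has to unpack it.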