Question

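I'm new to MongoDB and am experimenting with it for one of our applications. We are trying to implement CQRS: the query side uses Node.js and the command side is implemented in C#.

One of my collections may contain millions of documents. We will have a scenarioId field, and each scenario can have about 2 million records.

Our use case is to compare the data of two scenarios and do some mathematical operations on each field of the scenarios. For example, each scenario can have a property avgMiles; I want to compute the difference of this property between the two scenarios, and users should be able to filter on that difference. Since my design keeps both scenarios' data in a single collection, I am trying to group by scenario ID and then project further.

A sample document looks like this.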

{ 
    "_id" : ObjectId("5ac05dc58ff6cd3054d5654c"), 
    "origin" : {
        "code" : "0000", 
    }, 
    "destination" : {
        "code" : "0001", 
    }, 
    "currentOutput" : {
        "avgMiles" : 0.15093020854848138, 
    },
    "scenarioId" : NumberInt(0), 
    "serviceType" : "ECON"
}

When I group, I group by origin.code, destination.code and the serviceType property.

My aggregation pipeline query looks like this:

  db.servicestats.aggregate([{$match:{$or:[{scenarioId:0}, {scenarioId:1}]}},
    {$sort:{'origin.code':1,'destination.code':1,serviceType:1}},
    {$group:{
      _id:{originCode:'$origin.code',destinationCode:'$destination.code',serviceType:'$serviceType'},
          baseScenarioId:{$sum:{$switch: {
                branches: [
                  {
                    case: { $eq: [ '$scenarioId', 1] },
                    then: '$scenarioId'
                  }],
                default: 0
                  }
        }},
        compareScenarioId:{$sum:{$switch: {
                branches: [
                  {
                    case: { $eq: [ '$scenarioId', 0] },
                    then: '$scenarioId'
                  }],
                default: 0
                  }
        }},
            baseavgMiles:{$max:{$switch: {
                branches: [
                  {
                    case: { $eq: [ '$scenarioId', 1] },
                    then: '$currentOutput.avgMiles'
                  }],
                default: null
                  }
        }},
        compareavgMiles:{$sum:{$switch: {
                branches: [
                  {
                    case: { $eq: [ '$scenarioId', 0] },
                    then: '$currentOutput.avgMiles'
                  }],
                default: null
                  }
        }}
    }
    },
    {$project:{scenarioId:
      { base:'$baseScenarioId',
        compare:'$compareScenarioId'
      },
    avgMiles:{base:'$baseavgMiles', compare:'$compareavgMiles',diff:{$subtract :['$baseavgMiles','$compareavgMiles']}}
      } 
    },
    {$match:{'avgMiles.diff':{$eq:0.5}}},
    {$limit:100}
    ],{allowDiskUse: true} )

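My $group pipeline stage will receive 4 million documents. Can you suggest how to improve the performance of this query?

I have an index on the fields used in my group-by condition, and I added a $sort pipeline stage to help the grouping perform better.

Any suggestions are welcome.

Since group by did not work in my case, I implemented a left outer join using $lookup; the query looks like this.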

    db.servicestats.aggregate([
{$match:{$and :[ {'scenarioId':0}
  //,{'origin.code':'0000'},{'destination.code':'0001'}
  ]}},
//{$limit:1000000},
{$lookup: { from:'servicestats',
  let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
  pipeline:[
  {$match: {
                  $expr: { $and:
                       [
                         { $eq: [ "$scenarioId", 1 ] },
                         { $eq: [ "$origin.code",  "$$ocode" ] },
                         { $eq: [ "$destination.code",  "$$dcode" ] },
                         { $eq: [ "$serviceType",  "$$stype" ] },
                       ]
                    }

              }
  },
  {$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
  { $replaceRoot: { newRoot: "$comp" } }
  ],
  as : "compoutputs"
}},
{
          $replaceRoot: {
             newRoot: {
                $mergeObjects:[
                   {
                      $arrayElemAt: [
                         "$$ROOT.compoutputs",
                         0
                      ]
                   },
                   {
                      origin: "$$ROOT.origin",
                      destination: "$$ROOT.destination",
                      serviceType: "$$ROOT.serviceType",
                      baseavgmiles: "$$ROOT.currentOutput.avgMiles",
                      output: '$$ROOT'
                   }
                ]
             }
          }
       },
       {$limit:100}
])

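The above query performs well and returns in 70 ms.

But in my scenario I need a full outer join, which I understand MongoDB does not currently support, so I implemented it with a $facet pipeline as shown below.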

    db.servicestats.aggregate([
{$limit:1000},
{$facet: {output1:[
  {$match:{$and :[ {'scenarioId':0}
  ]}},
{$lookup: { from:'servicestats',
  let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
  pipeline:[
  {$match: {
                  $expr: { $and:
                       [
                         { $eq: [ "$scenarioId", 1 ] },
                         { $eq: [ "$origin.code",  "$$ocode" ] },
                         { $eq: [ "$destination.code",  "$$dcode" ] },
                         { $eq: [ "$serviceType",  "$$stype" ] },
                       ]
                    }

            }
  },
  {$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
  { $replaceRoot: { newRoot: "$comp" } }
  ],
  as : "compoutputs"
}},
//{
//          $replaceRoot: {
//             newRoot: {
//                $mergeObjects:[
//                   {
//                      $arrayElemAt: [
//                         "$$ROOT.compoutputs",
//                         0
//                      ]
//                   },
//                   {
//                      origin: "$$ROOT.origin",
//                      destination: "$$ROOT.destination",
//                      serviceType: "$$ROOT.serviceType",
//                      baseavgmiles: "$$ROOT.currentOutput.avgMiles",
//                      output: '$$ROOT'
//                   }
//                ]
//             }
//          }
//       }
  ],
  output2:[
    {$match:{$and :[ {'scenarioId':1}
  ]}},
{$lookup: { from:'servicestats',
  let: {ocode:'$origin.code',dcode:'$destination.code',stype:'$serviceType'},
  pipeline:[
  {$match: {
                  $expr: { $and:
                       [
                         { $eq: [ "$scenarioId", 0 ] },
                         { $eq: [ "$origin.code",  "$$ocode" ] },
                         { $eq: [ "$destination.code",  "$$dcode" ] },
                         { $eq: [ "$serviceType",  "$$stype" ] },
                       ]
                    }

            }
  },
  {$project: {_id:0, comp :{compavgmiles :'$currentOutput.avgMiles'}}},
  { $replaceRoot: { newRoot: "$comp" } }
  ],
  as : "compoutputs"
}},
//{
//          $replaceRoot: {
//             newRoot: {
//                $mergeObjects:[
//                   {
//                      $arrayElemAt: [
//                         "$$ROOT.compoutputs",
//                         0
//                      ]
//                   },
//                   {
//                      origin: "$$ROOT.origin",
//                      destination: "$$ROOT.destination",
//                      serviceType: "$$ROOT.serviceType",
//                      baseavgmiles: "$$ROOT.currentOutput.avgMiles",
//                      output: '$$ROOT'
//                   }
//                ]
//             }
//          }
//       },
       {$match :{'compoutputs':{$eq:[]}}}

  ]
  }
}




       ///{$limit:100}
])

But the $facet version performs very poorly. Any ideas for improving this further are welcome.

Answer 1

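In general, three things can cause a slow query:

The query is not indexed, cannot use an index effectively, or the schema design is not optimal (e.g. heavily nested arrays or subdocuments), which means MongoDB must do extra work to get at the relevant data.
The query is waiting on something slow (e.g. fetching data from disk or writing data to disk).
The hardware is underprovisioned.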

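As for your query, here are some general suggestions regarding its performance:

Using allowDiskUse in an aggregation pipeline means the query may spill to disk in some of its stages. Disk is usually the slowest part of a machine, so if you can avoid this, the query will be faster.
Note that each aggregation pipeline stage is limited to 100MB of RAM. This limit is independent of how much memory the machine has.
The $group stage cannot use an index, because indexes are tied to the location of documents on disk. Once the aggregation pipeline reaches a stage where the physical location of documents is irrelevant (such as $group), an index can no longer be used.
By default, the WiredTiger cache is roughly 50% of RAM, so a 64GB machine will have a WiredTiger cache of ~32GB. If you find the query is slow, MongoDB may be going to disk to fetch the relevant documents. Monitoring iostat during the query and checking the disk utilization % will give a hint as to whether enough RAM is provisioned.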
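
To put rough numbers on the two memory points above, here is a small back-of-the-envelope sketch in plain JavaScript. The helper names are made up for illustration and are not MongoDB APIs. The cache formula follows the documented default (the larger of 50% of (RAM - 1 GB) or 256 MB), and the figure of about 2 million distinct groups is an assumption derived from the question's "about 2 million records per scenario".

```javascript
// Back-of-the-envelope numbers only; both helpers are illustrative, not MongoDB APIs.
const GiB = 1024 ** 3;
const MiB = 1024 ** 2;

// Default WiredTiger internal cache: the larger of 50% of (RAM - 1 GiB) or 256 MiB.
function defaultWiredTigerCacheBytes(ramBytes) {
  return Math.max(0.5 * (ramBytes - GiB), 256 * MiB);
}

// Average number of bytes each group may occupy before a stage exceeds its RAM limit.
function bytesPerGroupBudget(stageLimitBytes, groupCount) {
  return stageLimitBytes / groupCount;
}

// 64 GiB machine: cache size in GiB.
console.log(defaultWiredTigerCacheBytes(64 * GiB) / GiB);

// 100 MiB stage limit spread over an assumed 2,000,000 groups: bytes per group.
console.log(bytesPerGroupBudget(100 * MiB, 2000000));
```

With only a few dozen bytes available per group under these assumptions, a $group keyed on three string fields is very likely to exceed the stage limit, which is consistent with the query needing allowDiskUse.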

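Some possible solutions are:

Provision more RAM so that MongoDB does not have to go to disk as often.
Rework the schema design to avoid heavily nested fields or multiple arrays in a document.
Tailor the document schema so that the data in it is easy to query, rather than tailoring it to how you think the data should be stored (e.g. avoid the heavy normalization inherent in relational database design).
If you find you are hitting the performance limit of a single machine, consider sharding to scale the query horizontally. Note, however, that sharding is a solution that requires careful design and consideration.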
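
As one possible reading of the schema suggestion, the sketch below stores both scenarios in a single document per route and service type, so the difference can be computed without a $group stage. Everything under `scenarios`, the `routestats` collection name and the `buildDiffPipeline` helper are hypothetical. It also assumes only two scenarios are ever compared, which may not hold for the real application.

```javascript
// Hypothetical document shape: one document per route and service type,
// with both scenarios embedded (the "scenarios" field names are assumptions).
const sampleRoute = {
  origin: { code: "0000" },
  destination: { code: "0001" },
  serviceType: "ECON",
  scenarios: {
    base: { scenarioId: 1, avgMiles: 0.65 },
    compare: { scenarioId: 0, avgMiles: 0.15 }
  }
};

// Builds a pipeline that computes and filters the difference without a $group stage.
function buildDiffPipeline(targetDiff, limit) {
  return [
    {
      $addFields: {
        diff: {
          $subtract: ["$scenarios.base.avgMiles", "$scenarios.compare.avgMiles"]
        }
      }
    },
    { $match: { diff: targetDiff } },
    { $limit: limit }
  ];
}

// In the mongo shell this could be run as, for example:
// db.routestats.aggregate(buildDiffPipeline(0.5, 100))
```

The trade-off is that the command side has to write both scenarios into the same document, so this only helps if that fits the write model.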

Answer 2

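You say above that you want to group by scenarioId, but you don't actually do that. It is probably what you should do, though, to avoid all the $switch expressions. Something like this might get you going: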

db.servicestats.aggregate([{
    $match: {
        scenarioId: { $in: [ 0, 1 ] }
    }
}, {
    $sort: { // not sure if that stage even helps - try to run with and without
        'origin.code': 1,
        'destination.code': 1,
        serviceType: 1
    }
}, {
    $group: { // first group by scenarioId AND the other fields
        _id: {
            scenarioId: '$scenarioId',
            originCode: '$origin.code',
            destinationCode: '$destination.code',
            serviceType: '$serviceType'
        },
        avgMiles: { $max: '$currentOutput.avgMiles' } // no switches needed
    },
}, {
    $group: { // group by the other fields only so without scenarioId
        _id: {
            originCode: '$_id.originCode',
            destinationCode: '$_id.destinationCode',
            serviceType: '$_id.serviceType'
        },
        baseScenarioAvgMiles: {
            $max: {
                $cond: {
                    if: { $eq: [ '$_id.scenarioId', 1 ] },
                    then: '$avgMiles',
                    else: 0
                }
            }
        },
        compareScenarioAvgMiles: {
            $max: {
                $cond: {
                    if: { $eq: [ '$_id.scenarioId', 0 ] },
                    then: '$avgMiles',
                    else: 0
                }
            }
        }
    },
}, {
    $addFields: { // compute the difference
        diff: {
            $subtract :[ '$baseScenarioAvgMiles', '$compareScenarioAvgMiles']
        }
    }
}, {
    $match: {
        diff: { $eq: 0.5 } // $addFields above creates diff at the top level
    }
}, {
    $limit:100
}], { allowDiskUse: true })
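
Grouping by the route key in this way also gives full-outer-join behaviour, because a route that exists in only one scenario still produces a group. The plain JavaScript sketch below illustrates that logic on an in-memory array. It is not MongoDB code, `compareScenarios` is a made-up helper, and it assumes at most one document per route per scenario. Unlike the pipeline above, which uses `else: 0`, it marks a missing side as null.

```javascript
// Plain JavaScript illustration of the grouping logic; this is not MongoDB code.
function compareScenarios(docs, baseId, compareId) {
  const groups = new Map();
  for (const doc of docs) {
    const key = [doc.origin.code, doc.destination.code, doc.serviceType].join("|");
    if (!groups.has(key)) {
      groups.set(key, { key, base: null, compare: null });
    }
    const group = groups.get(key);
    if (doc.scenarioId === baseId) group.base = doc.currentOutput.avgMiles;
    if (doc.scenarioId === compareId) group.compare = doc.currentOutput.avgMiles;
  }
  // A route present in only one scenario keeps null on the other side,
  // which is what a full outer join would produce.
  return [...groups.values()].map(g => ({
    ...g,
    diff: g.base === null || g.compare === null ? null : g.base - g.compare
  }));
}

const rows = compareScenarios([
  { origin: { code: "0000" }, destination: { code: "0001" }, serviceType: "ECON",
    scenarioId: 1, currentOutput: { avgMiles: 0.75 } },
  { origin: { code: "0000" }, destination: { code: "0001" }, serviceType: "ECON",
    scenarioId: 0, currentOutput: { avgMiles: 0.25 } },
  { origin: { code: "0000" }, destination: { code: "0002" }, serviceType: "ECON",
    scenarioId: 0, currentOutput: { avgMiles: 0.4 } }
], 1, 0);
console.log(rows);
```

The first route appears in both scenarios and gets a numeric diff, while the second exists only in scenario 0 and still shows up, with null for the base side.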

Beyond that, I suggest you use the power of db.collection.explain().aggregate(...) to find the right index and tune the query.
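
A minimal sketch of what that could look like is shown below. The compound index is only a candidate to try (it is an assumption, not taken from the question), and `explainPipeline` is a made-up wrapper that takes the shell's `db` handle as a parameter so the snippet stays self-contained.

```javascript
// Candidate compound index (an assumption): the $match field first,
// followed by the fields used for $sort and $group.
const candidateIndex = {
  scenarioId: 1,
  "origin.code": 1,
  "destination.code": 1,
  serviceType: 1
};

// The leading stages of the pipeline, which are the ones that can use an index.
const pipeline = [
  { $match: { scenarioId: { $in: [0, 1] } } },
  { $sort: { "origin.code": 1, "destination.code": 1, serviceType: 1 } }
];

// `db` is the mongo shell database handle, passed in explicitly.
function explainPipeline(db) {
  return db.servicestats
    .explain("executionStats")
    .aggregate(pipeline, { allowDiskUse: true });
}

// In the mongo shell:
// db.servicestats.createIndex(candidateIndex)
// explainPipeline(db)
```

In the explain output, an IXSCAN stage in the winning plan indicates the index is being used, while COLLSCAN means the whole collection is scanned. Running it with and without the $sort stage shows whether that stage helps at all.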

MongoDB aggregation group performance
