MongoDB aggregation count is too slow

Date: 2018-04-17 08:14:52

Tags: mongodb aggregation-framework

I have about 60,000 documents in the users collection, and the following query:

db.getCollection('users').aggregate([
    {"$match":{"userType":"employer"}},
    {"$lookup":{"from":"companies","localField":"_id","foreignField":"owner.id","as":"company"}},
    {"$unwind":"$company"},
    {"$lookup":{"from":"companytypes","localField":"company.type.id","foreignField":"_id","as":"companyType"}},
    {"$unwind":"$companyType"},
    { $group: { _id: null, count: { $sum: 1 } } }
])

The count takes about 12 seconds, even though I call the count function before the listing function, and my listing function with limit: 10 responds faster than the count does.

Here is the explain result:

{
    "stages" : [ 
        {
            "$cursor" : {
                "query" : {
                    "userType" : "employer"
                },
                "fields" : {
                    "company" : 1,
                    "_id" : 1
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "jobs.users",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "userType" : {
                            "$eq" : "employer"
                        }
                    },
                    "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                            "userType" : {
                                "$eq" : "employer"
                            }
                        },
                        "direction" : "forward"
                    },
                    "rejectedPlans" : []
                }
            }
        }, 
        {
            "$lookup" : {
                "from" : "companies",
                "as" : "company",
                "localField" : "_id",
                "foreignField" : "owner.id",
                "unwinding" : {
                    "preserveNullAndEmptyArrays" : false
                }
            }
        }, 
        {
            "$match" : {
                "$nor" : [ 
                    {
                        "company" : {
                            "$eq" : []
                        }
                    }
                ]
            }
        }, 
        {
            "$group" : {
                "_id" : {
                    "$const" : null
                },
                "total" : {
                    "$sum" : {
                        "$const" : 1
                    }
                }
            }
        }, 
        {
            "$project" : {
                "_id" : false,
                "total" : true
            }
        }
    ],
    "ok" : 1.0
}

2 Answers:

Answer 0 (score: 3)

$lookup operations are slow because they mimic left-join behavior. From the docs:

    $lookup performs an equality match on the localField to the foreignField from the documents of the from collection.

So if the field used for joining the collections has no index, MongoDB is forced to do a collection scan.

Adding an index on the foreignField property should prevent the collection scan and improve performance, even by orders of magnitude.
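For the pipeline in the question, that would mean indexing the join key on companies and the $match field on users. A minimal sketch in mongo shell, assuming the collection and field names shown above (the companytypes join uses _id, which is indexed by default):

```javascript
// Supports the initial $match and avoids the COLLSCAN seen in the explain output
db.users.createIndex({ userType: 1 })

// Supports the first $lookup's foreignField ("owner.id" in companies)
db.companies.createIndex({ "owner.id": 1 })

// The second $lookup joins on companytypes._id, which already has the
// default _id index, so no extra index is needed there.
```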

Answer 1 (score: -1)

@paizo's answer is good, but what about when my foreignField is already an _id (which has an index) and it still takes a long time?

Here is my query:

db.customers.aggregate([
{
  "$match": {}
},
{
  "$lookup": {
    "from": "core.entities",
    "localField": "entityId",
    "foreignField": "_id",
    "as": "entity"
  }
},
{
  "$unwind": "$entity"
},
{
  "$project": {
    "entity._id": 0
  }
},
{
  "$replaceRoot": {
    "newRoot": {
      "$mergeObjects": [
        "$entity",
        "$$ROOT"
      ]
    }
  }
},
{
  "$project": {
    "entity": 0
  }
},
{
  $facet: {
    paginatedResults: [
      {
        $skip: 0
      },
      {
        $limit: 10
      }
    ],
    totalCount: [
      {
        $count: 'count'
      }
    ]
  }
}])

Here are my customers collection indexes:

[{
    "v" : 2,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "applekkus-gmp.core.customers"
},
{
    "v" : 2,
    "key" : {
        "name" : 1
    },
    "name" : "name_1",
    "ns" : "applekkus-gmp.core.customers"
}]

...and here are my entities collection indexes:

[{
    "v" : 2,
    "key" : {
        "_id" : 1
    },
    "name" : "_id_",
    "ns" : "applekkus-gmp.core.entities"
}]

...and here is my aggregation explain():

"stages": [
  {
    "$cursor": {
      "query": {
      },
      "queryPlanner": {
        "plannerVersion": 1,
        "namespace": "applekkus-gmp.core.customers",
        "indexFilterSet": false,
        "parsedQuery": {
        },
        "winningPlan": {
          "stage": "COLLSCAN",
          "direction": "forward"
        },
        "rejectedPlans": []
      }
    }
  },
  {
    "$lookup": {
      "from": "core.entities",
      "as": "entity",
      "localField": "entityId",
      "foreignField": "_id",
      "unwinding": {
        "preserveNullAndEmptyArrays": false
      }
    }
  },
  {
    "$project": {
      "entity": {
        "_id": false
      }
    }
  },
  {
    "$replaceRoot": {
      "newRoot": {
        "$mergeObjects": [
          "$entity",
          "$$ROOT"
        ]
      }
    }
  },
  {
    "$project": {
      "entity": false
    }
  },
  {
    "$facet": {
      "paginatedResults": [
        {
          "$limit": NumberLong(10)
        }
      ],
      "totalCount": [
        {
          "$group": {
            "_id": {
              "$const": null
            },
            "count": {
              "$sum": {
                "$const": 1
              }
            }
          }
        },
        {
          "$project": {
            "_id": false,
            "count": true
          }
        }
      ]
    }
  }
],
"ok": 1}

My case is very similar to @jones's: I have a collection of 40,000 documents, and this aggregation takes 8 seconds just to show the total count (40,000) while displaying only 10 documents (the limit).

P.S. If I run customers.find().count(), it returns the count of 40,000 in less than 1 second.
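That observation suggests a workaround: drop the totalCount branch of the $facet and issue two separate operations instead. A plain count on the base collection needs no joins, while the paginated pipeline stays fast because aggregation stages stream, so with $limit 10 at the end only about ten documents ever flow through $lookup. A sketch in mongo shell, using the collection and field names from the query above:

```javascript
// Total: a plain count on the base collection, no $lookup involved.
// (As noted above, this returns in under a second on 40,000 docs.)
var total = db.customers.find({}).count();

// Page: the same join pipeline, but without $facet. Since no stage here
// is blocking, $limit caps how many documents pass through $lookup.
var page = db.customers.aggregate([
  { $lookup: { from: "core.entities", localField: "entityId",
               foreignField: "_id", as: "entity" } },
  { $unwind: "$entity" },
  { $project: { "entity._id": 0 } },
  { $replaceRoot: { newRoot: { $mergeObjects: ["$entity", "$$ROOT"] } } },
  { $project: { entity: 0 } },
  { $skip: 0 },
  { $limit: 10 }
]).toArray();
```

This trades one round trip for two, but each one is cheap; the 8-second cost came from running the $lookup over all 40,000 documents just to count them.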