Question

I have two collections.

posts:

{
    "_Id": "1",
    "_PostTypeId": "1",
    "_AcceptedAnswerId": "192",
    "_CreationDate": "2012-02-08T20:02:48.790",
    "_Score": "10",
    ...
    "_OwnerUserId": "6",
    ...
},
...

and users:

{
    "_Id": "1",
    "_Reputation": "101",
    "_CreationDate": "2012-02-08T19:45:13.447",
    "_DisplayName": "Geoff Dalgas",
    ...
    "_AccountId": "2"
},
...

I want to find the users who wrote between 5 and 15 posts. This is what my query looks like:

db.posts.aggregate([
    {
        $lookup: {
            from: "users", 
            localField: "_OwnerUserId",
            foreignField: "_AccountId", 
            as: "X"
        }
    },  
    {
        $group: {
            _id: "$X._AccountId", 
            posts: { $sum: 1 }
        }
    },   
    {
        $match : {posts: {$gte: 5, $lte: 15}}
    },  
    {
        $sort: {posts: -1 }
    },
    {
        $project : {posts: 1}
    }
])

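It runs very slowly. For 6k users and 10k posts it takes over 40 seconds to get a response, while in a relational database I get the response in a split second. Where is the problem? I am just getting started with MongoDB, and it is quite possible that I messed up this query.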

Answer 1

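From https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/

foreignField: Specifies the field from the documents in the from collection. $lookup performs an equality match on the foreignField to the localField from the input documents. If a document in the from collection does not contain the foreignField, the $lookup treats the value as null for matching purposes.

This is executed like any other query.

If you do not have an index on the field _AccountId, it will run a full collection scan for each of the 10,000 posts. Most of the time will be spent in those scans.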

db.users.createIndex({ _AccountId: 1 })

speeds up the process, so that it performs 10,000 index hits instead of 10,000 collection scans.
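
To make the difference concrete, here is a rough sketch in plain JavaScript (not MongoDB code) that counts the work done by a join with and without an index. The generated users and posts arrays are hypothetical stand-ins sized like the collections in the question, the function names joinWithScan and joinWithIndex are made up for illustration, and a Map is only an approximation of an index (MongoDB indexes are B-trees, so a real index hit costs a little more than one probe).

```javascript
// Hypothetical in-memory stand-ins for the two collections (sizes from the question).
const users = Array.from({ length: 6000 }, (_, i) => ({ _AccountId: String(i + 1) }));
const posts = Array.from({ length: 10000 }, (_, i) => ({
  _Id: String(i + 1),
  _OwnerUserId: String((i % 6000) + 1),
}));

// Without an index: every post triggers a scan of the whole users array.
function joinWithScan(posts, users) {
  let comparisons = 0;
  let matches = 0;
  for (const post of posts) {
    for (const user of users) {
      comparisons++;
      if (user._AccountId === post._OwnerUserId) matches++;
    }
  }
  return { comparisons, matches };
}

// With an "index": build a Map keyed by _AccountId once, then probe it once per post.
function joinWithIndex(posts, users) {
  const index = new Map();
  for (const user of users) index.set(user._AccountId, user);
  let probes = 0;
  let matches = 0;
  for (const post of posts) {
    probes++;
    if (index.has(post._OwnerUserId)) matches++;
  }
  return { probes, matches };
}

const scan = joinWithScan(posts, users);
const indexed = joinWithIndex(posts, users);
console.log(scan.comparisons); // 60000000
console.log(indexed.probes);   // 10000
```

Both functions find the same matches; only the amount of work per post changes.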

Answer 2

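In addition to bauman.space's suggestion to put an index on the _AccountId field (which is critical), you should also run your $match stage as early as possible in the aggregation pipeline (i.e. as the first stage). Even though it will not use any index (unless you index the posts field), it will filter the result set before the $lookup (join) stage runs.

The reason your query is so slow is that, for every post, it performs a non-indexed lookup (a sequential read) over every user. That is around 60 million reads!

Check out the Pipeline Optimization section of the MongoDB Aggregation docs.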

Answer 3

Use $match first and then $lookup. $match filters down the documents that $lookup has to examine, so it is efficient.

Answer 4

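Since you are going to group by the user's _AccountId anyway, you should first $group by _OwnerUserId, and only run the lookup after filtering for accounts with 5 <= postsCount <= 15. This reduces the number of lookups: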

db.posts.aggregate([{
    $group: {
      _id: "$_OwnerUserId",
      postsCount: {
        $sum: 1
      },
      posts: {
        $push: "$$ROOT"
      } //if you need to keep original posts data
    }
  },
  {
    $match: {
      postsCount: {
        $gte: 5,
        $lte: 15
      }
    }
  },
  {
    $lookup: {
      from: "users",
      localField: "_id",
      foreignField: "_AccountId",
      as: "X"
    }
  },
  {
    $unwind: "$X"
  },
  {
    $sort: {
      postsCount: -1
    }
  },
  {
    $project: {
      postsCount: 1,
      X: 1
    }
  }
])
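
For intuition about why this ordering helps, here is a small plain-JavaScript sketch that mimics the same stages on in-memory arrays. It is not MongoDB code: samplePosts, sampleUsers and the display names are made-up sample data, and the lookups counter is only there to show how many joins are performed.

```javascript
// Hypothetical sample data: account "1" wrote 6 posts, "2" wrote 2, "3" wrote 20.
const samplePosts = [];
for (const [owner, count] of [["1", 6], ["2", 2], ["3", 20]]) {
  for (let i = 0; i < count; i++) samplePosts.push({ _OwnerUserId: owner });
}
const sampleUsers = [
  { _AccountId: "1", _DisplayName: "A" },
  { _AccountId: "2", _DisplayName: "B" },
  { _AccountId: "3", _DisplayName: "C" },
];

// $group: count posts per _OwnerUserId.
const counts = new Map();
for (const post of samplePosts) {
  counts.set(post._OwnerUserId, (counts.get(post._OwnerUserId) || 0) + 1);
}

// $match: keep only owners with 5 to 15 posts.
const matched = [...counts].filter(([, n]) => n >= 5 && n <= 15);

// $lookup + $unwind: one user lookup per surviving group, not one per post.
let lookups = 0;
const result = matched.flatMap(([ownerId, postsCount]) => {
  lookups++;
  return sampleUsers
    .filter((u) => u._AccountId === ownerId)
    .map((X) => ({ _id: ownerId, postsCount, X }));
});

// $sort: descending by postsCount.
result.sort((a, b) => b.postsCount - a.postsCount);

console.log(samplePosts.length, lookups); // 28 1
console.log(result);
```

With this sample data, 28 posts collapse into three groups, only one group survives the filter, and so a single lookup is performed instead of 28.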

Poor lookup aggregation performance

4 Answers: