Question

Suppose I have a collection of documents like this:

  {
      "postId" : "12345",
      "blogId" : "xyz",
      "title"  : "My blog post",
      ...
      "tags"   : ["tag1", "tag2", "tag3"]
  }

I also have an index on "blogId" and one on "tags".

Now I need to write two queries/aggregations in the mongo shell:

query1 finds all the distinct "tags" across all items with the same "blogId".
query2 counts, for each "tag", the items with that same "blogId".

For example, suppose the collection contains two items with "blogId" = "xyz":

  {
      "postId" : "12345",
      "blogId" : "xyz",
      "title"  : "My blog post 1",
      ...
      "tags"   : ["tag1", "tag2", "tag3"]
  }, 
  {
      "postId" : "67890",
      "blogId" : "xyz",
      "title"  : "My blog post 2",
      ...
      "tags"   : ["tag1", "tag3", "tag4"]
  }

In this case, I would expect the queries to work like this:

query1 returns ["tag1", "tag2", "tag3", "tag4"]
query2 returns { "tag1" : 2, "tag2" : 1, "tag3" : 2, "tag4" : 1 }

How would you suggest I write these queries?

Answer 1

You don't have to write two queries for this. A single pipeline with several stages can produce both of the results you want.

The pipeline should start with a $match stage, which filters the documents in the collection on the specified field:

db.getCollection('blogs').aggregate([
    { "$match": { "blogId": "xyz" } }
])

The next stage uses $unwind to flatten the tags array, so that the individual tags can be grouped later:

db.getCollection('blogs').aggregate([
    { "$match": { "blogId": "xyz" } },
    { "$unwind": "$tags" }
])
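
To make the effect of $unwind concrete, here is a minimal plain-JavaScript sketch (no database involved, runnable in Node.js) that mimics what the stage does to the two sample posts. The posts array and the unwindTags helper are illustrative names, not part of MongoDB, and the fields elided in the question are simply left out.

```javascript
// The two sample posts from the question (elided fields omitted).
const posts = [
    { postId: "12345", blogId: "xyz", title: "My blog post 1", tags: ["tag1", "tag2", "tag3"] },
    { postId: "67890", blogId: "xyz", title: "My blog post 2", tags: ["tag1", "tag3", "tag4"] }
];

// Hypothetical helper that imitates { "$unwind": "$tags" }:
// it emits one document per array element, with "tags" replaced by that element.
function unwindTags(docs) {
    return docs.flatMap(doc => doc.tags.map(tag => ({ ...doc, tags: tag })));
}

const unwound = unwindTags(posts);
console.log(unwound.length);            // 6 documents: 3 tags per post, 2 posts
console.log(unwound.map(d => d.tags));  // the tag carried by each unwound document
```

Each post now appears once per tag, which is why the following $group stage can count tags simply by counting documents.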

Once you have the denormalized documents, you can $group them by tag to get the counts:

db.getCollection('blogs').aggregate([
    { "$match": { "blogId": "xyz" } },
    { "$unwind": "$tags" },
    { "$group": {
         "_id": "$tags",
         "count": { "$sum": 1 },
    } }
])
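
As a rough illustration of what this $group stage computes, the following plain-JavaScript sketch counts the tags in memory. The unwoundTags array and the countByTag helper are assumptions made for the example. Note that the Map used here keeps insertion order, whereas $group makes no promise about the order of its output documents.

```javascript
// Tag values as they would appear after $match + $unwind on the two sample posts.
const unwoundTags = ["tag1", "tag2", "tag3", "tag1", "tag3", "tag4"];

// Hypothetical helper that imitates
// { "$group": { "_id": "$tags", "count": { "$sum": 1 } } }.
function countByTag(tags) {
    const counts = new Map();
    for (const tag of tags) {
        // Adding 1 per document is what { "$sum": 1 } does.
        counts.set(tag, (counts.get(tag) || 0) + 1);
    }
    // One { _id, count } document per distinct tag, like the $group output.
    return Array.from(counts, ([tag, count]) => ({ _id: tag, count: count }));
}

const grouped = countByTag(unwoundTags);
console.log(grouped);
```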

The output of that stage can then be piped into a second $group stage, which reshapes it to collect the distinct tags:

db.getCollection('blogs').aggregate([
    { "$match": { "blogId": "xyz" } },
    { "$unwind": "$tags" },
    { "$group": {
         "_id": "$tags",
         "count": { "$sum": 1 },
    } },
    { "$group": {
         "_id": null,
         "query1": { "$push": "$_id" },
         "query2": { "$push": { "k": "$_id", "v": "$count" } }
    } }
])

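With the list of distinct tags and their counts in hand, you can project the fields into the desired format, a hash of tags to counts, using $addFields with $arrayToObject (available from MongoDB 3.4.4):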

db.getCollection('blogs').aggregate([
    { "$match": { "blogId": "xyz" } },
    { "$unwind": "$tags" },
    { "$group": {
         "_id": "$tags",
         "count": { "$sum": 1 },
    } },
    { "$group": {
         "_id": null,
         "query1": { "$push": "$_id" },
         "query2": { "$push": { "k": "$_id", "v": "$count" } }
    } },
    { "$addFields": {
       "query2": { "$arrayToObject": "$query2" } 
    } }
])

The output for the example above is:

{
    "_id" : null,
    "query1" : [ 
        "tag1", 
        "tag3", 
        "tag2", 
        "tag4"
    ],
    "query2" : {
        "tag4" : 1,
        "tag2" : 1,
        "tag3" : 2,
        "tag1" : 2
    }
}
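
The last two stages only reshape data, so they can be imitated in a few lines of plain JavaScript. This is a sketch under the assumption that tagCounts holds the output of the first $group stage; the variable names are illustrative, and Object.fromEntries stands in for $arrayToObject.

```javascript
// Assumed output of the first $group stage for the two sample posts.
const tagCounts = [
    { _id: "tag1", count: 2 },
    { _id: "tag2", count: 1 },
    { _id: "tag3", count: 2 },
    { _id: "tag4", count: 1 }
];

// Imitates "query1": { "$push": "$_id" }: the list of distinct tags.
const query1 = tagCounts.map(doc => doc._id);

// Imitates "query2": { "$push": { "k": "$_id", "v": "$count" } }.
const pairs = tagCounts.map(doc => ({ k: doc._id, v: doc.count }));

// Imitates { "$arrayToObject": "$query2" }: k/v pairs become object fields.
const query2 = Object.fromEntries(pairs.map(pair => [pair.k, pair.v]));

console.log(query1);
console.log(query2);
```

The element order here follows the input array, whereas the order in the real output depends on the order in which $group emits its documents.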

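To aggregate over all documents, remove the first $match stage. Be aware that on a large collection this carries a heavy performance penalty: $unwind produces a copy of each document for every array element, which uses more memory (aggregation pipelines have a memory cap: 10% of total memory in older releases, 100 MB per stage in later ones unless allowDiskUse is set) and takes time both to flatten the arrays and to process the result. So be careful about starting your pipeline with the $unwind stage.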
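
One way to keep the filtered and unfiltered variants in sync is to build the stage list in a small function. The sketch below is an assumption-laden illustration: buildTagPipeline is a hypothetical helper, not a MongoDB API, and it only constructs the array of stages shown in this answer. The shell calls are left as comments because they need a live database.

```javascript
// Hypothetical helper: builds the pipeline from this answer and includes the
// $match stage only when a blogId is supplied.
function buildTagPipeline(blogId) {
    const stages = [];
    if (blogId !== undefined) {
        stages.push({ "$match": { "blogId": blogId } });
    }
    stages.push(
        { "$unwind": "$tags" },
        { "$group": { "_id": "$tags", "count": { "$sum": 1 } } },
        { "$group": {
            "_id": null,
            "query1": { "$push": "$_id" },
            "query2": { "$push": { "k": "$_id", "v": "$count" } }
        } },
        { "$addFields": { "query2": { "$arrayToObject": "$query2" } } }
    );
    return stages;
}

// Possible use in the mongo shell (not executed here):
// db.getCollection('blogs').aggregate(buildTagPipeline("xyz"));
// db.getCollection('blogs').aggregate(buildTagPipeline(), { allowDiskUse: true });
//
// If only the distinct tags are needed, the shell's distinct helper may be lighter:
// db.getCollection('blogs').distinct("tags", { "blogId": "xyz" });

console.log(buildTagPipeline("xyz").length);  // 5 stages, starting with $match
console.log(buildTagPipeline().length);       // 4 stages, starting with $unwind
```

Passing allowDiskUse lets large stages spill to temporary files instead of failing at the memory cap, at the cost of slower execution.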
