$group stage: matching on _id after aggregation

Time: 2017-01-04 09:00:46

Tags: mongodb

I have the following scenario in MongoDB:

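Each record has its own _id and a parentId. If parentId == "", the record is a true parent. If parentId has a value, the record is a child that points to its parent record. The following shows one parent and its linked child: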

{"_id": ObjectId('586c9d275d2f62e1634978db'), "parentId": "", "count": 1, <other fields>}
{"_id": ObjectId('586c9d275d2f62e163497811'), "parentId": ObjectId('586c9d275d2f62e1634978db'), "count": 3, <other fields>}

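I want a query that returns all parent records sorted by the count field, where each parent and its children are grouped together. This is easiest to explain with a diagram: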

[Diagram: parent and child records with their count values (image not available)]

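ID6 has the highest count value and is associated with parent ID5. The next highest count is ID2, which is associated with parent ID1. Finally, ID4 is a parent and should also be returned, so the result should be: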

ID5, ID1, ID4
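The ordering described above can be sketched in plain JavaScript over in-memory records. This only illustrates the intended logic and is not a MongoDB query: the records, their count values and the helper name `orderParents` are hypothetical (string IDs stand in for ObjectIds, and the counts are made up so that the output matches the expected order ID5, ID1, ID4).

```javascript
// Hypothetical sample data: string ids stand in for ObjectIds,
// and the count values are invented for illustration only.
const records = [
  { _id: "1", parentId: "", count: 1 },
  { _id: "2", parentId: "1", count: 3 },
  { _id: "4", parentId: "", count: 1 },
  { _id: "5", parentId: "", count: 2 },
  { _id: "6", parentId: "5", count: 5 },
];

// Hypothetical helper: put each record under its parent's _id
// (a record with parentId === "" is its own group), keep the highest
// count per group, then return the parent ids sorted by that count, descending.
function orderParents(docs) {
  const highest = new Map();
  for (const doc of docs) {
    const groupId = doc.parentId !== "" ? doc.parentId : doc._id;
    const prev = highest.get(groupId);
    if (prev === undefined || doc.count > prev) {
      highest.set(groupId, doc.count);
    }
  }
  return [...highest.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

console.log(orderParents(records)); // [ '5', '1', '4' ]
```

The grouping key is the same rule the aggregation below expresses with `$cond`: a child contributes to its parent's group, a parent to its own.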

HoefMeistert helped me come up with the following query (in the question "MongoDB sorting on children"):

db.collection.aggregate(
  [
    {
      $project: {
      group_id : { $cond : { if: { $ne: [ "$parentId", "" ] }, then: "$parentId", else: "$_id" }},
      count :1,
      field1:1,
      field2:1
      }
    },
    {
      $group: {
      _id : "$group_id",
      highest : { $max: "$count" }
      },
      "field1":{"$first":"$field1"},
      "field2":{"$first":"$field2"},
    },
    {
      $sort: {
      highest : -1
      }
    }
  ]
);

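The problem with this query is that it does not return field1 and field2 of the parent, i.e. of ID1 and ID5 in the diagram. Is there a way to project the correct parent fields in the group stage? Alternatively, if the group stage returns something like this: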

{'_id': ObjectId('586c9d275d2f62e1634978db'), 'highest': 2}
{'_id': ObjectId('586c9d0d5d2f62e1634978d5'), 'highest': 1}
{'_id': ObjectId('586c9d365d2f62e1634978e3'), 'highest': 0}

how can I pull back the whole records for all of the IDs above after the group stage, i.e. 586c9d275d2f62e1634978db, 586c9d0d5d2f62e1634978d5 and 586c9d365d2f62e1634978e3?

1 answer:

Answer (score: 1)

Your query is wrong: field1 and field2 need to be inside the $group object:

db.collection.aggregate([
    {
      $project: {
          group_id: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: "$parentId", else: "$_id" }},
          count: 1,
          field1: 1,
          field2: 1
      }
    },
    {
      $group: {
        _id: "$group_id",
        highest: { $max: "$count"},
        field1: { "$first": "$field1"},
        field2: { "$first": "$field2"},
      },
    },
    {
      $sort: {
        highest : -1
      }
    }
]);

Results based on your diagram:

{ "_id" : "5", "highest" : 5, "field1" : ..., "field2" : ... }
{ "_id" : "1", "highest" : 3, "field1" : ..., "field2" : ... }
{ "_id" : "4", "highest" : 1, "field1" : ..., "field2" : ... }

Edit:

db.collection.aggregate([
    {
        $project: {
            group_id: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: "$parentId", else: "$_id" }},
            count: 1,
            field1: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: null, else: "$field1" }},
            field2: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: null, else: "$field2" }},
        }
    },
    {
        $group: {
            _id: "$group_id",
            highest: { $max: "$count"},
            field1: { "$max": "$field1"},
            field2: { "$max":"$field2"},
        },
    },
    {
        $sort: {
            highest : -1
        }
    }
]);

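With this change, only the parent document carries values for field1 and field2 into the group stage; every other document has null. Taking $max then yields the single non-null value, which is the parent's.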
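The null-then-`$max` trick relies on `$max` ignoring null values when at least one document in the group has a non-null value. The plain-JavaScript sketch below mimics the `$project` and `$group` stages in memory to show why the parent's value survives regardless of document order; it is not MongoDB's implementation, and the sample documents, field values and helper names (`projectStage`, `maxIgnoringNull`, `groupStage`) are hypothetical.

```javascript
// Hypothetical sample documents; field1 differs between parent and child
// so it is visible which value survives the group stage.
const docs = [
  { _id: "5", parentId: "", count: 2, field1: "parent-5" },
  { _id: "6", parentId: "5", count: 5, field1: "child-6" },
  { _id: "4", parentId: "", count: 1, field1: "parent-4" },
];

// Mimics the $project stage: children get field1 = null.
function projectStage(doc) {
  const isChild = doc.parentId !== "";
  return {
    group_id: isChild ? doc.parentId : doc._id,
    count: doc.count,
    field1: isChild ? null : doc.field1,
  };
}

// Mimics $max for this case: nulls are skipped, so the only
// non-null field1 in a group (the parent's) is kept.
function maxIgnoringNull(a, b) {
  if (a === null) return b;
  if (b === null) return a;
  return a > b ? a : b;
}

// Mimics $group on group_id followed by $sort on highest (descending).
function groupStage(projected) {
  const groups = new Map();
  for (const p of projected) {
    const g = groups.get(p.group_id);
    if (!g) {
      groups.set(p.group_id, { _id: p.group_id, highest: p.count, field1: p.field1 });
    } else {
      g.highest = Math.max(g.highest, p.count);
      g.field1 = maxIgnoringNull(g.field1, p.field1);
    }
  }
  return [...groups.values()].sort((a, b) => b.highest - a.highest);
}

const result = groupStage(docs.map(projectStage));
// Expected: _id '5' (highest 5, field1 'parent-5'), then _id '4' (highest 1, field1 'parent-4')
console.log(result);
```

Unlike `$first`, this does not depend on the parent being the first document seen in its group.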

The result is the same as above, but field1 and field2 now contain the values from the parent document.
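The question's other option, pulling back the whole parent records after `$group`, can also be done inside the same pipeline with a `$lookup` back into the collection. This is a sketch that does not come from the answer above; it assumes MongoDB 3.2 or later (where `$lookup` is available) and the collection name `collection` used in the question. It is written as a plain JavaScript array so it can be passed to `db.collection.aggregate(pipeline)` in the mongo shell.

```javascript
// Pipeline sketch: assumes MongoDB 3.2+ ($lookup) and a collection named "collection".
const pipeline = [
  {
    // Same grouping key as in the question: a child's parentId, or a parent's own _id.
    $project: {
      group_id: { $cond: { if: { $ne: ["$parentId", ""] }, then: "$parentId", else: "$_id" } },
      count: 1
    }
  },
  // One document per parent, with the highest count among the parent and its children.
  { $group: { _id: "$group_id", highest: { $max: "$count" } } },
  // Join each group back to the full parent document by _id.
  { $lookup: { from: "collection", localField: "_id", foreignField: "_id", as: "parent" } },
  // $lookup produces an array; unwind it to a single embedded parent document.
  { $unwind: "$parent" },
  { $sort: { highest: -1 } }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```

Each result then carries the complete parent document under `parent`. Groups whose parent document does not exist are dropped by `$unwind`, since their `parent` array is empty.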