在保留根字段的同时对子文档进行分组/计数

时间:2019-03-04 22:51:14

标签: mongodb mongodb-query aggregation-framework

在mongodb中,经过几次$ match和$ project之后,我得到了以下2个文档。我试图弄清楚如何将每个事件的每个组中的每个团队的状态列表分组/计算在一起。简而言之,我需要知道每种状态(0、1或2)中有多少支球队。我从以下文件开始。

{0}

我希望最终会是这样的:

{ 
    "_id" : "event1", 
    "groups" : [
        {
            "_id" : "group1", 
            "wlActive" : true, 
            "teams" : [
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(0)}, 
                {"state" : NumberInt(0)} 
            ]
        }, 
        {
            "_id" : "group2", 
            "wlActive" : false, 
            "teams" : [
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(0)}, 
                {"state" : NumberInt(0)} 
            ]
        }
    ]
},
{ 
    "_id" : "event2", 
    "groups" : [
        {
            "_id" : "group3", 
            "wlActive" : true, 
            "teams" : [
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(0)}, 
                {"state" : NumberInt(0)} 
            ]
        }, 
        {
            "_id" : "group4",
            "wlActive" : false, 
            "teams" : [
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(2)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(1)}, 
                {"state" : NumberInt(0)}, 
                {"state" : NumberInt(0)} 
            ]
        }
    ]
}

并不一定要是这样,只要我可以了解每个团队状态并为每个组保留诸如“ wlActive”之类的字段即可。我在这里看到过类似的示例,但我似乎无法解决这个问题。

1 个答案:

答案 0 :(得分:1)

您实际上可以使用$addFields$project

db.collection.aggregate([
  { "$addFields": {
    "groups": {
      "$map": {
        "input": "$groups",
        "in": {
          "$mergeObjects": [
            "$$this",
            { "teams": {
              "$reduce": {
                "input": "$$this.teams",
                "initialValue": [ ],
                "in": {
                  "$cond": {
                    "if": { 
                      "$ne": [ { "$indexOfArray":  ["$$value.state", "$$this.state"] }, -1 ]
                    },
                    "then": {
                      "$concatArrays": [
                        { "$filter": {
                          "input": "$$value",
                          "as": "v",
                          "cond": { "$ne": [ "$$v.state", "$$this.state" ]  }
                        }},
                        [{
                          "state": "$$this.state",
                          "count": { "$sum": [
                            { "$arrayElemAt": [
                              "$$value.count",
                              { "$indexOfArray": ["$$value.state", "$$this.state" ] }
                            ]},
                            1
                          ]}
                        }]
                      ]
                    },
                    "else": {
                      "$concatArrays": [
                        "$$value",
                        [{ "state": "$$this.state", "count": 1 }]
                      ]
                    }
                  }
                }
              }
            }}
          ]
        }
      }
    }
  }}
])

这非常复杂,基本上使用$reduce“ inline”来代替$group管道运算符。

$reduce是工作的主要部分,因为它将每个数组项“减少”迭代到另一个数组,并在键上进行“分组”总计。它是通过$indexOfArray在当前缩减结果中寻找state的值来实现的。如果找不到(返回-1),它会通过$concatArrays附加到当前结果中,并添加新的statecount中的1。这是else情况。

当找到 (在then情况下)时,我们通过$filter从结果数组中删除匹配的元素,并连接一个新的从$indexOfArray的匹配索引中选择元素,并使用$arrayElemAt提取值。这给出了匹配元素的当前count,该元素使用$sum添加以使计数增加1

当然,传统上您当然可以使用$unwind$group语句来做到这一点:

db.collection.aggregate([
  { "$unwind": "$groups" },
  { "$unwind": "$groups.teams" },
  { "$group": {
    "_id": {
      "_id": "$_id",
      "gId": "$groups._id",
      "wlActive": "$groups.wlActive",
      "state": "$groups.teams.state"
    },
    "count": { "$sum": 1 }
  }},
  { "$sort": { "_id": -1, "count": -1 } },
  { "$group": {
    "_id": {
      "_id": "$_id._id",
      "gId": "$_id.gId",
      "wlActive": "$_id.wlActive",
    },
    "teams": { "$push": { "state": "$_id.state", "count": "$count" } }
  }},
  { "$group": {
    "_id": "$_id._id",
    "groups": {
      "$push": {
        "_id": "$_id.gId",
        "wlActive": "$_id.wlActive",
        "teams": "$teams"
      }
    }
  }}
])

这里$unwind用于将数组内容“ emflat”到成单独的文档。您可以执行以下操作,直到teams级别,然后按复合键上的$group识别唯一性state级别。< / p>

由于所有文档详细信息都是初始$group键的一部分,因此您删除了“唯一性” 级别,因此teams使用$push成为数组。为了恢复原始文档格式,在文档的原始_id值上执行了另一个$group$push重建了groups数组。

这种形式可能很容易理解,但是运行起来确实需要更长的时间,并且需要更多的资源。第一种形式是最优,因为您实际上并不需要在现有文档中$group,并且除非绝对必要,否则通常应避免使用$unwind。也就是说,有必要对所有文档中的state进行分组,但不必在单个文档中进行分组。

这两种方法基本上都返回相同的结果:

{
        "_id" : "event1",
        "groups" : [
                {
                        "_id" : "group1",
                        "wlActive" : true,
                        "teams" : [
                                {
                                        "state" : 2,
                                        "count" : 2
                                },
                                {
                                        "state" : 1,
                                        "count" : 3
                                },
                                {
                                        "state" : 0,
                                        "count" : 2
                                }
                        ]
                },
                {
                        "_id" : "group2",
                        "wlActive" : false,
                        "teams" : [
                                {
                                        "state" : 2,
                                        "count" : 2
                                },
                                {
                                        "state" : 1,
                                        "count" : 3
                                },
                                {
                                        "state" : 0,
                                        "count" : 2
                                }
                        ]
                }
        ]
}
{
        "_id" : "event2",
        "groups" : [
                {
                        "_id" : "group3",
                        "wlActive" : true,
                        "teams" : [
                                {
                                        "state" : 2,
                                        "count" : 2
                                },
                                {
                                        "state" : 1,
                                        "count" : 3
                                },
                                {
                                        "state" : 0,
                                        "count" : 2
                                }
                        ]
                },
                {
                        "_id" : "group4",
                        "wlActive" : false,
                        "teams" : [
                                {
                                        "state" : 2,
                                        "count" : 2
                                },
                                {
                                        "state" : 1,
                                        "count" : 3
                                },
                                {
                                        "state" : 0,
                                        "count" : 2
                                }
                        ]
                }
        ]
}

就其价值而言,由于这并不是跨文档真正地“聚合” ,您可以简单地返回所有数据并“聚合”客户端代码中的数组项。

以mongo shell为例:

db.collection.find().map(doc => Object.assign({}, doc, {
  _id: doc._id,
  groups: doc.groups.map(g => Object.assign({}, g, {
    _id: g._id,
    wlActive: g.wlActive,
    teams: ((input) => {
      var obj = input.reduce((o, e) => 
      (o.hasOwnProperty(e.state)) ? 
        Object.assign({} , o, { [e.state]: o[e.state]+1 })
        : Object.assign({}, o, { [e.state]: 1 }),  {});
      return Object.keys(obj)
        .map(k => ({ state: parseInt(k), count: obj[k] }))
        .sort((a,b) => b.state - a.state);
    })(g.teams)
  }))
}))