Using the most frequent value in an array to represent duplicates when grouping in an aggregation

Date: 2017-05-22 16:10:14

Tags: mongodb

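I'm using the aggregation pipeline framework. This is a simplified example. I group the documents by city (exposed as the name field in the output) and push their destination_code values into a city_code array.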

This is the initial collection structure:

{
  "name":"foobar",
  "address":{
    "city":"foo",
    "destination_code":"FOO"
  }
},
{
  "name":"bazfoo",
  "address":{
    "city":"foo",
    "destination_code":"FOO"
  }
},
{
  "name": "barbaz",
  "address":{
    "city":"foo",
    "destination_code":"BAR"
  }
},

I want to group them by city and use the most frequent destination_code as a single string value.

Here is my query:

db.cities.aggregate([
        {
          "$group": {
            "_id": "$address.city",
            "name": {
              "$first": "$address.city"
            },
            "city_code": {
              "$push": "$address.destination_code"
            }
          }
        },
        {
          "$project": {
            "_id":0,
            "name":1,
            "city_code": 1,
          }
        },
      ])
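To make the first stage concrete, here is a rough plain-JavaScript equivalent of the $group/$push stage and the $project stage above, applied to the three sample documents. The function name groupCodesByCity is hypothetical. This is only a sketch of what the server computes, not MongoDB code.

```javascript
// Sample documents from the question.
const docs = [
  { name: "foobar", address: { city: "foo", destination_code: "FOO" } },
  { name: "bazfoo", address: { city: "foo", destination_code: "FOO" } },
  { name: "barbaz", address: { city: "foo", destination_code: "BAR" } },
];

// Hypothetical helper mimicking { $group: { _id: "$address.city", city_code: { $push: ... } } }.
// It produces one entry per city, with every destination_code pushed in input order.
// (In MongoDB, neither the order of $group output documents nor the order inside
// a $push array is guaranteed unless a $sort stage comes first.)
function groupCodesByCity(documents) {
  const groups = new Map(); // city -> array of destination codes
  for (const doc of documents) {
    const city = doc.address.city;
    if (!groups.has(city)) {
      groups.set(city, []);
    }
    groups.get(city).push(doc.address.destination_code);
  }
  // Equivalent of { $project: { _id: 0, name: 1, city_code: 1 } }
  return [...groups].map(([city, codes]) => ({ name: city, city_code: codes }));
}

const grouped = groupCodesByCity(docs);
// grouped is [{ name: "foo", city_code: ["FOO", "FOO", "BAR"] }]
```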

The documents in the result look like this:

{ 
    "name" : "Ein Bokek", 
    "city_code" : [
        "TLV", 
        "JRS", 
        "JRS", 
        "JRS", 
        "JRS", 
        "JRS", 
        "JRS"
    ]
}

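I know I should aggregate this further to get an array of objects with the number of occurrences of each value. It should look like this: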

{ 
    "name" : "Ein Bokek", 
    "city_code" : [
        {"value": "TLV", "count":1}, 
        {"value": "JRS", "count":6},
    ]
}
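The counting step described above can be sketched in plain JavaScript as follows. The sketch assumes the codes are already in an array like the city_code field shown earlier, and the name countValues is hypothetical. In the pipeline itself, this step corresponds to $unwind followed by a $group on the (city, code) pair.

```javascript
// Hypothetical helper: turn ["TLV", "JRS", ...] into [{ value, count }, ...].
// Entries appear in order of first occurrence, like the example above.
function countValues(values) {
  const counts = new Map(); // value -> number of occurrences
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  return [...counts].map(([value, count]) => ({ value, count }));
}

const cityCode = ["TLV", "JRS", "JRS", "JRS", "JRS", "JRS", "JRS"];
const counted = countValues(cityCode);
// counted is [{ value: "TLV", count: 1 }, { value: "JRS", count: 6 }]
```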

Then sort by count (descending) so it looks like this:

{ 
    "name" : "Ein Bokek", 
    "city_code" : [
        {"value":"JRS", "count":6},
        {"value":"TLV", "count":1}
    ]
}
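Sorting the counted entries by count in descending order is a one-liner in plain JavaScript. This sketch assumes the { value, count } shape shown above, and the name sortByCountDesc is hypothetical. In the pipeline, the equivalent is a $sort stage with count: -1. One difference matters for ties. Array.prototype.sort is stable, so tied entries keep their input order. MongoDB's $sort makes no such promise for duplicate values unless you add a second, unique sort key.

```javascript
// Hypothetical helper: sort { value, count } entries by count, highest first.
// It returns a new array instead of sorting the input in place.
function sortByCountDesc(entries) {
  return [...entries].sort((a, b) => b.count - a.count);
}

const entries = [
  { value: "TLV", count: 1 },
  { value: "JRS", count: 6 },
];
const sorted = sortByCountDesc(entries);
// sorted is [{ value: "JRS", count: 6 }, { value: "TLV", count: 1 }]
```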

And finally take the first object and turn it into a plain string:

{ 
    "name" : "Ein Bokek", 
    "city_code" : "JRS"
}
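The whole count / sort / take-first sequence collapses into a single pass when done in plain JavaScript. Instead of sorting, this sketch picks the value with the highest count directly, which gives the same answer. The name mostFrequent is hypothetical. Returning null for an empty array is an assumption of this sketch. When counts are tied, the sketch keeps the value seen first.

```javascript
// Hypothetical helper: return the most frequent string in `values`,
// or null when the array is empty (an assumption of this sketch).
function mostFrequent(values) {
  const counts = new Map(); // value -> number of occurrences
  for (const v of values) {
    counts.set(v, (counts.get(v) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [value, count] of counts) {
    // Strict ">" keeps the first value seen when counts are tied.
    if (count > bestCount) {
      best = value;
      bestCount = count;
    }
  }
  return best;
}

const result = {
  name: "Ein Bokek",
  city_code: mostFrequent(["TLV", "JRS", "JRS", "JRS", "JRS", "JRS", "JRS"]),
};
// result is { name: "Ein Bokek", city_code: "JRS" }
```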

Is there a built-in operator that avoids these extra steps and could be used in the first $group of the pipeline instead of $push?

1 answer:

Answer 0 (score: 0)

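This aggregation follows the steps described in the question. It groups by city, unwinds the destination codes and groups again to count them, then sorts by that count and keeps the most frequent code.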

db.filtered_hotel_data.aggregate([
        // 1. Group by city: collect every destination code and count the documents.
        {
          "$group": {
            "_id": "$address.city",
            "name": {
              "$first": "$address.city"
            },
            "city_code": {
              "$push": "$address.destination_code"
            },
            "hotel_count": {
              "$sum": 1
            }
          }
        },
        {
          "$project": {
            "_id":0,
            "name":1,
            "city_code":1,
            "hotel_count":1
          }
        },
        // 2. One document per array element, then count how often each code occurs per city.
        {
          "$unwind": "$city_code"
        },
        {
          "$group": {
            "_id": {
              "name": "$name",
              "city_code": "$city_code"
            },
            "count": {"$sum": 1},
            "hotel_count": {
              "$first": "$hotel_count"
            }
          }
        },
        // 3. Back to one document per city, holding an array of {city_code, count} objects.
        {
          "$group": {
            "_id": "$_id.name",
            "city_code": {
              "$push": {
                "city_code": "$_id.city_code",
                "count":"$count"
              }
            },
            "hotel_count": {
              "$first": "$hotel_count"
            }
          }
        },
        // 4. Sort those objects by count (highest first) and rebuild the array;
        //    this relies on $push keeping the order produced by $sort.
        {
          "$unwind": "$city_code"
        },
        {
          "$sort": {
            "city_code.count":-1
          }
        },
        {
          "$group": {
            "_id": "$_id",
            "city_code": {
              "$push": "$city_code"
            },
            "hotel_count": {
              "$first": "$hotel_count"
            }
          }
        },
        // 5. Keep only the first (most frequent) entry...
        {
          "$project": {
            "_id":0,
            "name":"$_id",
            "city_code": {
              "$arrayElemAt": ["$city_code", 0]
            },
            "hotel_count":"$hotel_count"
          }
        },
        // ...and flatten it to a plain string.
        {
          "$project": {
            "_id":0,
            "name":"$name",
            "city_code": "$city_code.city_code",
            "hotel_count": "$hotel_count"
          }
        }
      ])
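As far as I know, $group has no accumulator that returns the most frequent value directly, so a counting step is unavoidable. The pipeline can still be shorter than the one above. The sketch below is not part of the original answer. It groups on the (city, code) pair, sorts by count, and lets $first pick the top code for each city. It assumes the cities collection and field names from the question and drops hotel_count. $first only reflects the sort because the documents reach the second $group already sorted. The secondary sort key on "_id.code" is an addition that makes ties deterministic.

```javascript
// Pipeline intended for the mongo shell: db.cities.aggregate(pipeline)
const pipeline = [
  // 1. Count how often each destination_code occurs per city.
  {
    $group: {
      _id: { city: "$address.city", code: "$address.destination_code" },
      count: { $sum: 1 },
    },
  },
  // 2. Most frequent (city, code) pairs first; ties broken alphabetically by code.
  { $sort: { count: -1, "_id.code": 1 } },
  // 3. One document per city; $first takes the code of the top-ranked pair.
  {
    $group: {
      _id: "$_id.city",
      city_code: { $first: "$_id.code" },
    },
  },
  // 4. Rename _id to name to match the desired output shape.
  { $project: { _id: 0, name: "$_id", city_code: 1 } },
];
```

For the three sample documents in the question, this should yield a single document, { "name": "foo", "city_code": "FOO" }, because FOO appears twice and BAR once.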