Show items grouped by a field

Date: 2014-11-01 12:28:42

Tags: javascript mongodb mapreduce mongodb-query aggregation-framework

I have this sample collection of items:

{
  "_id": "1",
  "field1": "value1",
  "field2": "value2",
  "category": "phones",
  "user": "1",
  "tags": [
    "tag1",
    "tag3"
  ]
},
{
  "_id": "2",
  "field1": "value1",
  "field2": "value2",
  "category": "phones",
  "user": "1",
  "tags": [
    "tag2",
    "tag3"
  ]
},
{
  "_id": "3",
  "field1": "value1",
  "field2": "value2",
  "category": "bikes",
  "user": "1",
  "tags": [
    "tag3",
    "tag4"
  ]
},
{
  "_id": "4",
  "field1": "value1",
  "field2": "value2",
  "category": "cars",
  "user": "2",
  "tags": [
    "tag1",
    "tag2"
  ]
}

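I want to search for the items created by a specific user (e.g. user: 1) and show them grouped by the category field. The desired result: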

{
  "phones": [
      {
        "_id": "1",
        "field1": "value1",
        "field2": "value2",
        "tags": [
          "tag1",
          "tag3"
         ]
      },
      {
        "_id": "2",
        "field1": "value1",
        "field2": "value2",
        "tags": [
          "tag2",
          "tag3"
         ]
      }
  ],
  "bikes" : [
      {
        "_id": "3",
        "field1": "value1",
        "field2": "value2",
        "tags": [
          "tag3",
          "tag4"
         ]
      }
  ]

}

Is it possible to get this structure with the aggregation framework's $group? Thank you.

1 answer:

Answer 0: (score: 1)

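It is possible to group by category, but not in the shape you have presented. That is actually a good thing, because your "category" is really data, and you should not represent data as a "key", either in your storage or in your output.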

So what I would really recommend is a transformation like this:

db.collection.aggregate([
    { "$match": { "user": "1" } },
    { "$group": {
        "_id": "$category",
        "items": { 
            "$push": {
                "_id": "$_id",
                "field1": "$field1",
                "field2": "$field2",
                "tags": "$tags"
            }
        }
    }},
    { "$group": {
        "_id": null,
        "categories": { 
            "$push": {
                "_id": "$_id",
                "items": "$items"
            }
        }
    }}
])

Which gives you output like this:

{
    "_id" : null,
    "categories" : [
        {
            "_id" : "bikes",
            "items" : [
                {
                    "_id" : "3",
                    "field1" : "value1",
                    "field2" : "value2",
                    "tags" : [
                        "tag3",
                        "tag4"
                    ]
                }
            ]
        },
        {
            "_id" : "phones",
            "items" : [
                {
                    "_id" : "1",
                    "field1" : "value1",
                    "field2" : "value2",
                    "tags" : [
                        "tag1",
                        "tag3"
                    ]
                },
                {
                    "_id" : "2",
                    "field1" : "value1",
                    "field2" : "value2",
                    "tags" : [
                        "tag2",
                        "tag3"
                    ]
                }
            ]
        }
    ]
}

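You really are better off with generic key names that do not change as the data changes. This is how object-oriented patterns generally work.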

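If you really think you need "data as keys" here, then with the aggregation framework you either have to know the "categories" you expect in advance or be prepared to generate the pipeline stages: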

db.collection.aggregate([
    { "$match": { "user": "1" } },
    { "$group": {
        "_id": null,
        "phones": {
            "$push": {
                "$cond": [
                    { "$eq": ["$category","phones"] },
                    {
                        "_id": "$_id",
                        "field1": "$field1",
                        "field2": "$field2",
                        "tags": "$tags"
                    },
                    false
                ]
            }
        },
        "bikes": {
            "$push": {
                "$cond": [
                    { "$eq": ["$category","bikes"] },
                    {
                        "_id": "$_id",
                        "field1": "$field1",
                        "field2": "$field2",
                        "tags": "$tags"
                    },
                    false
                ]
            }
        }           
    }},
    { "$unwind": "$phones" },
    { "$match": { "phones": { "$ne": false } }},
    { "$group": {
        "_id": "$_id",
        "phones": { "$push": "$phones" },
        "bikes": { "$first": "$bikes" }
    }},
    { "$unwind": "$bikes" },
    { "$match": { "bikes": { "$ne": false } }},
    { "$group": {
        "_id": "$_id",
        "phones": { "$first": "$phones" },
        "bikes": { "$push": "$bikes" }
    }},
    { "$project": {
        "_id": 0,
        "phones": 1,
        "bikes": 1
    }}
])

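With MongoDB 2.6 you can shorten this a little, since you can simply filter out the false values with the $setDifference operator. Note that $setDifference treats its arguments as sets, so it also drops duplicate elements and does not guarantee element order; that is harmless here because every item carries a unique _id: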

db.collection.aggregate([
    { "$match": { "user": "1" } },
    { "$group": {
        "_id": null,
        "phones": {
            "$push": {
                "$cond": [
                    { "$eq": ["$category","phones"] },
                    {
                        "_id": "$_id",
                        "field1": "$field1",
                        "field2": "$field2",
                        "tags": "$tags"
                    },
                    false
                ]
            }
        },
        "bikes": {
            "$push": {
                "$cond": [
                    { "$eq": ["$category","bikes"] },
                    {
                        "_id": "$_id",
                        "field1": "$field1",
                        "field2": "$field2",
                        "tags": "$tags"
                    },
                    false
                ]
            }
        }           
    }},
    { "$project": {
        "_id": 0,
        "phones": { "$setDifference": ["$phones",[false]] },
        "bikes": { "$setDifference": ["$bikes",[false]] }
    }}
])

Both produce the output in the shape you want:

{
    "phones" : [
        {
            "_id" : "1",
            "field1" : "value1",
            "field2" : "value2",
            "tags" : [
                "tag1",
                "tag3"
            ]
        },
        {
            "_id" : "2",
            "field1" : "value1",
            "field2" : "value2",
            "tags" : [
                "tag2",
                "tag3"
            ]
        }
    ],
    "bikes" : [
        {
            "_id" : "3",
            "field1" : "value1",
            "field2" : "value2",
            "tags" : [
                "tag3",
                "tag4"
            ]
        }
    ]
}

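The general case here is that the aggregation framework (as of MongoDB 2.6) simply won't allow field data to be used as a key, so you either group on the data only or specify the key names yourself.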
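As a rough illustration of "generating the pipeline stages", the sketch below builds the MongoDB 2.6-style pipeline from a list of category names. The function name buildCategoryPipeline and its parameters are hypothetical, and the item fields are assumed to be the same ones used in the examples above. It also assumes each category name is a valid field name (no dots, no leading "$").

```javascript
// Hypothetical helper: builds the $match / $group / $project stages
// for a known list of category names.
function buildCategoryPipeline(user, categories) {
  var group = { "_id": null };
  var project = { "_id": 0 };

  categories.forEach(function (category) {
    // Push the item when it belongs to this category, otherwise push false.
    group[category] = {
      "$push": {
        "$cond": [
          { "$eq": ["$category", category] },
          {
            "_id": "$_id",
            "field1": "$field1",
            "field2": "$field2",
            "tags": "$tags"
          },
          false
        ]
      }
    };
    // Strip the false placeholders from each array.
    project[category] = { "$setDifference": ["$" + category, [false]] };
  });

  return [
    { "$match": { "user": user } },
    { "$group": group },
    { "$project": project }
  ];
}

var pipeline = buildCategoryPipeline("1", ["phones", "bikes"]);
// In the mongo shell you would then run: db.collection.aggregate(pipeline)
```

The category list still has to come from somewhere, for example a prior db.collection.distinct("category", { "user": "1" }) call, which means two round trips to the server.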

The only way to get "dynamic" key names in these versions is to use mapReduce instead:

db.collection.mapReduce(
    function () {
      var obj = { };
      var category = this.category;
      delete this.user;
      delete this.category;

      obj[category] = [this];

      emit(null,obj);
    },
    function (key,values) {

      var reduced = {};

      values.forEach(function(value) {
        Object.keys(value).forEach(function(key) {
          if ( !reduced.hasOwnProperty(key) )
            reduced[key] = [];
          value[key].forEach(function(item) {
            reduced[key].push(item);
          });
        });
      });

      return reduced;

    },
    {
        "query": { "user": "1" },
        "out": { "inline": 1 }
    }
)

So now the key generation is dynamic, but the output comes in the shape that mapReduce dictates:

{
    "_id" : null,
    "value" : {
        "phones" : [
            {
                "_id" : "1",
                "field1" : "value1",
                "field2" : "value2",
                "tags" : [
                    "tag1",
                    "tag3"
                ]
            },
            {
                "_id" : "2",
                "field1" : "value1",
                "field2" : "value2",
                "tags" : [
                    "tag2",
                    "tag3"
                ]
            }
        ],
        "bikes" : [
            {
                "_id" : "3",
                "field1" : "value1",
                "field2" : "value2",
                "tags" : [
                    "tag3",
                    "tag4"
                ]
            }
        ]
    }
}

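So the output is constrained by how mapReduce shapes its results, and evaluating JavaScript here will be slower than the native operations of the aggregation framework. You get more power to manipulate the data, but that is the trade-off.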
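One detail of the mapReduce version that is easy to miss: MongoDB may call the reduce function more than once for the same key, feeding earlier reduce output back in as one of the values. That is why the map function emits each item already wrapped in an array under its category key, so that emitted values and reduced values have the same shape. The sketch below runs the same reduce logic locally in plain JavaScript, with made-up minimal items, to show that reducing in one pass and re-reducing a partial result give the same answer. The name reduceFn is only for illustration.

```javascript
// Same logic as the reduce function in the answer.
function reduceFn(key, values) {
  var reduced = {};
  values.forEach(function (value) {
    Object.keys(value).forEach(function (category) {
      if (!reduced.hasOwnProperty(category))
        reduced[category] = [];
      value[category].forEach(function (item) {
        reduced[category].push(item);
      });
    });
  });
  return reduced;
}

// Values shaped like what the map function emits:
// one object with a single category key holding a one-element array.
var emitted = [
  { phones: [{ _id: "1" }] },
  { phones: [{ _id: "2" }] },
  { bikes: [{ _id: "3" }] }
];

// Reduce everything in one pass.
var onePass = reduceFn(null, emitted);

// Reduce the first two values, then feed that partial result back in
// together with the remaining value, as the server is allowed to do.
var partial = reduceFn(null, emitted.slice(0, 2));
var rereduced = reduceFn(null, [partial, emitted[2]]);
```

Both onePass and rereduced end up with the two phones under "phones" and the single bike under "bikes".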

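In conclusion, if you stick with the generic-key pattern, the first approach with the aggregation framework is the fastest and best way to do this, and you can always reshape the results after they are returned from the server. If you insist on breaking the pattern and need the dynamic keys to come from the server, then mapReduce will do it where the aggregation framework would otherwise be impractical.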
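The client-side reshaping mentioned above can be quite small. A minimal sketch, assuming the result document has the { _id: null, categories: [...] } shape produced by the first pipeline; the helper name toCategoryKeys is hypothetical and the sample data is trimmed for brevity:

```javascript
// Hypothetical helper: turns the generic-key result into an object
// keyed by category name.
function toCategoryKeys(result) {
  var out = {};
  result.categories.forEach(function (category) {
    out[category._id] = category.items;
  });
  return out;
}

// Trimmed-down sample of the first pipeline's output.
var result = {
  _id: null,
  categories: [
    { _id: "bikes", items: [{ _id: "3", tags: ["tag3", "tag4"] }] },
    { _id: "phones", items: [
      { _id: "1", tags: ["tag1", "tag3"] },
      { _id: "2", tags: ["tag2", "tag3"] }
    ] }
  ]
};

var byCategory = toCategoryKeys(result);
```

This keeps the stored data and the server query on generic keys, and only the final presentation layer deals with category names as keys.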