How to find and unset identical subfield values in MongoDB

Time: 2016-06-18 11:04:53

Tags: mongodb mongodb-query

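I have 1 million documents in MongoDB. I want to find documents with identical field values and remove (unset) the duplicates. Can you suggest a method or an idea?

My documents look like this: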

{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "10370", "name" : "South America"},
        {"id" : "1426", "name" : "Suriname"}
    ]
}
{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "10370", "name" : "South America"},
        {"id" : "1426", "name" : "Suriname"}
    ]
}
{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "1734", "name" : "USA"},
        {"id" : "1136", "name" : "Pennsylvania"},
        {"id" : "16962", "name" : "Greater Philadelphia area"}
    ]
}
{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "1734", "name" : "USA"},
        {"id" : "1136", "name" : "Pennsylvania"},
        {"id" : "16962", "name" : "Greater Philadelphia area"}
    ]
}
{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "34964", "name" : "Oceania"},
        {"id" : "15", "name" : "Australia"},
        {"id" : "470", "name" : "Western Australia"},
        {"id" : "36282", "name" : "Perth"}
    ]
}

How can I change them into this:

{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "10370", "name" : "South America"},
        {"id" : "1426", "name" : "Suriname"}
    ]
}
{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "1734", "name" : "USA"},
        {"id" : "1136", "name" : "Pennsylvania"},
        {"id" : "16962", "name" : "Greater Philadelphia area"}
    ]
}
{
    "regions" : [
        {"id" : "1", "name" : "World"},
        {"id" : "34964", "name" : "Oceania"},
        {"id" : "15", "name" : "Australia"},
        {"id" : "470", "name" : "Western Australia"},
        {"id" : "36282", "name" : "Perth"}
    ]
}

Thanks in advance for your answers and your interest.

Update: I am trying this code:

db.collection.aggregate(
 {"$group":{"_id": {"id": "$regions.id","name": "$regions.name"}}},
 {"$group":{"_id":ObjectId(),"regions": { "$push": {"id": "$_id.id","name": "$_id.name"}}}},
 {"$unwind": "$regions"},
 {"$out": "newcollection"}
)

It gives this error: "errmsg" : "insert for $out failed: { connectionId: 111, err: \"E11000 duplicate key error collection: collection.tmp.agg_out.12 index: _id_ dup key: { : ObjectId('5767f378ff8f5e9302d95bc8') }\", code: 11000, n: 0, ok: 1.0 }",

How can I provide a unique key?

1 answer:

Answer 0 (score: 0)

With aggregation, if you group by the array elements, you can remove the duplicate regions. Would something like this help?

db.regs.aggregate([{$group:{"_id":{id:"$regions.id",name:"$regions.name"}}}]).pretty()
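
Building on the grouping idea above, here is a minimal sketch of one way to write the de-duplicated documents to a new collection without hitting the E11000 error. It assumes the collection names used in the question (`collection` and `newcollection`) and that two documents count as duplicates only when their `regions` arrays are identical, including element order. The variable name `dedupPipeline` and the `allowDiskUse` option are additions made for illustration, not part of the original post.

```javascript
// Sketch only: collection names are taken from the question; dedupPipeline is
// a hypothetical name introduced for this example.
var dedupPipeline = [
  // Documents whose "regions" arrays are identical (same elements, same order)
  // end up in the same group, so each distinct array appears exactly once.
  { $group: { _id: "$regions" } },
  // Move the array back into "regions" and drop _id. Because the output
  // documents carry no _id, MongoDB generates a fresh ObjectId for each one
  // when $out inserts them.
  { $project: { _id: 0, regions: "$_id" } },
  { $out: "newcollection" }
];

// Run the pipeline only when a mongo shell `db` handle is available.
// allowDiskUse is an assumption: with about 1 million documents the $group
// stage may need more memory than the default per-stage limit allows.
if (typeof db !== "undefined") {
  db.collection.aggregate(dedupPipeline, { allowDiskUse: true });
}
```

The attempt in the question most likely fails because `ObjectId()` is evaluated once by the shell, so the second `$group` uses a single constant `_id`; `$unwind` then emits several documents sharing that `_id`, and `$out` rejects them as duplicates. Dropping `_id` before `$out` sidesteps this, since each inserted document receives its own key.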