Filtering Mongo documents - Python

Time: 2016-12-16 08:14:13

Tags: python mongodb mongodb-query aggregation-framework

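Here is a view of my Mongo documents. For each key, I want to keep only the entry whose value is the longest. The value is a string, so only the entry with the greatest string length should be kept for each key.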

{ 
    "_id" : ObjectId("585a431415c7a981b47ac4ee"), 
    "key" : "http://www.adnansami.com", 
    "value" : "A"
}
{ 
    "_id" : ObjectId("585a431415c7a981b47ac4ef"), 
    "key" : "http://www.leap-networks.com", 
    "value" : "BB"
}
{ 
    "_id" : ObjectId("585a431d15c7a981b47ac4f0"), 
    "key" : "http://www.leap-networks.com", 
    "value" : "B"
}
{ 
    "_id" : ObjectId("585a431d15c7a981b47ac4f1"), 
    "key" : "http://www.machinelearningmastery.com", 
    "value" : "C"
}
{ 
    "_id" : ObjectId("585a432515c7a981b47ac4f2"), 
    "key" : "http://www.leap-networks.com", 
    "value" : "BBB"
}
{ 
    "_id" : ObjectId("585a432815c7a981b47ac4f3"), 
    "key" : "http://www.machinelearningmastery.com", 
    "value" : "CC"
}
{ 
    "_id" : ObjectId("585a432d15c7a981b47ac4f4"), 
    "key" : "http://www.leap-networks.com", 
    "value" : "BBBB"
}
{ 
    "_id" : ObjectId("585a433115c7a981b47ac4f5"), 
    "key" : "http://www.machinelearningmastery.com", 
    "value" : "CCC"
}
{ 
    "_id" : ObjectId("585a433615c7a981b47ac4f6"), 
    "key" : "http://www.leap-networks.com", 
    "value" : "BBBBB"
}
{ 
    "_id" : ObjectId("585a433d15c7a981b47ac4f7"), 
    "key" : "http://www.machinelearningmastery.com", 
    "value" : "CCCC"
}
{ 
    "_id" : ObjectId("585a434915c7a981b47ac4f8"), 
    "key" : "http://www.machinelearningmastery.com", 
    "value" : "CCCCC"
}

So the output should be:

  { 
   "_id" : ObjectId("58539dc715c7a964817686f9"), 
   "http://www.adnansami.com" : "A "
  }
  { 
    "_id" : ObjectId("585a433615c7a981b47ac4f6"), 
    "key" : "http://www.leap-networks.com", 
    "value" : "BBBBB"
  }
  { 
    "_id" : ObjectId("585a434915c7a981b47ac4f8"), 
    "key" : "http://www.machinelearningmastery.com", 
    "value" : "CCCCC"
  }

How can I achieve this?

1 answer:

Answer 0: (score: 2)

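With hash keys (the URL itself used as the field name), querying in MongoDB would be very complex without resorting to a map-reduce operation. MongoDB works well with embedded structures, so you could use key/value documents like this instead:

{ 
    "_id" : ObjectId("58539dfa15c7a96481768700"),
    "key": "http://www.leap-networks.com", 
    "value": "AAAAAAAA" 
} 

You should therefore consider restructuring your documents so that they can be indexed and are easier to search in MongoDB.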
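One way to carry out this restructuring is a small client-side conversion. The sketch below assumes the existing documents use the URL as the field name and the string as its value (for example `{"http://www.adnansami.com": "A "}`); the helper name `to_key_value_docs` is hypothetical. It drops the old `_id`, so MongoDB would assign a fresh one on insert.

```python
def to_key_value_docs(doc):
    """Split one hash-key document into key/value documents.

    Every field except "_id" becomes its own {"key": ..., "value": ...}
    dictionary. The original "_id" is not copied over.
    """
    return [
        {"key": field, "value": value}
        for field, value in doc.items()
        if field != "_id"
    ]


# Example with an assumed hash-key document (the _id is a placeholder).
legacy_doc = {"_id": 1, "http://www.adnansami.com": "A "}
converted = to_key_value_docs(legacy_doc)
```

The resulting dictionaries could then be written to a new collection, for example with PyMongo's `insert_many`.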

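With the schema proposed above, you can use the aggregation framework. MongoDB 3.4 provides the $strLenCP operator, which computes the length (in UTF-8 code points) of the value field: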

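db.collection.aggregate([
    // Add a temporary field holding the length of "value" in code points
    {
        "$addFields": {
            "strLength": { 
                "$strLenCP": "$value"
            }
        }
    },
    // Longest values first
    { "$sort": { "strLength": -1 } },
    // After the sort, the first document seen for each key has the longest value
    {
        "$group": {
            "_id": "$key",
            "value": { "$first": "$value" },
            "doc_id": { "$first": "$_id" }
        }
    }
])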
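Since the question is tagged python, the same pipeline can be expressed with PyMongo. This is a minimal sketch, assuming PyMongo is installed; the names `LONGEST_VALUE_PIPELINE`, `longest_value_per_key` and `main` are hypothetical, and the connection URI, database name and collection name in `main()` are placeholders rather than anything stated in the question.

```python
# Same three stages as the shell pipeline, written as Python dicts.
# $addFields and $strLenCP need MongoDB 3.4 or newer.
LONGEST_VALUE_PIPELINE = [
    {"$addFields": {"strLength": {"$strLenCP": "$value"}}},
    {"$sort": {"strLength": -1}},
    {
        "$group": {
            "_id": "$key",
            "value": {"$first": "$value"},
            "doc_id": {"$first": "$_id"},
        }
    },
]


def longest_value_per_key(collection):
    """Run the pipeline on a pymongo-style collection and return a list."""
    return list(collection.aggregate(LONGEST_VALUE_PIPELINE))


def main():
    # Assumptions: PyMongo is installed and a server runs on localhost;
    # "test" and "collection" are placeholder names.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    for doc in longest_value_per_key(client["test"]["collection"]):
        print(doc)
```

Calling `main()` against a running server prints one document per key. Note that `$strLenCP` raises an error when `value` is null or missing, so documents without a string `value` would need to be filtered out first.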

Sample output of the aggregation pipeline:

{ 
    "doc_id": ObjectId("58539dc715c7a964817686f9"),
    "_id" : "http://www.adnansami.com", 
    "value":  "A "      
},
{ 
    "doc_id": ObjectId("58539dd515c7a964817686fc"),
    "_id" : "http://www.movies.yahoo.com",
    "value": "AAAA"     
},
{ 
    "doc_id": ObjectId("58539dfa15c7a96481768700"),
    "_id" : "http://www.leap-networks.com", 
    "value": "AAAAAAAA"     
}
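The aggregation only reports the longest entry per key; it does not remove anything. To actually keep only those documents, as the question asks, the remaining ones would have to be deleted. The sketch below is one possible way to do that with PyMongo's `delete_many`, using the `doc_id` values from the output above. The helper name `delete_shorter_entries` is hypothetical, and since the operation is destructive it is best tried on a copy of the collection first.

```python
def delete_shorter_entries(collection, winners):
    """Delete every document whose _id is not among the winners' doc_id values.

    `collection` is expected to behave like a pymongo Collection;
    `winners` is the list of documents returned by the aggregation.
    Returns the number of deleted documents.
    """
    keep_ids = [doc["doc_id"] for doc in winners]
    if not keep_ids:
        # Guard: with nothing to keep, $nin would match (and delete) everything.
        return 0
    result = collection.delete_many({"_id": {"$nin": keep_ids}})
    return result.deleted_count
```

Here `winners` would be the list produced by running the aggregation through PyMongo, for example `list(collection.aggregate(pipeline))`.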