MongoDB: Remove duplicate records from a projection

Asked: 2014-04-11 12:30:28

Tags: python sql mongodb pymongo projection

How do I remove duplicate records from a MongoDB projection? Let's say I have mongo documents of the following form:

{"_id":"55555454", "From":"Bob", "To":"Alice", "subject":"Hi", "date":"04102011"} 
{"_id":"55555455", "From":"Bob", "To":"Dave", "subject":"Hello", "date":"04102014"}
{"_id":"55555456", "From":"Bob", "To":"Alice", "subject":"Bye", "date":"04112013"}

When I run a simple projection, db.col.find({}, {"From":1, "To":1, "_id":0}), it obviously gives me three records, like this:

{"From":"Bob", "To":"Alice"} {"From":"Bob", "To":"Dave"} {"From":"Bob", "To":"Alice"}

However, what I want is just two records, like this:

{"From":"Bob", "To":"Alice"} {"From":"Bob","To":"Dave"}

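Since my application is currently written in Python (using pymongo), what I am doing is removing the duplicates from the list of records inside the application, using:

result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in l)]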

Is there any database-side method I can apply to the projection so that it gives me only the two records?
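For reference, the client-side de-duplication described in the question can be sketched as a small standalone function. This is only an illustration under stated assumptions: the name `dedupe_records` is hypothetical, every value in a record must be hashable, and a `frozenset` of the items is used as the key so that two dicts with the same fields in a different key order still count as duplicates.

```python
def dedupe_records(records):
    """Return the distinct dicts in `records`, keeping first-seen order."""
    seen = set()
    unique = []
    for record in records:
        # frozenset ignores key order; all values must be hashable
        key = frozenset(record.items())
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"From": "Bob", "To": "Alice"},
    {"From": "Bob", "To": "Dave"},
    {"To": "Alice", "From": "Bob"},  # same fields, different key order
]
unique_records = dedupe_records(records)
```

This still transfers every projected document to the client before discarding the duplicates, which is the cost the question is trying to avoid.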

2 Answers:

Answer 0 (score: 1)

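You cannot reduce the results and remove duplicate documents using only a MongoDB find with a projection.

The find command will not work for this because, remember, it returns a cursor to the client, so it cannot reduce the results to only the unique documents in a single pass.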

Using this as test data (with _id removed):

> db.test.find()
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "George", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }
{ "From" : "David", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }

You can use aggregation instead:

> db.test.aggregate([{ $group: { _id: { "From": "$From", "To": "$To" }}}])

Result:

{
    "result" : [
        {
            "_id" : {
                    "From" : "David",
                    "To" : "Carl"
            }
        },
        {
            "_id" : {
                    "From" : "George",
                    "To" : "Carl"
            }
        },
        {
            "_id" : {
                    "From" : "Bob",
                    "To" : "Dave"
            }
        },
        {
            "_id" : {
                    "From" : "Bob",
                    "To" : "Alice"
            }
        }
    ],
    "ok" : 1
}

The Python code should look very similar to the aggregation pipeline suggested above.
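As a minimal sketch of that translation, assuming pymongo 3 or later (where `Collection.aggregate` returns a cursor of documents rather than the `{"result": [...]}` wrapper printed above) and a `collection` object you have already obtained from a client. The helper names `build_group_pipeline` and `distinct_pairs` are hypothetical.

```python
def build_group_pipeline(fields):
    """Build a one-stage pipeline that groups on the given field names."""
    return [{"$group": {"_id": {name: "$" + name for name in fields}}}]


def distinct_pairs(collection, fields=("From", "To")):
    """Return one dict per distinct combination of `fields`.

    `collection` is assumed to behave like a pymongo Collection,
    i.e. it has an aggregate(pipeline) method yielding documents.
    """
    pipeline = build_group_pipeline(fields)
    # each result document looks like {"_id": {"From": ..., "To": ...}}
    return [doc["_id"] for doc in collection.aggregate(pipeline)]
```

With a real collection this would be called as `distinct_pairs(db.test)`. The order in which `$group` emits its groups is not guaranteed, so add a `$sort` stage if the order matters.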

Answer 1 (score: 0)

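A projection only defines which fields you want to appear in the results. It is much like starting a statement with:

SELECT From, To

as opposed to the basic form:

SELECT *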

So what you are really trying to do is the equivalent of:

db.collection.find(
    { "From": "Bob", "To": "Alice" },
    { "From": 1, "To": 1 }
)

which actually selects the records you want, in the same way as:

SELECT From, To
FROM collection
WHERE
   From = "Bob"
   AND To = "Alice"

This will still produce "duplicate" results of a sort, which you can remove using aggregate:

db.collection.aggregate([
   { "$match": {
       "From": "Bob", "To": "Alice"
   }},
   { "$group": {
       "_id": {
           "From": "$From", "To": "$To"
       }
   }}
])
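
The same match-then-group pipeline can be written as a plain Python list for pymongo. The sketch below goes one step beyond this answer, and that step is an assumption rather than part of it: a final `$project` stage lifts the grouped keys out of `_id`, so each result has the `{"From": ..., "To": ...}` shape asked for in the question. The function name `match_and_dedupe_pipeline` is hypothetical.

```python
def match_and_dedupe_pipeline(sender, recipient):
    """Build a pipeline: filter, collapse duplicates, flatten the keys."""
    return [
        # keep only documents for this sender/recipient pair
        {"$match": {"From": sender, "To": recipient}},
        # one output document per distinct (From, To) combination
        {"$group": {"_id": {"From": "$From", "To": "$To"}}},
        # move the grouped keys from _id back to top-level fields
        {"$project": {"_id": 0, "From": "$_id.From", "To": "$_id.To"}},
    ]


pipeline = match_and_dedupe_pipeline("Bob", "Alice")
```

The list would then be passed to `collection.aggregate(pipeline)` on a pymongo collection. Dropping the `$match` stage gives every distinct pair instead of just one.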