How do I remove duplicate records from a MongoDB projection? Let's say I have Mongo documents of the following form:
{"_id":"55555454", "From":"Bob", "To":"Alice", "subject":"Hi", "date":"04102011"}
{"_id":"55555455", "From":"Bob", "To":"Dave", "subject":"Hello", "date":"04102014"}
{"_id":"55555456", "From":"Bob", "To":"Alice", "subject":"Bye", "date":"04112013"}
When I do a simple projection
db.col.find({}, {"From": 1, "To": 1, "_id": 0})
this obviously gives me three records:
{"From":"Bob", "To":"Alice"} {"From":"Bob", "To":"Dave"} {"From":"Bob", "To":"Alice"}
However, what I want is only two records, like this:
{"From":"Bob", "To":"Alice"} {"From":"Bob","To":"Dave"}
Since my application is currently in Python (using pymongo), I remove the duplicates from the list of records l in the application with:
result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in l)]
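The one-liner above depends on every dict listing its keys in the same order, and it does not preserve the order of the records. A minimal sketch of a sturdier application-side version, assuming the projected values are hashable (plain strings here); dedupe_records is a hypothetical helper name, not part of pymongo:

```python
def dedupe_records(records):
    """Return the records with duplicates dropped, keeping first-seen order."""
    seen = set()
    unique = []
    for record in records:
        # Sort the items so key order inside a dict does not matter.
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"From": "Bob", "To": "Alice"},
    {"From": "Bob", "To": "Dave"},
    {"From": "Bob", "To": "Alice"},
]
print(dedupe_records(records))
# [{'From': 'Bob', 'To': 'Alice'}, {'From': 'Bob', 'To': 'Dave'}]
```

This still pulls every projected document over the wire before filtering, which is the cost a database-side approach would avoid.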
Is there a database-side method I can apply to the projection that returns just the two records?
Answer 0: (score: 1)
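You can't do the reduction and remove duplicate documents using only MongoDB's find with a projection.
The find command won't work here: it returns a cursor to the client, so it cannot reduce the results to only the unique documents in a single pass.
Using this as test data (with _id removed):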
> db.test.find()
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "George", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }
{ "From" : "David", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }
You can use aggregation:
> db.test.aggregate({ $group: { _id: { "From": "$From", "To": "$To" }}})
Result:
{
"result" : [
{
"_id" : {
"From" : "David",
"To" : "Carl"
}
},
{
"_id" : {
"From" : "George",
"To" : "Carl"
}
},
{
"_id" : {
"From" : "Bob",
"To" : "Dave"
}
},
{
"_id" : {
"From" : "Bob",
"To" : "Alice"
}
}
],
"ok" : 1
}
The Python code should be very similar to the aggregation pipeline suggested above.
Answer 1: (score: 0)
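As a rough sketch of what that Python version could look like: the function below sends the same $group stage through pymongo's Collection.aggregate and adds a $project stage (an addition, not part of the shell command above) that flattens _id back into From and To fields. The name distinct_from_to is hypothetical, and the sketch assumes a pymongo version in which aggregate returns an iterable cursor.

```python
# Group on the (From, To) pair, then lift the pair back out of _id.
PIPELINE = [
    {"$group": {"_id": {"From": "$From", "To": "$To"}}},
    {"$project": {"_id": 0, "From": "$_id.From", "To": "$_id.To"}},
]


def distinct_from_to(collection):
    """Return one {"From": ..., "To": ...} dict per distinct pair.

    `collection` is assumed to be a pymongo Collection, for example
    MongoClient()["mydb"]["test"] (the database name is a placeholder).
    """
    return list(collection.aggregate(PIPELINE))
```

The order of the groups is not guaranteed, so append a $sort stage if the order matters.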
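Projection only defines the fields you want to appear in the results. It is much like a statement that begins with:
SELECT From, To
as opposed to the basic form:
SELECT *
So what you really want is the equivalent of: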
db.collection.find(
{ "From": "Bob", "To": "Alice" },
{ "From": 1, "To": 1 }
)
This actually selects the records you want, in the same form as:
SELECT From, To
FROM collection
WHERE
From = "Bob"
AND To = "Alice"
Where this somehow produces "duplicate" results, you can use aggregate to remove them:
db.collection.aggregate([
{ "$match": {
"From": "Bob", "To": "Alice"
}},
{ "$group": {
"_id": {
"From": "$From", "To": "$To"
}
}}
])
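For pymongo, the same pipeline is a plain Python list of stage dicts. A minimal sketch, where build_pipeline and its match parameter are hypothetical names used only for illustration:

```python
def build_pipeline(match=None):
    """Build an optional $match stage followed by a $group on From/To."""
    pipeline = []
    if match:
        # Filter first so $group only sees the matching documents.
        pipeline.append({"$match": match})
    pipeline.append({"$group": {"_id": {"From": "$From", "To": "$To"}}})
    return pipeline


pipeline = build_pipeline({"From": "Bob", "To": "Alice"})
# Pass it to a pymongo collection: collection.aggregate(pipeline)
```

Calling build_pipeline() with no argument drops the $match stage and groups the whole collection.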