我在mongodb的5个集合中有类似的数据如下(文档)
{
"_id" : ObjectId("53490030cf3b942d63cfbc7b"),
"uNr" : "abdc123abcd",
}
我想遍历每个集合并检查任何集合中是否存在uNr匹配。如果有,则添加uNr并将+1计入新表。例如,如果3个集合中存在匹配项,则应显示{"uNr" : "abcd123", "count": "3"}
答案 0 :(得分:1)
如果你的uNr值总数小到足以容纳在内存中(最多只有几百万个),你可以用计数器将它们整合到客户端并将它们存储在MongoDB集合中:
from collections import Counter
from pymongo import MongoClient, InsertOne
db = MongoClient().my_database
counts = Counter()
for collection in [db.collection1,
db.collection2,
db.collection3]:
for doc in collection.find():
counts[doc['uNr']] += 1
# Empty the target collection.
db.counts.delete_many({})
inserts = [InsertOne({'_id': uNr, 'n': cnt}) for uNr, cnt in counts.items()]
db.counts.bulk_write(inserts)
否则,一次查询一千个uNr值并更新单独集合中的计数:
from pymongo import MongoClient, UpdateOne, ASCENDING
db = MongoClient().my_database
# Empty the target collection.
db.counts.delete_many({})
db.counts.create_index([('uNr', ASCENDING)])
for collection in [db.collection1,
db.collection2,
db.collection3]:
cursor = collection.find(no_cursor_timeout=True)
# "with" statement helps ensure cursor is closed, since the server will
# never auto-close it.
with cursor:
updates = []
for doc in cursor:
updates.append(UpdateOne({'_id': doc['uNr']},
{'$inc': {'n': 1}},
upsert=True))
if len(updates) == 1000:
db.counts.bulk_write(updates)
updates = []
if updates:
# Last batch.
db.counts.bulk_write(updates)