Iterate over collections and count occurrences of the same value with pymongo

Time: 2016-12-02 11:38:57

Tags: mongodb mongodb-query pymongo aggregation-framework

I have similar data (documents) in 5 collections in MongoDB, like the following:

{
   "_id" : ObjectId("53490030cf3b942d63cfbc7b"),
   "uNr" : "abdc123abcd"
}

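I want to iterate over each collection and check whether a uNr value matches in any of the collections. If it does, I want to store that uNr in a new collection and increment its count by 1. For example, if a match exists in 3 collections, the result should be {"uNr" : "abcd123", "count": "3"}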

1 answer:

Answer 0 (score: 1)

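If the total number of distinct uNr values is small enough to fit in memory (up to a few million or so), you can tally them on the client with a Counter and then store the totals in a MongoDB collection: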

from collections import Counter
from pymongo import MongoClient, InsertOne

db = MongoClient().my_database

counts = Counter()

for collection in [db.collection1,
                   db.collection2,
                   db.collection3]:
    # Fetch only the uNr field instead of whole documents.
    for doc in collection.find({}, {'uNr': 1}):
        counts[doc['uNr']] += 1

# Empty the target collection.
db.counts.delete_many({})

inserts = [InsertOne({'_id': uNr, 'n': cnt}) for uNr, cnt in counts.items()]
# bulk_write raises InvalidOperation when given an empty list.
if inserts:
    db.counts.bulk_write(inserts)
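The loop above counts documents, so a uNr that occurs twice in the same collection is counted twice. If, as in the question, the count should be the number of collections that contain the value, a minimal variant collapses each collection's values into a set before tallying. This is a sketch under assumptions: `count_collections_per_value` is a hypothetical helper name, and it accepts plain iterables of documents so it can be tried without a server; a pymongo cursor should work in their place.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_collections_per_value(collections: Iterable[Iterable[Mapping]],
                                field: str = "uNr") -> Counter:
    """Count, for each value of `field`, how many collections contain it.

    Hypothetical helper: each element of `collections` is any iterable of
    documents (a list of dicts, or a pymongo cursor).
    """
    counts: Counter = Counter()
    for docs in collections:
        # A set collapses duplicates, so a value counts at most once
        # per collection; documents lacking the field are skipped.
        seen = {doc[field] for doc in docs if field in doc}
        # Counter.update with an iterable adds 1 for each element.
        counts.update(seen)
    return counts
```

With pymongo it could be fed cursors, e.g. `count_collections_per_value(c.find({}, {'uNr': 1}) for c in [db.collection1, db.collection2, db.collection3])`, and the result written out with the same `InsertOne`/`bulk_write` step. Only one collection's distinct values are held in the set at a time, on top of the Counter itself.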

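If the values do not fit in memory, stream each collection instead and apply the count updates to a separate collection in batches of 1,000 upserts: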

from pymongo import MongoClient, UpdateOne

db = MongoClient().my_database

# Empty the target collection. No extra index is needed: the upserts
# below match on _id, which MongoDB always indexes.
db.counts.delete_many({})

for collection in [db.collection1,
                   db.collection2,
                   db.collection3]:
    # Project only uNr. With no_cursor_timeout=True the server will never
    # auto-close this cursor, so the "with" statement ensures it is closed.
    cursor = collection.find({}, {'uNr': 1}, no_cursor_timeout=True)
    with cursor:
        updates = []
        for doc in cursor:
            updates.append(UpdateOne({'_id': doc['uNr']},
                                     {'$inc': {'n': 1}},
                                     upsert=True))

            if len(updates) == 1000:
                db.counts.bulk_write(updates)
                updates = []

        if updates:
            # Last batch.
            db.counts.bulk_write(updates)
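Given the aggregation-framework tag, the counting can also be pushed to the server. This is a different technique from the client-side loops above and assumes MongoDB 4.4 or later, which introduced the `$unionWith` stage (newer than this 2016 answer). The sketch only builds the pipeline; `build_union_count_pipeline` is a hypothetical helper name and the collection names are placeholders.

```python
from typing import Any, Dict, List


def build_union_count_pipeline(other_collections: List[str],
                               target: str) -> List[Dict[str, Any]]:
    """Build a pipeline to run on the first collection (assumes MongoDB 4.4+).

    Hypothetical helper: the result counts, for each uNr, how many of the
    collections contain it, and writes the totals to `target` via $out.
    """
    # Applied to every collection: one output document per distinct uNr.
    per_collection = [
        {"$match": {"uNr": {"$exists": True}}},
        {"$group": {"_id": "$uNr"}},
    ]
    pipeline: List[Dict[str, Any]] = list(per_collection)
    for name in other_collections:
        # Append the distinct uNr values of each other collection.
        pipeline.append(
            {"$unionWith": {"coll": name, "pipeline": list(per_collection)}})
    # Each remaining document stands for "this uNr occurs in one collection".
    pipeline.append({"$group": {"_id": "$_id", "count": {"$sum": 1}}})
    # Shape the output like the question's {"uNr": ..., "count": ...}.
    pipeline.append({"$project": {"_id": 0, "uNr": "$_id", "count": 1}})
    # $out replaces the target collection with the pipeline's results.
    pipeline.append({"$out": target})
    return pipeline
```

It would be run on the first collection, for example `db.collection1.aggregate(build_union_count_pipeline(['collection2', 'collection3'], 'counts'))`; large inputs may need `allowDiskUse=True` passed to `aggregate`. Grouping by uNr inside each collection first means `count` is stored as a number and reflects collections rather than documents, matching the question's example.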