Question

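I am processing some text files to search for patterns and count them. Because the files are very large, processing time matters. I have Python code that updates the counters and stores them in MongoDB. To make it run faster, I am trying to reduce the number of database operations.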

The original version issued an increment for every single occurrence:

mlcol.find_one_and_update(
    {"connip": conip}, 
    {"$inc":{ts:1}}, 
    upsert=True
)

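This took a very long time, so what I do now is keep the counters in memory, in a dictionary, and periodically iterate over that data to store it: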

for conip in conCounter.keys():
    d = conCounter[conip]
    for ts in d.keys():
        mlcol.find_one_and_update(
            {"connip": conip}, 
            {"$inc":{ts:d[ts]}}, 
            upsert=True
        )

This way the processing is much faster, but I see that updating each counter individually still takes a long time.
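For illustration, the in-memory accumulation described above could look roughly like the sketch below. The names `con_counter`, `record` and `drain` and the sample values are hypothetical, not taken from the original code, and `ts` is assumed to be a string usable as a MongoDB field name.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory store: connip -> Counter mapping ts -> count.
con_counter = defaultdict(Counter)


def record(conip, ts):
    """Count one occurrence in memory instead of touching the database."""
    con_counter[conip][ts] += 1


def drain():
    """Return the pending counts as plain dicts and reset the store."""
    pending = {conip: dict(counts) for conip, counts in con_counter.items()}
    con_counter.clear()
    return pending


# Example with made-up values.
record("10.0.0.1", "1500000000")
record("10.0.0.1", "1500000000")
record("10.0.0.2", "1500000060")
pending = drain()
```

Calling `drain()` at each flush interval hands the accumulated increments to whatever writes them to the database, and starts the next interval from zero so that nothing is counted twice.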

Is it possible to issue multiple updates in a single command?

Are there any other ideas for making this process faster?

Answer 1

As Alex Blex pointed out, creating an index and executing the updates in bulk solved the problem:

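# Index the lookup field so each upsert can find its document quickly.
mlcol.create_index("connip")

# Queue every increment, then send them all to the server in one batch.
bulk = mlcol.initialize_unordered_bulk_op()
for conip, d in conCounter.items():
    for ts, count in d.items():
        bulk.find({"connip": conip}).upsert().update({"$inc": {ts: count}})
res = bulk.execute()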
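Note that `initialize_unordered_bulk_op()` was deprecated in PyMongo 3.5 and removed in PyMongo 4.0. On newer versions the same idea can be expressed with `bulk_write` and `UpdateOne`. The following is only a sketch under that assumption: the helper names `build_inc_specs` and `flush` are hypothetical, and it additionally merges all counters of one `connip` into a single `$inc`, which is a variation on the answer rather than what the answer itself did.

```python
def build_inc_specs(con_counter):
    """Build one (filter, update) pair per connip, merging its counters into one $inc."""
    return [
        ({"connip": conip}, {"$inc": dict(counts)})
        for conip, counts in con_counter.items()
        if counts  # skip entries with nothing to increment
    ]


def flush(collection, con_counter):
    """Send all pending increments to the server as one unordered bulk write."""
    # Imported here so that build_inc_specs stays usable without pymongo installed.
    from pymongo import UpdateOne

    ops = [
        UpdateOne(flt, upd, upsert=True)
        for flt, upd in build_inc_specs(con_counter)
    ]
    if not ops:
        return None  # bulk_write raises InvalidOperation on an empty list
    return collection.bulk_write(ops, ordered=False)


# Example with made-up values: two counters for "a", none for "b".
specs = build_inc_specs({"a": {"t1": 2, "t2": 1}, "b": {}})
```

With the names from the question, the call would be `flush(mlcol, conCounter)`. Passing `ordered=False` lets the server continue with the remaining operations if one of them fails, which matches the unordered bulk operation used above.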

Updating multiple counters in a single command with pymongo
