Question

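I have 271 million records stored one per line in a text file and need to add them to MongoDB. I am using Python and PyMongo to do this.

I first split the single 271-million-line file into multiple files of 1 million lines each, and wrote the following code to add them to the database: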

import os
import threading
from pymongo import MongoClient


class UserDb:
    def __init__(self):
        self.client = MongoClient('localhost', 27017)
        self.data = self.client.large_data.data


threadlist = []

def start_add(listname):
    db = UserDb()
    with open(listname, "r") as r:
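        for line in r:
            # Iterating a file never yields None, so no end-of-file check is needed.
            # Each insert_one call is a separate round trip to the server.
            a = dict()
            a['no'] = line.strip()
            db.data.insert_one(a)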
    print(listname, "Done!")


for x in os.listdir(os.getcwd()):
    if x.startswith('li'):
        t = threading.Thread(target=start_add, args=(x,))
        t.start()
        threadlist.append(t)

print("All threads started!")


for thread in threadlist:
    thread.join()

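This starts as many threads as there are files and adds every line to the database. The problem is that after three hours it has added only 8,630,623 records.

What can I do to add the records faster?

Each line of data is just an 8-digit number (e.g. 12345678).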

Answer 1

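Instead of executing many insert_one calls, take a look at the bulk write operations.

Experiment a little to see which approach works best. I found that batches of 10,000 worked well for me, but this depends on the hardware.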
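As a rough sketch of what batching could look like here, the function below groups lines into lists of 10,000 documents and sends each list with a single insert_many call. The helper names batched_docs and bulk_add are made up for illustration, the 'no' field is taken from the question's code, and ordered=False is an assumption that insertion order does not matter. The batch size is only a starting point to tune.

```python
from itertools import islice


def batched_docs(lines, batch_size=10_000):
    """Yield lists of at most batch_size documents built from an iterable of lines."""
    it = iter(lines)
    while True:
        # Same document shape as in the question: {'no': <stripped line>}.
        batch = [{"no": line.strip()} for line in islice(it, batch_size)]
        if not batch:
            return
        yield batch


def bulk_add(collection, path, batch_size=10_000):
    """Insert every line of the file at path, one insert_many call per batch."""
    total = 0
    with open(path, "r") as f:
        for batch in batched_docs(f, batch_size):
            # ordered=False (an assumption here) lets the server keep going
            # past a failed document instead of stopping at the first error.
            collection.insert_many(batch, ordered=False)
            total += len(batch)
    return total


# Hypothetical usage, with the connection details from the question:
#   from pymongo import MongoClient
#   data = MongoClient('localhost', 27017).large_data.data
#   bulk_add(data, 'li_000')   # 'li_000' is a made-up file name
```

With 1 million lines per file, this turns 1 million round trips into about 100 per file. Whether running several files in parallel threads helps further is something to measure rather than assume.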
