Question

I am trying to insert a large number of documents (1M+) with the bulk_write command. To do so, I build a list of InsertOne operations.

python version = 3.7.4

pymongo version = 3.8.0

Document creation:

document = {
    'dictionary': ObjectId(dictionary_id),
    'price': price,
    'source': source,
    'promo': promo,
    'date': now_utc,
    'updatedAt': now_utc,
    'createdAt': now_utc
}

# add line to debug
if '_id' in document.keys():
    print(document)

return document

I build the full list of documents by adding a new field taken from a list of elements, and create the queries with InsertOne:

bulk = []
for element in list_elements:
    for document in documents:
        document['new_field'] = element
        # add line to debug
        if '_id' in document.keys():
           print(document)
        insert = InsertOne(document)
        bulk.append(insert)
return bulk

I do the insert with the bulk_write command:

collection.bulk_write(bulk, ordered=False)

Here is the documentation: https://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.bulk_write

According to the documentation, the _id field is added automatically: "Parameter - document: The document to insert. If the document is missing an _id field one will be added."

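Somehow it seems to be doing this wrong, because some of the documents end up with the same value. For a large share of the 1M documents I receive this error (each with an _id, of course):

'E11000 duplicate key error collection: database.collection index: _id_ dup key: { _id: ObjectId(\'5f5fccb4b6f2a4ede9f6df62\') }'

To me this looks like a bug in pymongo, because I have used this approach in many cases, although never with this many documents.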

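The _id field must be unique, but since pymongo adds it automatically, I don't know how to tackle this problem. Maybe I could use UpdateOne with upsert=True and some filter, and hope for the best.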

I would appreciate any solution or workaround for this problem.

Answer 1

If any of the documents in your code snippet already contains an _id, a new one won't be added, and you run the risk of getting a duplicate key error.
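One way to check for this before calling bulk_write is to count the _id values already present in the documents. The following is only a sketch: the helper name `duplicate_ids` and the sample data are made up for illustration, and it assumes the documents are plain dicts whose _id values are hashable (ObjectId values are).

```python
from collections import Counter


def duplicate_ids(documents):
    """Return the _id values that occur more than once among the documents."""
    # Documents without an _id are skipped: the driver would generate one for them.
    counts = Counter(doc["_id"] for doc in documents if "_id" in doc)
    return [value for value, count in counts.items() if count > 1]


# Hypothetical sample: two documents share _id 1, one has no _id yet.
sample = [{"_id": 1, "price": 10}, {"_id": 1, "price": 20}, {"price": 30}]
print(duplicate_ids(sample))  # [1]
```

An empty result means no _id is repeated among the documents that already carry one.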

Answer 2

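It seems that by adding the new field to the document and appending it to the list, I was creating references to the same object rather than separate documents, so I ended up with the same query len(list_elements) times. That is why I got the duplicate key error.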
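The effect can be reproduced without a database. This is a minimal sketch with made-up sample values for `documents` and `list_elements`, mirroring the loop in the question. The last step uses a placeholder string to stand in for the _id; it assumes the driver sets a missing _id on the dict itself, which would explain why every entry pointing at that dict ends up with the same value.

```python
# Sample data, made up for illustration.
documents = [{"price": 10}, {"price": 20}]
list_elements = ["a", "b", "c"]

queued = []
for element in list_elements:
    for document in documents:
        document["new_field"] = element  # mutates the one shared dict
        queued.append(document)          # appends a reference, not a copy

# Six entries were queued, but only two distinct dict objects exist.
distinct_objects = len({id(d) for d in queued})
print(len(queued), distinct_objects)  # 6 2

# Every entry shows the last element, because earlier values were overwritten.
print({d["new_field"] for d in queued})  # {'c'}

# Stand-in for an _id being added in place (a placeholder, not a real ObjectId).
queued[0].setdefault("_id", "placeholder-id")
print("_id" in queued[2])  # True: queued[2] is the same object as queued[0]
```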

To solve the problem, I append a copy of the document to the list:

bulk.append(document.copy())

and then create the queries from that list.
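Put together, the fix might look like the sketch below. The function name `expand_documents` and the sample values are hypothetical, and dropping an inherited _id is an extra safeguard that is not part of the original fix.

```python
def expand_documents(documents, list_elements):
    """Return one independent dict per (element, document) pair."""
    expanded = []
    for element in list_elements:
        for document in documents:
            new_document = document.copy()  # shallow copy: a separate top-level dict
            new_document["new_field"] = element
            # Extra safeguard (assumption): drop any _id carried over from the
            # template so that a fresh one is generated on insert.
            new_document.pop("_id", None)
            expanded.append(new_document)
    return expanded


docs = expand_documents([{"price": 10}, {"price": 20}], ["a", "b", "c"])
print(len(docs), len({id(d) for d in docs}))  # 6 6
```

The resulting list can then be wrapped as `[InsertOne(d) for d in docs]` and passed to `collection.bulk_write(..., ordered=False)` as in the question. Note that `dict.copy()` is shallow, so nested dicts or lists are still shared between copies; `copy.deepcopy` would be needed if those nested values are modified per document.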

I would like to thank @Belly Buster for the help with this question.

Solve E11000 duplicate key error collection: _id_ dup key in pymongo
