I am trying to bulk-insert documents into MongoDB using the Python library `pymongo`.

    import pymongo


    def tryManyInsert():
        p = {'x' : 1, 'y' : True, 'z': None}
        mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
        mongoColl.insert_many([p for i in range(10)])

    tryManyInsert()

But it keeps failing with a `BulkWriteError`:

    Traceback (most recent call last):
      File "/prog_path/testMongoCon.py", line 9, in <module>
        tryManyInsert();
      File "/prog_path/testMongoCon.py", line 7, in tryManyInsert
        mongoColl.insert_many([p for i in range(10)])
      File "/myenv_path/lib/python3.6/site-packages/pymongo/collection.py", line 724, in insert_many
        blk.execute(self.write_concern.document)
      File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 493, in execute
        return self.execute_command(sock_info, generator, write_concern)
      File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 331, in execute_command
        raise BulkWriteError(full_result)
    pymongo.errors.BulkWriteError: batch op errors occurred

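I am inserting 10 documents sequentially without an `_id`, so the conditions in this answer / discussion do not apply here. A similar question has no answers. I tried pymongo 3.4 and pymongo 3.5.1, and both give the same error. I am on python3.6 with mongodb 3.2.10.

What am I doing wrong here?
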
Answer 0 (score: 1)
Python is still referring to `p` as the same object for each list member. You want a `copy()` of `p` for each list member:

    import pymongo
    from copy import copy

    def tryManyInsert():
        p = {'x' : 1, 'y' : True, 'z': None}
        mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
        mongoColl.insert_many([copy(p) for i in range(10)])

    tryManyInsert()

Or even simply:

    mongoColl.insert_many([{ 'x': 1, 'y': True, 'z': None } for i in range(10)])

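Unless you do that, the `_id` only gets assigned once, and you are simply repeating the "same document" with the same `_id` in the argument to `insert_many()`. Hence the duplicate key error.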
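
The effect described above can be reproduced without a MongoDB server. The sketch below is only an approximation: `assign_ids` is a hypothetical stand-in for what the driver does to documents that lack an `_id`, and an integer counter replaces `ObjectId` so that it runs on the standard library alone.

```python
import itertools
from copy import copy

# Hypothetical id source; the real driver generates ObjectId values instead.
_counter = itertools.count(1)

def assign_ids(docs):
    """Hypothetical helper: add an _id to each document that lacks one."""
    for doc in docs:
        if '_id' not in doc:
            doc['_id'] = next(_counter)
    return docs

p = {'x': 1, 'y': True, 'z': None}
# Three references to the SAME dict: only the first pass adds an _id.
shared_ids = [d['_id'] for d in assign_ids([p for i in range(3)])]

q = {'x': 1, 'y': True, 'z': None}
# Three independent copies: each one receives its own _id.
copied_ids = [d['_id'] for d in assign_ids([copy(q) for i in range(3)])]

print(shared_ids)  # [1, 1, 1]
print(copied_ids)  # [2, 3, 4]
```

Note that `q` itself never gains an `_id` in the second case, because only the copies are modified.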

As a quick demonstration:

    from bson import ObjectId

    p = { 'a': 1 }

    def addId(obj):
        obj['_id'] = ObjectId()
        return obj

    # list() is needed on Python 3, where map() returns a lazy iterator
    docs = list(map(addId, [p for i in range(2)]))
    print(docs)

Gives you:

    [
      {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')},
      {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')}
    ]

Or, more succinctly:

    p = { 'a': 1 }

    def addKey(x):
        x[0]['b'] = x[1]
        return x[0]

    docs = list(map(addKey, [[p, i] for i, p in enumerate([p for i in range(3)])]))
    print(docs)

Gives:

    [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'a': 1, 'b': 2}]

This clearly shows each index value overwriting the previous one, because the same object is passed in every time.

But using `copy()` to take a copy of the value:

    from bson import ObjectId
    from copy import copy

    p = { 'a': 1 }

    def addId(obj):
        obj['_id'] = ObjectId()
        return obj

    docs = list(map(addId, [copy(p) for i in range(2)]))
    print(docs)

Gives you:

    [
      {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc00')},
      {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc01')}
    ]

Or our base demonstration:

    from copy import copy

    p = { 'a': 1 }

    def addKey(x):
        x[0]['b'] = x[1]
        return x[0]

    docs = list(map(addKey, [[p, i] for i, p in enumerate([copy(p) for i in range(3)])]))
    print(docs)

Returns:

    [{'a': 1, 'b': 0}, {'a': 1, 'b': 1}, {'a': 1, 'b': 2}]

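So this is basically how Python works. Unless you deliberately create a new value, you keep getting back a reference to the same object, and each pass of the loop updates that one object rather than producing a "new" one.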
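
One caveat worth checking, as a rough sketch: `copy()` is a shallow copy, so nested values are still shared between the copies. The `tags` key below is an assumption added purely for illustration and is not part of the original documents. If the documents contain nested lists or dicts that get modified later, `deepcopy()` may be the safer choice.

```python
from copy import copy, deepcopy

# 'tags' is a hypothetical nested field, added only to show the difference.
p = {'a': 1, 'tags': ['x']}

same = [p for i in range(3)]               # three references to one dict
shallow = [copy(p) for i in range(3)]      # three dicts sharing one 'tags' list
deep = [deepcopy(p) for i in range(3)]     # three fully independent dicts

all_same_object = all(d is p for d in same)
shallow_distinct = len({id(d) for d in shallow}) == 3
shallow_shares_nested = shallow[0]['tags'] is shallow[1]['tags']
deep_shares_nested = deep[0]['tags'] is deep[1]['tags']

print(all_same_object)        # True
print(shallow_distinct)       # True
print(shallow_shares_nested)  # True
print(deep_shares_nested)     # False
```

For flat documents like the ones in the question, the shallow `copy()` is sufficient, since only the top-level `_id` key is added.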