Bulk insert of an array of maps using pymongo fails due to BulkWriteError

Date: 2017-11-03 00:26:33

Tags: python mongodb python-3.x pymongo

I am trying to bulk insert documents into MongoDB using the Python library pymongo.

import pymongo
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([p for i in range(10)])
tryManyInsert()

But it keeps failing with a BulkWriteError.

Traceback (most recent call last):
  File "/prog_path/testMongoCon.py", line 9, in <module>
    tryManyInsert();
  File "/prog_path/testMongoCon.py", line 7, in tryManyInsert
    mongoColl.insert_many([p for i in range(10)])
  File "/myenv_path/lib/python3.6/site-packages/pymongo/collection.py", line 724, in insert_many
    blk.execute(self.write_concern.document)
  File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 493, in execute
    return self.execute_command(sock_info, generator, write_concern)
  File "/myenv_path/lib/python3.6/site-packages/pymongo/bulk.py", line 331, in execute_command
    raise BulkWriteError(full_result)
pymongo.errors.BulkWriteError: batch op errors occurred

I am trying to insert 10 documents sequentially without an _id, so the conditions in this answer / discussion do not apply here. A similar question has no answers.

I tried pymongo 3.4 and pymongo 3.5.1, and both give the same error. I am on python3.6 and mongodb 3.2.10. What am I doing wrong here?

1 Answer:

Answer 0: (score: 1)

Python is still referring to p as the same object for every array member. You want a copy() of p for each array member:

import pymongo
from copy import copy
def tryManyInsert():
    p = {'x' : 1, 'y' : True, 'z': None}
    mongoColl = pymongo.MongoClient('localhost', 27017)['test']['multiIn']
    mongoColl.insert_many([copy(p) for i in range(10)])
tryManyInsert()

Or even simply:

    mongoColl.insert_many([{ 'x': 1, 'y': True, 'z': None } for i in range(10)])
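Both variants above build separate dictionaries, whereas the comprehension in the question repeats a single one. A minimal check that runs without MongoDB is sketched below; the names `shared`, `copied`, `literal`, and `distinct_objects` are illustrative only and do not come from the original post.

```python
from copy import copy

p = {'x': 1, 'y': True, 'z': None}

# The pattern from the question: every element is the very same dict object.
shared = [p for _ in range(3)]

# First fix: a shallow copy per element.
copied = [copy(p) for _ in range(3)]

# Second fix: a fresh dict literal is evaluated on each iteration.
literal = [{'x': 1, 'y': True, 'z': None} for _ in range(3)]


def distinct_objects(docs):
    """Count how many separate dict objects the list actually holds."""
    return len({id(d) for d in docs})


print(distinct_objects(shared))   # 1
print(distinct_objects(copied))   # 3
print(distinct_objects(literal))  # 3
```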

Unless you do that, the _id is only assigned once, and you are simply repeating "the same document" with the same _id in the argument to insert_many(). Hence the duplicate key error.

As a quick demonstration:

from bson import ObjectId

p = { 'a': 1 }

def addId(obj):
  obj['_id'] = ObjectId()
  return obj

# list() forces evaluation, since map() is lazy on Python 3
docs = list(map(addId,[p for i in range(2)]))
print(docs)

Gives you:

[
  {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')},
  {'a': 1, '_id': ObjectId('59fbc4a16cb6b30bdb3de0fd')}
]

Or more succinctly:

p = { 'a': 1 }

def addKey(x):
  x[0]['b'] = x[1]
  return x[0]

docs = list(map(addKey,[[p,i] for i,p in enumerate([p for i in range(3)])]))
print(docs)

Gives:

[{'a': 1, 'b': 2}, {'a': 1, 'b': 2}, {'a': 1, 'b': 2}]

This clearly shows each index value overwriting the previous one, because the same object was passed in every time.

But using copy() to take a copy of the value:

from bson import ObjectId
from copy import copy

p = { 'a': 1 }

def addId(obj):
  obj['_id'] = ObjectId()
  return obj

docs = list(map(addId,[copy(p) for i in range(2)]))
print(docs)

Gives you:

[
  {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc00')},
  {'a': 1, '_id': ObjectId('59fbc5466cb6b30be4d0fc01')}
]

Or our base demonstration:

from copy import copy

p = { 'a': 1 }

def addKey(x):
  x[0]['b'] = x[1]
  return x[0]

docs = list(map(addKey,[[p,i] for i,p in enumerate([copy(p) for i in range(3)])]))
print(docs)

Returns:

[{'a': 1, 'b': 0}, {'a': 1, 'b': 1}, {'a': 1, 'b': 2}]

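So this is basically how Python works. Unless you deliberately assign a new value, all you are doing is returning the same referenced value and updating it on each pass of the loop, rather than producing a "new" one.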
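The traceback in the question only says "batch op errors occurred"; the underlying reason is carried in the exception's details attribute. The sketch below shows one way to pull the per-document errors out of such a dictionary. The helper names and the sample dictionary are hypothetical: the sample is hand-written to resemble what an ordered insert of repeated documents would be expected to report, and was not captured from a real run.

```python
DUPLICATE_KEY = 11000  # MongoDB's error code for a duplicate key


def summarize_write_errors(details):
    """Return (index, code, message) for each failed write in a details dict."""
    return [
        (err.get('index'), err.get('code'), err.get('errmsg'))
        for err in details.get('writeErrors', [])
    ]


def is_duplicate_key_failure(details):
    """True if at least one failed write was rejected as a duplicate key."""
    return any(
        err.get('code') == DUPLICATE_KEY
        for err in details.get('writeErrors', [])
    )


# Hand-written stand-in for BulkWriteError.details (assumed shape, not real output).
sample_details = {
    'nInserted': 1,
    'writeErrors': [
        {'index': 1, 'code': 11000,
         'errmsg': 'E11000 duplicate key error (abbreviated)'},
    ],
}

print(summarize_write_errors(sample_details))
print(is_duplicate_key_failure(sample_details))

# With pymongo, the dict would come from the caught exception, for example:
#     try:
#         mongoColl.insert_many(docs)
#     except pymongo.errors.BulkWriteError as exc:
#         print(summarize_write_errors(exc.details))
```

A result containing code 11000 would support the diagnosis above: the first document is inserted, and the next one is rejected because it carries the same _id.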