I have a script that copies data from one collection to another. Sometimes the script stops with the following error:
Traceback (most recent call last):
  File "request_archive.py", line 80, in <module>
    ret = archive_requests(dbType)
  File "request_archive.py", line 41, in archive_requests
    for doc in reportCursor :
  File "/usr/local/lib64/python2.7/site-packages/pymongo/cursor.py", line 1176, in next
    if len(self.__data) or self._refresh():
  File "/usr/local/lib64/python2.7/site-packages/pymongo/cursor.py", line 1087, in _refresh
    self.__send_message(q)
  File "/usr/local/lib64/python2.7/site-packages/pymongo/cursor.py", line 970, in __send_message
    codec_options=self.__codec_options)
  File "/usr/local/lib64/python2.7/site-packages/pymongo/cursor.py", line 1057, in _unpack_response
    return response.unpack_response(cursor_id, codec_options)
  File "/usr/local/lib64/python2.7/site-packages/pymongo/message.py", line 945, in unpack_response
    return bson.decode_all(self.documents, codec_options)
bson.errors.InvalidBSON: 'utf8' codec can't decode byte 0xc0 in position 2: invalid start byte
When I look for the document that causes the problem, it usually looks like this in the mongo shell:
{
"_id" : ObjectId("38636f733444373635323637"),
"mobiles" : "..��..��..��..��..��..��..��..��..��..��etc/passwd",
"requestDate" : ISODate("2018-03-19T09:32:45.000Z"),
"isCopied" : NumberLong(0)
}
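The decode failure itself can be reproduced without MongoDB, which may help narrow things down. The sketch below uses only the standard library. The byte string `raw` is a made-up stand-in for the `mobiles` value (the real stored bytes are not known to me); it was chosen because 0xc0 is never a valid UTF-8 start byte and it sits at position 2, as in the error message. The helper names `decode_strict` and `decode_lenient` are hypothetical.

```python
# Hypothetical stand-in for the stored value; the real bytes are unknown.
# 0xc0 can never begin a valid UTF-8 sequence, and here it is at index 2.
raw = b"..\xc0\xaf..\xc0\xafetc/passwd"


def decode_strict(data):
    """Decode as UTF-8 and report whether decoding succeeded."""
    try:
        return data.decode("utf-8"), True
    except UnicodeDecodeError:
        return None, False


def decode_lenient(data):
    """Decode as UTF-8, substituting U+FFFD for each undecodable byte."""
    return data.decode("utf-8", "replace")


text, ok = decode_strict(raw)   # strict decoding fails on the 0xc0 byte
lenient = decode_lenient(raw)   # lenient decoding keeps the readable parts
```

As far as I can tell, `bson.codec_options.CodecOptions` takes a `unicode_decode_error_handler` argument that accepts the same handler names (`'strict'`, `'replace'`, `'ignore'`), and `Collection.with_options(codec_options=...)` returns a collection that uses it. I have not confirmed that this resolves the error on pymongo 3.6.0.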
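How can I handle this?
I also can't simply wrap it in a try/except, because the error is raised on the `for` line itself, while iterating the cursor. I found some answers on Stack Overflow, but none of them worked.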
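To make the try/except problem concrete: the only way I can see to catch an error raised by the iteration itself is to drive the iterator by hand with `next()`. The sketch below shows that pattern against `FlakyCursor`, a hypothetical stand-in class, not a pymongo cursor; `drain` and `max_failures` are likewise names invented for illustration. I do not know whether a real pymongo cursor stays usable after `InvalidBSON`. The traceback shows a whole batch being decoded in one `bson.decode_all` call, so one bad document may cost the entire batch, which is why the loop gives up after a fixed number of failures.

```python
class FlakyCursor(object):
    """Hypothetical stand-in for a cursor: raises ValueError on None items."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        self._pos += 1  # advance first, so a failing item is not retried
        if item is None:
            raise ValueError("undecodable document")
        return item

    next = __next__  # Python 2 spelling of the iterator protocol


def drain(cursor, error_type, max_failures=10):
    """Collect items, skipping steps that raise error_type.

    Re-raises once more than max_failures steps have failed, so a cursor
    that keeps failing cannot cause an endless loop.
    """
    good, failures = [], 0
    it = iter(cursor)
    while True:
        try:
            doc = next(it)
        except StopIteration:
            break
        except error_type:
            failures += 1
            if failures > max_failures:
                raise
            continue
        good.append(doc)
    return good, failures


good, failures = drain(FlakyCursor([{"_id": 1}, None, {"_id": 2}]), ValueError)
```

With a real cursor, `error_type` would presumably be `bson.errors.InvalidBSON`, the exception named in the traceback.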
I am using Python 2.7 and pymongo v3.6.0.
Edit 1:
This is how I copy the data:
findData = {'isCopied': 0, 'requestDate': {'$lte': today}}
collection1Cursor = collection1.find(findData)
updateArr = []  # _ids of the documents that were copied
dataArr = []    # documents to insert into collection2
for doc in collection1Cursor:  # getting error in this line
    updateArr.append(doc['_id'])
    doc.pop('isCopied', None)
    dataArr.append(doc)
collection2.insert(dataArr, continue_on_error=True)
collection1.update_many({'_id': {'$in': updateArr}}, {'$set': {'isCopied': 1}})