I am trying to read 1M documents from MongoDB into a CSV file using pymongo. My code is as follows:
import csv
import json  # needed for json.dumps() in the CSV loop below
from pymongo import MongoClient
from datetime import datetime
from bson import json_util
from tempfile import NamedTemporaryFile
client = MongoClient('mongodb://login:pass@server:port')
db = client.some_mongo_database
collection = db.some_mongo_collection
fromDate = datetime.strptime("2018-05-15 21:00", '%Y-%m-%d %H:%M')
tillDate = datetime.strptime("2018-05-16 21:00", '%Y-%m-%d %H:%M')
query = {
    "$or": [
        {"LastUpdated": {"$gte": fromDate, "$lt": tillDate}},
        {"$and": [
            {"Created": {"$gte": fromDate, "$lt": tillDate}},
            {"LastUpdated": None}
        ]}
    ]
}
cursor = collection.find(query, no_cursor_timeout=True)
After that, if I do this:
for row in cursor:
    print(row)
cursor.close()
everything works and I can fetch all the documents. But if I do this instead:
with NamedTemporaryFile("w", delete=False) as temp:
    csv_writer = csv.writer(temp, delimiter='\t', quotechar='\b', quoting=csv.QUOTE_MINIMAL)
    for row in cursor:
        csv_row = [ [[row['_id']], str(json.dumps(row,default=json_util.default))] ]
        csv_writer.writerows(csv_row)
cursor.close()
then after about 2 minutes and roughly 200k documents, I get:
Traceback (most recent call last):
  File "mongo_data_loader.py", line 25, in <module>
    for row in cursor:
  File "/Library/Python/2.7/site-packages/pymongo/cursor.py", line 1169, in next
    if len(self.__data) or self._refresh():
  File "/Library/Python/2.7/site-packages/pymongo/cursor.py", line 1106, in _refresh
    self.__send_message(g)
  File "/Library/Python/2.7/site-packages/pymongo/cursor.py", line 975, in __send_message
    helpers._check_command_response(first)
  File "/Library/Python/2.7/site-packages/pymongo/helpers.py", line 142, in _check_command_response
    raise CursorNotFound(errmsg, code, response)
pymongo.errors.CursorNotFound: cursor id 184972541202 not found
What am I doing wrong?
Python 2.7.10
pymongo 3.6.1
mongo db.version() 3.6.5
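A side note on the CSV loop above, separate from the cursor error: csv.writer stringifies non-string fields with str(), so wrapping row['_id'] in a list writes the list's text form into the first column. The sketch below uses only the standard library, with a made-up document and a hypothetical helper render_rows, to compare the nested form with a flat writerow call. It is an illustration under those assumptions, not a fix for CursorNotFound.

```python
import csv
import io
import json


def render_rows(write):
    """Run `write` against a csv.writer configured like the one in the question
    and return the text it produced. (Hypothetical helper for illustration.)"""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter='\t', quotechar='\b',
                        quoting=csv.QUOTE_MINIMAL)
    write(writer)
    return buf.getvalue()


# Made-up document standing in for a MongoDB row; a real row would also need
# default=json_util.default to serialize ObjectId and datetime values.
doc = {"_id": 1, "name": "example"}
payload = json.dumps(doc)

# Form used in the question: the first field is a one-element list.
nested_line = render_rows(lambda w: w.writerows([[[doc["_id"]], payload]]))

# Flat alternative: the first field is the _id value itself.
flat_line = render_rows(lambda w: w.writerow([doc["_id"], payload]))

print(repr(nested_line))
print(repr(flat_line))
```

If the bracketed first column is intentional, the original form can stay as it is; the flat form is only easier to parse back.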
Answer 0 (score: 1)
As a temporary workaround, I did:
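from pymongo.errors import CursorNotFound

# Count the rows already written so a new cursor can skip past them.
# Note: skip() without sort() relies on the server returning documents
# in the same order on every retry, which is not guaranteed.
processed = 0
while True:
    cursor = collection.find(query, no_cursor_timeout=True).skip(processed)
    try:
        for row in cursor:
            csv_row = [ [[row['_id']], str(json.dumps(row,default=json_util.default))] ]
            csv_writer.writerows(csv_row)
            processed += 1
        cursor.close()
        break
    except CursorNotFound:
        print("Lost cursor. Retry with skip")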
But the question of why the behavior described above happens is still open.
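A variation on the workaround above, offered as a sketch rather than a confirmed fix: instead of skip(processed), the cursor can be reopened after the last _id that was seen, with the results sorted by _id so the resume point is well defined. The function name iterate_with_resume and the parameters retry_on and max_retries are hypothetical. The exception class is passed in so the block does not import pymongo itself, and it assumes the _id values are comparable and unique. It does not explain why the cursor is lost in the first place.

```python
def iterate_with_resume(collection, query, retry_on, max_retries=5):
    """Yield documents in ascending _id order, reopening the cursor after the
    last yielded _id whenever `retry_on` is raised during iteration.

    `collection` is expected to behave like a pymongo Collection:
    find(filter, no_cursor_timeout=True) returns a cursor that supports
    sort(key, direction), iteration, and close().
    """
    last_id = None
    retries = 0
    while True:
        # On a retry, add an _id lower bound to the original filter.
        if last_id is None:
            resume_query = query
        else:
            resume_query = {"$and": [query, {"_id": {"$gt": last_id}}]}
        cursor = collection.find(resume_query, no_cursor_timeout=True).sort("_id", 1)
        try:
            for doc in cursor:
                last_id = doc["_id"]
                retries = 0  # progress was made, so reset the retry budget
                yield doc
            return  # cursor exhausted normally
        except retry_on:
            retries += 1
            if retries > max_retries:
                raise
        finally:
            cursor.close()
```

With pymongo this would be called as iterate_with_resume(collection, query, CursorNotFound), with CursorNotFound imported from pymongo.errors, and the CSV writing would happen inside the for loop that consumes the generator. Sorting by _id can be slower than an unsorted scan on a large result set, so whether the trade-off is acceptable depends on the collection.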