CursorNotFound when using PyMongo's parallel_scan

Date: 2014-07-28 14:23:25

Tags: mongodb cursor timeout pymongo

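I have a Mongo database with a collection of 3,000,000 documents that I process with PyMongo. I want to iterate over all documents once, without updating the collection. I tried to do this with four threads: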

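import threading

# parallel_scan returns up to CURSORS_NUM cursors that together cover the collection
cursors = db[collection].parallel_scan(CURSORS_NUM)
threads = [
    threading.Thread(target=process_cursor, args=(cursor,))
    for cursor in cursors
]

# Start one thread per cursor, then wait for all of them to finish
for thread in threads:
    thread.start()

for thread in threads:
    thread.join()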

The process_cursor function:

def process_cursor(cursor):
    for document in cursor:
        dosomething(document)

After processing documents for a while, I get this error:

  File "extendDocuments.py", line 133, in process_cursor
    for document in cursor:
  File "/usr/local/lib/python2.7/dist-packages/pymongo/command_cursor.py", line 165, in next
    if len(self.__data) or self._refresh():
  File "/usr/local/lib/python2.7/dist-packages/pymongo/command_cursor.py", line 142, in _refresh
    self.__batch_size, self.__id))
  File "/usr/local/lib/python2.7/dist-packages/pymongo/command_cursor.py", line 110, in __send_message
    *self.__decode_opts)
  File "/usr/local/lib/python2.7/dist-packages/pymongo/helpers.py", line 97, in _unpack_response
    cursor_id)
CursorNotFound: cursor id '116893918402' not valid at server

If I use find(), I can set timeout to False to avoid this. Can I do something similar with the cursors returned by parallel_scan?
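
For reference, this is roughly what I mean by disabling the timeout with find(). It is only a minimal sketch: the function name and the handle and legacy_api parameters are hypothetical, and it assumes the option is spelled timeout=False in PyMongo 2.x and no_cursor_timeout=True in PyMongo 3.0 and later. By default the server closes a cursor that has been idle for 10 minutes.

```python
def iterate_without_idle_timeout(collection, handle, legacy_api=False):
    """Iterate over every document with a find() cursor that has no idle timeout.

    `collection` is a PyMongo Collection; `handle` is a hypothetical
    per-document callback standing in for dosomething().
    """
    # Assumption: PyMongo 2.x uses timeout=False, PyMongo 3.0+ uses
    # no_cursor_timeout=True for the same server-side behaviour.
    options = {"timeout": False} if legacy_api else {"no_cursor_timeout": True}
    cursor = collection.find(**options)
    try:
        for document in cursor:
            handle(document)
    finally:
        # The server will not reap this cursor for being idle,
        # so close it explicitly even if handle() raises.
        cursor.close()
```

With the PyMongo version from the traceback above, the call would be iterate_without_idle_timeout(db[collection], dosomething, legacy_api=True).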

1 answer:

Answer 0 (score: 1)

There is currently no way to turn off the idle timeout for cursors returned by parallelCollectionScan. I have opened a feature request:

https://jira.mongodb.org/browse/SERVER-15042
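
Until something like that exists, one possible workaround is to avoid the parallel-scan cursors altogether: read from a single find() cursor with its timeout disabled and hand the documents to worker threads through a queue. This is a different technique from parallel_scan, and the sketch below is not part of the feature request. The names process_in_threads, handle, num_workers and queue_size are hypothetical, and it assumes PyMongo 3.0 or later (no_cursor_timeout=True; PyMongo 2.x would use timeout=False instead).

```python
import threading

try:
    import queue  # Python 3
except ImportError:
    import Queue as queue  # Python 2

_DONE = object()  # sentinel telling a worker to stop


def process_in_threads(collection, handle, num_workers=4, queue_size=1000):
    """Feed documents from one no-timeout find() cursor to worker threads."""
    work = queue.Queue(maxsize=queue_size)  # bounded, so memory use stays flat
    errors = []

    def worker():
        while True:
            document = work.get()
            if document is _DONE:
                return
            try:
                handle(document)
            except Exception as exc:
                # Keep consuming so the reader never blocks on a full queue.
                errors.append(exc)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()

    # Assumption: PyMongo 3.0+ option name for disabling the idle timeout.
    cursor = collection.find(no_cursor_timeout=True)
    try:
        for document in cursor:
            work.put(document)
    finally:
        cursor.close()
        # One sentinel per worker, then wait for the queue to drain.
        for _ in workers:
            work.put(_DONE)
        for thread in workers:
            thread.join()

    if errors:
        raise errors[0]
```

The reads happen in a single thread here, so this only helps when the per-document work, not the scan itself, is the slow part.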