The line for doc in collection.find({'is_timeline_valid': True}): raises a message length error. How can I fetch the whole collection without the error? I know about find().limit(), but I don't know how to use it.
Code:
from openpyxl import load_workbook
import pymongo
import os

wb = load_workbook('concilia.xlsx')
ws = wb.active

client = pymongo.MongoClient('...')
db = client['...']
collection = db['...']

r = 2
for doc in collection.find({'is_timeline_valid': True}):
    for dic in doc['timeline']['datas']:
        if 'concilia' in dic['tramite'].lower():
            ws.cell(row=r, column=1).value = doc['id_process_unformatted']
            ws.cell(row=r, column=2).value = dic['data']
            ws.cell(row=r, column=3).value = dic['tramite']
            wb.save('concilia.xlsx')  # saves the workbook on every match
            print('*****************************')
            print(dic['tramite'])
            # print('check!')
            r += 1
Answer 0 (score: 1)
Here is a simple paginator that splits the query execution into paginated queries.
from itertools import count

class PaginatedCursor(object):
    def __init__(self, cur, limit=100):
        self.cur = cur
        self.limit = limit
        self.count = cur.count()

    def __iter__(self):
        # Page boundaries: 0, limit, 2*limit, ... up to the result count.
        skipper = count(start=0, step=self.limit)
        for skip in skipper:
            if skip >= self.count:
                break
            for document in self.cur.skip(skip).limit(self.limit):
                yield document
            # Rewind so skip()/limit() can be applied to the next page.
            self.cur.rewind()

...
cur = collection.find({'is_timeline_valid': True})
...
for doc in PaginatedCursor(cur, limit=100):
    ...
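Note that Cursor.count() was deprecated in PyMongo 3.x and removed in PyMongo 4, so the class above only runs on older drivers. A minimal sketch of the same idea for newer drivers, computing the total with count_documents() and issuing a fresh bounded query per page (the class name and structure here are illustrative, not part of the original answer):

class PaginatedFind(object):
    # Skip/limit paginator for PyMongo 4+, where Cursor.count() is gone.
    def __init__(self, collection, query, limit=100):
        self.collection = collection
        self.query = query
        self.limit = limit
        self.count = collection.count_documents(query)

    def __iter__(self):
        for skip in range(0, self.count, self.limit):
            # A fresh, bounded query per page replaces skip()/rewind()
            # on a single shared cursor.
            page = self.collection.find(self.query).skip(skip).limit(self.limit)
            for document in page:
                yield document

for doc in PaginatedFind(collection, {'is_timeline_valid': True}, limit=100):
    ...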
Answer 1 (score: 1)
I ran into this problem today, and it turned out to be related to specific documents in the collection exceeding the max_bson_size limit. When adding documents to the collection, make sure each document stays under max_bson_size.
import json

document_size_limit = client.max_bson_size
assert len(json.dumps(data)) < document_size_limit
I'm still investigating why the collection accepted documents larger than max_bson_size in the first place.
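Note that len(json.dumps(data)) only approximates the real document size, since BSON adds type tags and length prefixes that JSON does not have. For an exact check you can encode the document with the bson package that ships with PyMongo (bson.encode() is available from PyMongo 3.9 onward; the helper name below is just for illustration):

import bson

def fits_bson_limit(client, data):
    # bson.encode() returns the exact bytes that would go over the wire,
    # so its length can be compared directly against the server limit.
    return len(bson.encode(data)) <= client.max_bson_size

assert fits_bson_limit(client, data)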
Answer 2 (score: 0)
We can pass batch_size to find() to reduce the size of each message.
for doc in collection.find({'is_timeline_valid': True}):
becomes
for doc in collection.find({'is_timeline_valid': True}, batch_size=1):
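batch_size caps how many documents the server packs into each reply, so every message stays under the size limit; a batch size of 1 is the most conservative choice, at the cost of one round trip per document. The same setting can also be applied on the cursor itself (100 here is an arbitrary, less chatty value):

cursor = collection.find({'is_timeline_valid': True}).batch_size(100)
for doc in cursor:
    # Each reply from the server now carries at most 100 documents.
    print(doc['id_process_unformatted'])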