I want to extend a large array using the update(.. $push ..) operation.
Here are the details:
I have a large collection 'A' with many fields. Among those fields, I want to extract the values of the field 'F' and turn them into one big array stored in a single field of a document in collection 'B'.
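Schematically (field names as in the code below; the sample shapes are just for illustration):

    # A document in collection A:
    {'_id': ..., 'F': <value>, ...many other fields...}
    # The single target document in collection B:
    {'_id': 1, 'v': [<value>, <value>, ...]}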
I split the process into several steps (to limit the memory used).
Here is the Python program:
...
steps = 1000   # number of steps
step = 10000   # each step handles this number of documents
start = 0
for j in range(steps):
    print('step:', j, 'start:', start)

    # Fetch one batch of documents, keeping only the 'F' field.
    project = {'$project': {'_id': 0, 'F': 1}}
    skip = {'$skip': start}
    limit = {'$limit': step}
    cursor = A.aggregate([skip, limit, project], allowDiskUse=True)

    # Collect the batch's values in memory.
    a = [o['F'] for o in cursor]
    print('len:', len(a))

    # Append the batch to the single array in the target document of B.
    B.update({'_id': 1}, {'$push': {'v': {'$each': a}}})
    start += step
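(As an aside: Collection.update is deprecated in PyMongo 3; the same write, which I would expect to behave identically, can be expressed with update_one:)

    # Non-deprecated equivalent of the B.update(...) call above;
    # it performs the same $push, so it runs into the same limit.
    B.update_one({'_id': 1}, {'$push': {'v': {'$each': a}}})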
And here is the output of the program:
step: 0 start: 0
step: 1 start: 100000
step: 2 start: 200000
step: 3 start: 300000
step: 4 start: 400000
step: 5 start: 500000
step: 6 start: 600000
step: 7 start: 700000
step: 8 start: 800000
step: 9 start: 900000
step: 10 start: 1000000
Traceback (most recent call last):
  File "u_psfFlux.py", line 109, in <module>
    lsst[k].update( {'_id': 1}, { '$push': {'v' : { '$each': a } } } )
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pymongo/collection.py", line 2503, in update
    collation=collation)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pymongo/collection.py", line 754, in _update
    _check_write_command_response([(0, result)])
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pymongo/helpers.py", line 315, in _check_write_command_response
    raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: Resulting document after update is larger than 16777216
Apparently the $push operation has to fetch the complete array! (My expectation was that this operation would always need the same amount of memory, since we always append the same number of values to the array.)
In short, I don't understand why the update/$push operation fails with this error...
Or... is there a way to avoid this unnecessary buffering?
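(In case a single document simply cannot hold the whole array, one alternative I have been considering is to write each batch to its own document, so that no single document grows toward the 16 MB BSON document limit. A rough sketch, reusing the variables from the loop above:)

    # Hypothetical bucketing variant: one document per batch instead of
    # one giant array; every document stays far below the 16 MB limit.
    B.update_one({'_id': j}, {'$set': {'v': a}}, upsert=True)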
Thanks for your advice,
Christian