I want to extend a large array using the update(.. $push ..) operation.
Here are the details:
I have a large collection 'A' with many fields. Among those fields, I want to extract the values of the field 'F' and turn them into one big array stored in a single field of a document in collection 'B'.
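Schematically (field names as in the code below; the sample shapes are just for illustration):

    # A document in collection A:
    {'_id': ..., 'F': <value>, ...many other fields...}
    # The single target document in collection B:
    {'_id': 1, 'v': [<value>, <value>, ...]}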
I split the process into several steps (to limit the memory used).
Here is the Python program:
...
steps = 1000   # number of steps
step = 10000   # each step handles this number of documents
start = 0
for j in range(steps):
    print('step:', j, 'start:', start)

    # Fetch one batch of documents, keeping only the 'F' field.
    project = {'$project': {'_id': 0, 'F': 1}}
    skip = {'$skip': start}
    limit = {'$limit': step}
    cursor = A.aggregate([skip, limit, project], allowDiskUse=True)

    # Collect the batch's values in memory.
    a = [o['F'] for o in cursor]
    print('len:', len(a))

    # Append the batch to the single array in the target document of B.
    B.update({'_id': 1}, {'$push': {'v': {'$each': a}}})
    start += step
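(As an aside: Collection.update is deprecated in PyMongo 3; the same write, which I would expect to behave identically, can be expressed with update_one:)

    # Non-deprecated equivalent of the B.update(...) call above;
    # it performs the same $push, so it runs into the same limit.
    B.update_one({'_id': 1}, {'$push': {'v': {'$each': a}}})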
And here is the output of the program:
step: 0 start: 0
step: 1 start: 100000
step: 2 start: 200000
step: 3 start: 300000
step: 4 start: 400000
step: 5 start: 500000
step: 6 start: 600000
step: 7 start: 700000
step: 8 start: 800000
step: 9 start: 900000
step: 10 start: 1000000
Traceback (most recent call last):
  File "u_psfFlux.py", line 109, in <module>
    lsst[k].update( {'_id': 1}, { '$push': {'v' : { '$each': a } } } )
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pymongo/collection.py", line 2503, in update
    collation=collation)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pymongo/collection.py", line 754, in _update
    _check_write_command_response([(0, result)])
  File "/home/ubuntu/.local/lib/python3.5/site-packages/pymongo/helpers.py", line 315, in _check_write_command_response
    raise WriteError(error.get("errmsg"), error.get("code"), error)
pymongo.errors.WriteError: Resulting document after update is larger than 16777216
Apparently the $push operation has to fetch the complete array! (My expectation was that this operation would always need the same amount of memory, since we always append the same number of values to the array.)
In short, I don't understand why the update/$push operation fails with this error...
Or... is there a way to avoid this unnecessary buffering?
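(In case a single document simply cannot hold the whole array, one alternative I have been considering is to write each batch to its own document, so that no single document grows toward the 16 MB BSON document limit. A rough sketch, reusing the variables from the loop above:)

    # Hypothetical bucketing variant: one document per batch instead of
    # one giant array; every document stays far below the 16 MB limit.
    B.update_one({'_id': j}, {'$set': {'v': a}}, upsert=True)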
Thanks for your advice,
Christian