Question

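I have billions of records in the patients collection, and I don't know how to filter them with a pipeline.

Or is this a MongoDB limitation, that we cannot run aggregation pipelines on large collections?

I have already added the allowDiskUse=True option, but that does not help either.

How can I get the filtered results through a pipeline?

How can I store the filtered results in another collection? Thanks.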

Sample code (I use `pymongo`, so the following is Python syntax)

import datetime

pipeline = [
    {"$project": {"birthday": 1, "id": 1}},
    {"$match": {"birthday": {"$gte": datetime.datetime(1987, 1, 1, 0, 0)}}},
    {"$group": ~~~},
]
res = db.patients.aggregate(pipeline, allowDiskUse=True)

Exception message

OperationFailure: command SON([('aggregate', u'patients'), ('pipeline', [{'$match': {'birthday': {'$gte': datetime.datetime(1987, 1, 1, 0, 0)}}}]), ('allowDiskUse', True)]) on namespace tw_insurance_security_development.$cmd failed: exception: aggregation result exceeds maximum document size (16MB)

Answer 1

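Store the result in a collection, then run another aggregation pass over it. This is a MongoDB limitation (the 16 MB maximum document size named in the error message); you can read more about it here.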
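
One way to follow this advice is the `$out` pipeline stage, which writes the pipeline result to a collection on the server instead of returning it to the client. The following is a minimal sketch, assuming a MongoDB server that supports `$out` (2.6 or later). The helper names `build_filter_pipeline` and `store_filtered` and the target collection name `patients_since_1987` are hypothetical and do not come from the original post.

```python
import datetime


def build_filter_pipeline(since, target_collection):
    """Return a pipeline that filters by birthday and writes the result
    to target_collection (hypothetical helper, for illustration only)."""
    return [
        # Filter first so that an index on "birthday" can be used, if one exists.
        {"$match": {"birthday": {"$gte": since}}},
        # Keep only the fields that are needed downstream.
        {"$project": {"birthday": 1, "id": 1}},
        # $out must be the last stage; it replaces target_collection with the result.
        {"$out": target_collection},
    ]


def store_filtered(collection, since, target_collection):
    """Run the pipeline on a pymongo collection; the output stays on the server."""
    pipeline = build_filter_pipeline(since, target_collection)
    collection.aggregate(pipeline, allowDiskUse=True)


# Usage, assuming `db` is a pymongo Database handle:
# store_filtered(db.patients, datetime.datetime(1987, 1, 1), "patients_since_1987")
```

Because the documents are written server-side, nothing large travels back to the client, and the new collection can then be queried or aggregated again.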

Answer 2

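If this is for an application with a human-facing user interface, I would suggest pagination, using MongoDB's skip and limit, and restricting the fields to those a person would actually look at (which you already seem to be doing with $project).

"I don't know how to filter it with a pipeline." Try the following: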

import datetime

i_Limit = 100  # Or whatever value plays nice
cnt_Skip = 0
has_Next = True

while has_Next:
    pipe = [
        {"$project": {"birthday": 1, "id": 1}},
        {"$match": {"birthday": {"$gte": datetime.datetime(1987, 1, 1, 0, 0)}}},
        {"$skip": cnt_Skip},
        {"$limit": i_Limit},
        {"$group": ~~~},
    ]
    cursor = db.patients.aggregate(pipe, allowDiskUse=True)

    # A cursor object is always truthy, so detect the last page by
    # checking whether this page yielded any record at all.
    has_Next = False
    for record in cursor:
        has_Next = True
        # Do whatever is needed with the record
        print(record)

    cnt_Skip = cnt_Skip + i_Limit
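
With billions of records, `$skip` has to walk past every skipped document on each iteration, so later pages get slower. A swapped-in alternative, not part of the original answer, is to page by `_id` range instead of by offset. The sketch below assumes the default `_id` field is present and sortable, and that `collection` is a pymongo collection; the function name `iter_filtered` and the `page_size` default are hypothetical. Whether it is actually faster depends on the indexes available.

```python
import datetime


def iter_filtered(collection, since, page_size=1000):
    """Yield matching documents page by page, ordered by _id
    (hypothetical helper, for illustration only)."""
    last_id = None
    while True:
        match = {"birthday": {"$gte": since}}
        if last_id is not None:
            # Resume strictly after the last document of the previous page.
            match["_id"] = {"$gt": last_id}
        pipeline = [
            {"$match": match},
            {"$sort": {"_id": 1}},
            {"$limit": page_size},
            # _id is kept by default, so it is available for the next page.
            {"$project": {"birthday": 1, "id": 1}},
        ]
        page = list(collection.aggregate(pipeline, allowDiskUse=True))
        if not page:
            break
        for doc in page:
            yield doc
        last_id = page[-1]["_id"]


# Usage, assuming `db` is a pymongo Database handle:
# for record in iter_filtered(db.patients, datetime.datetime(1987, 1, 1)):
#     print(record)
```

Each request returns at most `page_size` documents, so no single response comes near the 16 MB limit from the error message.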

If you are doing a large dump, use mongoexport.

PyMongo cannot run an aggregation pipeline on a very large collection
