How to run a pymongo aggregation query on a large (200,000+ records) collection?

Time: 2018-01-13 05:51:38

Tags: python mongodb pymongo data-analysis

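I need to run an aggregation query on a large collection with more than 200,000 records, and I want to run it with pymongo. I tried the preferred method from the documentation:

pipeline_aggregate = [...]

db.command('aggregate', 'statCollection', pipeline=pipeline_aggregate)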

But this returned an error: pymongo.errors.OperationFailure: The 'cursor' option is required, except for aggregate with the explain argument

2 Answers:

Answer 0 (score: 0)

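From the MongoDB documentation for the aggregate command:

Changed in version 3.6: MongoDB 3.6 removes the use of the aggregate command without the cursor option unless the command includes the explain option. Unless you include the explain option, you must specify the cursor option.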

You can either replace runCommand with the aggregate helper (db.collection.aggregate(pipeline)) or supply the cursor option in the command.
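In pymongo, those two options map roughly onto the sketch below. It assumes `db` is a pymongo `Database` (for example `MongoClient()["mydb"]`, where `mydb` is a hypothetical database name); the function names are illustrative and are not part of pymongo.

```python
from typing import Any, Dict, List

# A pipeline is a list of stage documents, e.g. [{"$match": {...}}].
Pipeline = List[Dict[str, Any]]


def aggregate_via_helper(db, collection_name: str, pipeline: Pipeline):
    """Option 1: call Collection.aggregate() instead of a raw command.

    pymongo adds the required 'cursor' option itself and returns a
    CommandCursor that fetches further batches with getMore as you iterate.
    """
    return db[collection_name].aggregate(pipeline)


def aggregate_via_command(db, collection_name: str, pipeline: Pipeline) -> Dict[str, Any]:
    """Option 2: keep Database.command() but pass cursor={}.

    Keyword arguments become fields of the command document, so this sends
    {"aggregate": collection_name, "pipeline": pipeline, "cursor": {}}.
    The reply is a plain dict containing only the first batch of results.
    """
    return db.command("aggregate", collection_name, pipeline=pipeline, cursor={})
```

With option 1 you can simply loop with `for doc in aggregate_via_helper(db, "statCollection", pipeline): ...`; with option 2 you get the raw reply, including `cursor.firstBatch`, in the same shape as the shell output below.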

runCommand without cursor: error

> db.runCommand({aggregate : "coin_infos", pipeline : [ {$match : {"coin_code" : "BTC"}}]})
{
    "ok" : 0,
    "errmsg" : "The 'cursor' option is required, except for aggregate with the explain argument",
    "code" : 9,
    "codeName" : "FailedToParse"
}

runCommand with cursor: returns a cursor

> db.runCommand({aggregate : "coin_infos", pipeline : [ {$match : {"coin_code" : "BTC"}}], cursor : {}})
{
    "cursor" : {
        "firstBatch" : [
            {
                "_id" : ObjectId("5a4b07b2a0050c20a6be44b3"),
                "coin_code" : "BTC",
                "wallet_name" : "bitcoin",
                "deposite_txn_fee" : 3,
                "min_withdrawn" : 5,
                "withdrawn_txn_fee" : 0.001
            }
        ],
        "id" : NumberLong(0),
        "ns" : "bitcoin.coin_infos"
    },
    "ok" : 1
}
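When the command form is used from pymongo, `db.command()` returns that reply as a Python dict. The sketch below reads it, assuming the reply shape shown above (`cursor.firstBatch` and `cursor.id`); `split_aggregate_reply` is a hypothetical helper name. A cursor `id` of 0 means the server has no further batches, while any other value means the remaining results must be fetched with `getMore`, which iterating the cursor returned by `Collection.aggregate()` does for you.

```python
from typing import Any, Dict, List, Tuple


def split_aggregate_reply(reply: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], bool]:
    """Return (first_batch, more_on_server) from a raw aggregate reply."""
    cursor = reply["cursor"]
    # id == 0: the server already sent every result in firstBatch.
    # id != 0: a server-side cursor is still open and needs getMore calls.
    return cursor["firstBatch"], cursor["id"] != 0


# The reply from the shell session above, written as a Python dict
# (ObjectId and NumberLong replaced by plain values for the example).
sample_reply = {
    "cursor": {
        "firstBatch": [
            {
                "_id": "5a4b07b2a0050c20a6be44b3",
                "coin_code": "BTC",
                "wallet_name": "bitcoin",
                "deposite_txn_fee": 3,
                "min_withdrawn": 5,
                "withdrawn_txn_fee": 0.001,
            }
        ],
        "id": 0,
        "ns": "bitcoin.coin_infos",
    },
    "ok": 1,
}

docs, more = split_aggregate_reply(sample_reply)
print(len(docs), more)  # 1 False
```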

runCommand with explain: true and no cursor: returns the execution plan

> db.runCommand({aggregate : "coin_infos", pipeline : [ {$match : {"coin_code" : "BTC"}}], explain : true})
{
    "stages" : [
        {
            "$cursor" : {
                "query" : {
                    "coin_code" : "BTC"
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "bitcoin.coin_infos",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "coin_code" : {
                            "$eq" : "BTC"
                        }
                    },
                    "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                            "coin_code" : {
                                "$eq" : "BTC"
                            }
                        },
                        "direction" : "forward"
                    },
                    "rejectedPlans" : [ ]
                }
            }
        }
    ],
    "ok" : 1
}
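From pymongo, the same explain request can be sent as `db.command("aggregate", "coin_infos", pipeline=pipeline, explain=True)`. The sketch below collects the stage names of the winning plan from a reply shaped like the one above; `winning_plan_stages` is a hypothetical helper, and because the explain layout changes between MongoDB versions it searches the whole reply instead of a fixed path. A `COLLSCAN` stage, as here, means the `$match` scanned the entire collection; with a usable index the plan would contain an `IXSCAN` stage instead (often beneath a `FETCH`), which matters on 200,000+ documents.

```python
from typing import Any, Iterator, List


def _stages(node: Any) -> Iterator[str]:
    """Yield every 'stage' name found anywhere inside a plan tree."""
    if isinstance(node, dict):
        if isinstance(node.get("stage"), str):
            yield node["stage"]
        for value in node.values():
            yield from _stages(value)
    elif isinstance(node, list):
        for item in node:
            yield from _stages(item)


def winning_plan_stages(explain_reply: Any) -> List[str]:
    """Collect the stage names of every winningPlan in an explain reply."""
    found: List[str] = []
    if isinstance(explain_reply, dict):
        if "winningPlan" in explain_reply:
            found.extend(_stages(explain_reply["winningPlan"]))
        for key, value in explain_reply.items():
            if key != "winningPlan":  # rejectedPlans etc. are skipped below
                found.extend(winning_plan_stages(value))
    elif isinstance(explain_reply, list):
        for item in explain_reply:
            found.extend(winning_plan_stages(item))
    return found


# A trimmed copy of the explain output above.
sample_explain = {
    "stages": [
        {"$cursor": {
            "query": {"coin_code": "BTC"},
            "queryPlanner": {
                "namespace": "bitcoin.coin_infos",
                "winningPlan": {
                    "stage": "COLLSCAN",
                    "filter": {"coin_code": {"$eq": "BTC"}},
                    "direction": "forward",
                },
                "rejectedPlans": [],
            },
        }}
    ],
    "ok": 1,
}

print(winning_plan_stages(sample_explain))  # ['COLLSCAN']
```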

Answer 1 (score: 0)

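I solved the problem with the allowDiskUse option. Here is my answer. (Strictly speaking, it is the cursor={} argument that satisfies the 'cursor' requirement from the error; allowDiskUse=True additionally lets pipeline stages write temporary files when they exceed the 100 MB per-stage memory limit.)

pipeline_2 = [...]

db.command('aggregate', 'statCollection', pipeline=pipeline_2, allowDiskUse=True, cursor={})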
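For 200,000+ records, note that `db.command()` hands back only the first batch in `cursor.firstBatch`; fetching the rest would mean issuing `getMore` yourself. The sketch below avoids that, assuming `collection` is a pymongo `Collection` such as `db["statCollection"]`: `Collection.aggregate()` accepts the same `allowDiskUse` option plus `batchSize`, and its `CommandCursor` streams every batch as you iterate. The helper name `process_results` and the batch size of 1000 are illustrative assumptions, not values from the answer.

```python
from typing import Any, Callable, Dict, List


def process_results(
    collection,
    pipeline: List[Dict[str, Any]],
    handle: Callable[[Dict[str, Any]], None],
    batch_size: int = 1000,
) -> int:
    """Run the pipeline and pass every result document to handle().

    Returns the number of documents handled. Iterating the CommandCursor
    from Collection.aggregate() makes pymongo send getMore commands as
    needed, so all results are streamed, not just the first batch.
    """
    cursor = collection.aggregate(
        pipeline,
        allowDiskUse=True,     # let memory-heavy stages spill to temp files
        batchSize=batch_size,  # documents per batch fetched from the server
    )
    count = 0
    try:
        for doc in cursor:
            handle(doc)
            count += 1
    finally:
        cursor.close()  # release the server-side cursor even on errors
    return count
```

For example, `process_results(db["statCollection"], pipeline_2, print)` would print each result document and return how many were seen, without holding all of them in memory at once.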