Question

MongoDB $group aggregation takes a long time

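I am practicing MongoDB aggregations, but they seem to take a long time to run.

The problem occurs whenever I use $group. All other queries run fine.

I have about 1.3 million dummy documents, and I need to perform two basic operations on them: get the count per IP address and get the unique IP addresses.

My schema looks something like this: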

{
    "_id":"5da51af103eb566faee6b8b4",
    "ip_address":"...",
    "country":"CL",
    "browser":{
        "user_agent":"..."
    }
}

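Running a basic $group query takes about 12 seconds on average, which is too slow.

I did some research, and someone suggested creating an index on ip_address. That seems to have slowed things down, since the query now takes 13 to 15 seconds.

I am using MongoDB, and the query I am running is as follows: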

    visitorsModel.aggregate([
        {
            '$group': {
                '_id': '$ip_address',
                'count': {
                    '$sum': 1
                }
            }
        }
    ]).allowDiskUse(true)
        .exec(function (err, docs) {
            if (err) throw err;

            return res.send({
                uniqueCount: docs.length
            })
        })

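Thanks for your help.

Edit: I forgot to mention that someone suggested this might be a hardware issue. In case it helps, I am running the query on a Core i5 laptop with 8 GB of RAM.

Edit 2: The query plan: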

{
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                },
                "fields" : {
                    "ip_address" : 1,
                    "_id" : 0
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "metrics.visitors",
                    "indexFilterSet" : false,
                    "parsedQuery" : {

                    },
                    "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "direction" : "forward"
                    },
                    "rejectedPlans" : [ ]
                },
                "executionStats" : {
                    "executionSuccess" : true,
                    "nReturned" : 1387324,
                    "executionTimeMillis" : 7671,
                    "totalKeysExamined" : 0,
                    "totalDocsExamined" : 1387324,
                    "executionStages" : {
                        "stage" : "COLLSCAN",
                        "nReturned" : 1387324,
                        "executionTimeMillisEstimate" : 9,
                        "works" : 1387326,
                        "advanced" : 1387324,
                        "needTime" : 1,
                        "needYield" : 0,
                        "saveState" : 10930,
                        "restoreState" : 10930,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "direction" : "forward",
                        "docsExamined" : 1387324
                    }
                }
            }
        },
        {
            "$group" : {
                "_id" : "$ip_address",
                "count" : {
                    "$sum" : {
                        "$const" : 1
                    }
                }
            }
        }
    ],
    "ok" : 1
}

Answer 1

You can create an index:

db.collectionname.createIndex( { ip_address: "text" } )

Try it; it should be faster. I think this will help you.

Answer 2

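Here is some information about the $group aggregation stage: whether it uses an index, its limitations, and what you can try to work around them.

1. The $group stage does not use an index: see "Mongodb Aggregation: Does $group use index?"

2. The $group operator and memory:

The $group stage has a limit of 100 MB of RAM. By default, if the stage exceeds this limit, $group returns an error. To handle large datasets, set the allowDiskUse option to true. This flag lets $group operations write to temporary files.

See the MongoDB docs on "$group Operator and Memory".

3. An example using $group and a count:

A collection named cities: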

{ "_id" : 1, "city" : "Bangalore", "country" : "India" }
{ "_id" : 2, "city" : "New York", "country" : "United States" }
{ "_id" : 3, "city" : "Canberra", "country" : "Australia" }
{ "_id" : 4, "city" : "Hyderabad", "country" : "India" }
{ "_id" : 5, "city" : "Chicago", "country" : "United States" }
{ "_id" : 6, "city" : "Amritsar", "country" : "India" }
{ "_id" : 7, "city" : "Ankara", "country" : "Turkey" }
{ "_id" : 8, "city" : "Sydney", "country" : "Australia" }
{ "_id" : 9, "city" : "Srinagar", "country" : "India" }
{ "_id" : 10, "city" : "San Francisco", "country" : "United States" }

Query the collection to count the cities in each country:

db.cities.aggregate( [
    { $group: { _id: "$country", cityCount: { $sum: 1 } } },
    { $project: { country: "$_id", _id: 0, cityCount: 1 } }
] )

Result:

{ "cityCount" : 3, "country" : "United States" }
{ "cityCount" : 1, "country" : "Turkey" }
{ "cityCount" : 2, "country" : "Australia" }
{ "cityCount" : 4, "country" : "India" }

4. Using the allowDiskUse option:

db.cities.aggregate( [
    { $group: { _id: "$country", cityCount: { $sum: 1 } } },
    { $project: { country: "$_id", _id: 0, cityCount: 1 } }
],  { allowDiskUse : true } )

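Note that in this case the option has no effect on the query performance or output. It only shows the usage.

5. Some options to try (suggestions):

You can try a few things to get some results (for trial purposes only):

Use the $limit stage to restrict the number of documents processed and see what the result looks like. For example, you can try { $limit: 1000 }. Note that this stage needs to come before the $group stage.
You can also use the $match and $project stages before the $group stage to control the shape and size of the input. This may return a result (instead of an error).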
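
As a rough sketch of the two suggestions above, the pipeline below places $match, $limit, and $project ahead of $group. The collection name visitors is taken from the query plan in the question; the "CL" filter and the limit of 1000 are arbitrary values chosen only for experimentation. The db call is guarded so the snippet also runs outside the mongo shell.

```javascript
// Trial pipeline: shrink the input before it reaches $group.
// Assumptions: collection "visitors" and field "ip_address" (from the question);
// the "CL" filter and the limit of 1000 are illustrative values only.
const trialPipeline = [
    { $match: { country: "CL" } },             // keep only the documents of interest
    { $limit: 1000 },                          // cap the number of documents processed
    { $project: { _id: 0, ip_address: 1 } },   // pass along only the grouped field
    { $group: { _id: "$ip_address", count: { $sum: 1 } } }
];

// Run the pipeline only where a mongo shell "db" object exists.
if (typeof db !== "undefined") {
    printjson(db.visitors.aggregate(trialPipeline, { allowDiskUse: true }).toArray());
}
```

With $limit in place the counts cover only a sample of the collection, so this is useful for checking that the pipeline works, not for producing final numbers.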

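[EDIT ADD]

Notes on distinct values:

Using the same cities collection: to get the unique countries and the number of unique countries, you can try the aggregation stages $group and $count, as shown in the following two queries.

Distinct: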

db.cities.aggregate( [
   { $match: { country: { $exists: true } } },
   { $group: { _id: "$country" } },
   { $project: { country: "$_id", _id: 0 } }
] )

Result:

{ "country" : "United States" }
{ "country" : "Turkey" }
{ "country" : "India" }
{ "country" : "Australia" }

To get the above result as a single document with an array of unique values, use the $addToSet operator:

db.cities.aggregate( [
   { $match: { country: { $exists: true } } },
   { $group: { _id: null, uniqueCountries: { $addToSet:  "$country" } } },
   { $project: { _id: 0 } },
] )

Result: { "uniqueCountries" : [ "United States", "Turkey", "India", "Australia" ] }

Count:

db.cities.aggregate( [
   { $match: { country: { $exists: true } } },
   { $group: { _id: "$country" } },
   { $project: { country: "$_id", _id: 0 } },
   { $count: "uniqueCountryCount" }
] )

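Result: { "uniqueCountryCount" : 4 }

In the above queries, the $match stage filters out documents that have no country field. Note that { $exists: true } still matches documents in which country is explicitly null. The $project stage reshapes the result documents.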
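
Applied to the question, the same idea might look like the sketch below. It assumes the collection is named visitors (from the query plan namespace metrics.visitors) and uses { $ne: null } in $match, which excludes documents where ip_address is either missing or null. The output field name uniqueCount is chosen here to mirror the question's response key.

```javascript
// Count distinct ip_address values on the server and return one small document.
// Assumptions: collection "visitors" and field "ip_address" (from the question).
const uniqueIpPipeline = [
    { $match: { ip_address: { $ne: null } } },   // drops missing and null values
    { $group: { _id: "$ip_address" } },          // one group per distinct address
    { $count: "uniqueCount" }                    // collapse the groups into a count
];

// Run the pipeline only where a mongo shell "db" object exists.
if (typeof db !== "undefined") {
    // Result shape: a single document of the form { "uniqueCount" : <number> }
    printjson(db.visitors.aggregate(uniqueIpPipeline, { allowDiskUse: true }).toArray());
}
```

This avoids sending every group to the client just to read docs.length, although the $group stage still has to process every document that passes $match.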

MongoDB Query Language:

Note that the MongoDB query language commands give similar results to the two queries above: db.cities.distinct("country") and db.cities.distinct("country").length (note that distinct returns an array).
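
To make the behavior of distinct concrete, the sketch below reproduces it in plain JavaScript over the sample cities documents and then shows the guarded shell call for the question's data. The collection name visitors is an assumption taken from the query plan. As far as I know, distinct can use an index on the field when one exists, and its result must fit within the maximum BSON document size, so for a very large number of unique values the aggregation approach may be the safer choice.

```javascript
// In-memory illustration of what distinct("country") computes,
// using the sample documents from the cities collection.
const cities = [
    { _id: 1, city: "Bangalore", country: "India" },
    { _id: 2, city: "New York", country: "United States" },
    { _id: 3, city: "Canberra", country: "Australia" },
    { _id: 4, city: "Hyderabad", country: "India" },
    { _id: 5, city: "Chicago", country: "United States" },
    { _id: 6, city: "Amritsar", country: "India" },
    { _id: 7, city: "Ankara", country: "Turkey" },
    { _id: 8, city: "Sydney", country: "Australia" },
    { _id: 9, city: "Srinagar", country: "India" },
    { _id: 10, city: "San Francisco", country: "United States" }
];

// A Set keeps one copy of each value, like distinct does on the server.
const uniqueCountries = Array.from(new Set(cities.map(function (c) { return c.country; })));
const uniqueCountryCount = uniqueCountries.length;

// Equivalent shell call for the question's data (assumed collection "visitors").
if (typeof db !== "undefined") {
    printjson({ uniqueCount: db.visitors.distinct("ip_address").length });
}
```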
