Question

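I have the same data, about 30 million records, stored in both a SQL Server table and a MongoDB collection. A sample record is shown below, and I have set up the same indexes on both. Below are the queries that return the same data, one in SQL and the other in mongo. The SQL query takes 2 seconds to compute and return; mongo, on the other hand, takes 50. Any idea why mongo is so much slower than SQL?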

SQL

SELECT 
    COUNT(DISTINCT IP) AS Count,
    DATEPART(dy, datetime)
FROM 
    collection
GROUP BY 
    DATEPART(dy, datetime)

MONGO

db.collection.aggregate([{$group:{ "_id": { $dayOfYear:"$datetime" }, IP: { $addToSet: "$IP"} }},{$unwind:"$IP"},{$group:{ _id: "$_id", count: { $sum:1} }}])

Sample document; both stores hold about 30 million records of exactly the same data

{
  "_id" : ObjectId("57968ebc7391bb1f7c2f4801"),
  "IP" : "127.0.0.1",
  "userAgent" : "Mozilla/5.0+(Windows+NT+10.0;+WOW64;+Trident/7.0;+LCTE;+rv:11.0)+like+Gecko",
  "Country" : null,
  "datetime" : ISODate("2016-07-25T16:50:18-05:00"),
  "proxy" : null,
  "url" : "/records/archives/archivesdb/deathcertificates/",
  "HTTPStatus" : "302",
  "HTTPResponseTime" : "218"
}

Edit: added explain output for both queries

MONGO

{
    "waitedMS" : NumberLong(0),
    "stages" : [
        {
            "$cursor" : {
                "query" : {

                },
                "fields" : {
                    "IP" : 1,
                    "datetime" : 1,
                    "_id" : 0
                },
                "queryPlanner" : {
                    "plannerVersion" : 1,
                    "namespace" : "IISLogs.pubprdweb01",
                    "indexFilterSet" : false,
                    "parsedQuery" : {
                        "$and" : [ ]
                    },
                    "winningPlan" : {
                        "stage" : "COLLSCAN",
                        "filter" : {
                            "$and" : [ ]
                        },
                        "direction" : "forward"
                    },
                    "rejectedPlans" : [ ]
                }
            }
        },
        {
            "$group" : {
                "_id" : {
                    "$dayOfYear" : [
                        "$datetime"
                    ]
                },
                "IP" : {
                    "$addToSet" : "$IP"
                }
            }
        },
        {
            "$unwind" : {
                "path" : "$IP"
            }
        },
        {
            "$group" : {
                "_id" : "$_id",
                "count" : {
                    "$sum" : {
                        "$const" : 1
                    }
                }
            }
        }
    ],
    "ok" : 1
}

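For SQL Server I don't have access to the plan, since I'm not a DBA or anything, but it runs fast enough that I'm not too concerned about its execution plan. What troubles me is that mongo is using a FETCH.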

Answer 1

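The MongoDB version is slow because $group can't use an index (as evidenced by the "COLLSCAN" in the query plan), so all 30 million documents have to be read into memory and run through the pipeline.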
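If you want to check this for other pipelines, the shell's aggregate accepts an explain option, as in db.collection.aggregate(pipeline, { explain: true }). Here is a minimal sketch of pulling the winning stage out of that output. The helper name winningStage is made up for illustration, and it assumes the explain layout shown in the question, which may differ between server versions.

```javascript
// Hypothetical helper (not part of any MongoDB API): returns the winning
// plan's stage name from aggregate explain output shaped like the one
// posted in the question, or null if no $cursor stage is present.
function winningStage(explainOutput) {
  const cursorStage = (explainOutput.stages || []).find((s) => s.$cursor);
  if (!cursorStage) return null;
  const planner = cursorStage.$cursor.queryPlanner;
  return planner && planner.winningPlan ? planner.winningPlan.stage : null;
}

// Trimmed copy of the explain output from the question.
const explainOutput = {
  stages: [
    {
      $cursor: {
        query: {},
        fields: { IP: 1, datetime: 1, _id: 0 },
        queryPlanner: {
          namespace: "IISLogs.pubprdweb01",
          winningPlan: { stage: "COLLSCAN", direction: "forward" },
        },
      },
    },
    { $group: { _id: { $dayOfYear: ["$datetime"] }, IP: { $addToSet: "$IP" } } },
  ],
};

console.log(winningStage(explainOutput)); // prints "COLLSCAN"
```

A result of "COLLSCAN" means every document in the collection is scanned before the $group stage sees it.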

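This type of real-time query (computing summary data over all documents) is simply not a good fit for MongoDB. It's better to periodically run an aggregate with a $out stage (or use map-reduce) to generate summary data from the main collection, and then query the resulting summary collection.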
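A rough sketch of what such a periodic summary job could look like. The summary collection name daily_ip_counts and the helper buildDailySummaryPipeline are assumptions for illustration. Note that this is not the pipeline from the question: it groups on the (day, IP) pair and then counts pairs per day, instead of building $addToSet arrays and unwinding them. It should give the same per-day counts when every document has an IP value.

```javascript
// Hypothetical helper: builds a pipeline that writes one document per day
// of year, holding the number of distinct IPs seen that day, into the
// collection named by `summaryName` (via $out, which must be the last stage).
function buildDailySummaryPipeline(summaryName) {
  return [
    // One group per (day, IP) pair, so duplicate IPs within a day collapse.
    { $group: { _id: { day: { $dayOfYear: "$datetime" }, ip: "$IP" } } },
    // Count the distinct (day, IP) pairs for each day.
    { $group: { _id: "$_id.day", count: { $sum: 1 } } },
    // Replace the summary collection with the fresh result.
    { $out: summaryName },
  ];
}

const pipeline = buildDailySummaryPipeline("daily_ip_counts");
console.log(JSON.stringify(pipeline, null, 2));

// Assumed usage in the mongo shell, run on a schedule:
//   db.pubprdweb01.aggregate(pipeline, { allowDiskUse: true });
// Reports then read the small summary collection instead of the main one:
//   db.daily_ip_counts.find().sort({ _id: 1 });
```

The scheduled run still scans the whole collection, but readers only ever touch the summary collection, which has at most one document per day of year.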

MongoDB slower than SQL Server
