Question

I have a MongoDB collection whose documents are structured as follows:

{
"_id" : "mongo",
"log" : [
    {
        "ts" : ISODate("2011-02-10T01:20:49Z"),
        "visitorId" : "25850661"
    },
    {
        "ts" : ISODate("2014-11-01T14:35:05Z"),
        "visitorId" : NumberLong(278571823)
    },
    {
        "ts" : ISODate("2014-11-01T14:37:56Z"),
        "visitorId" : NumberLong(0)
    },
    {
        "ts" : ISODate("2014-11-04T06:23:48Z"),
        "visitorId" : NumberLong(225200092)
    },
    {
        "ts" : ISODate("2014-11-04T06:25:44Z"),
        "visitorId" : NumberLong(225200092)
    }
],
"uts" : ISODate("2014-11-04T06:25:43.740Z")
}

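"mongo" is a search term, and "ts" is the timestamp of each search for it on the website.

"uts" is the time of the most recent search.

So the search term "mongo" has been searched for 5 times on our website.

I need to get the 50 most-searched terms over the past 3 months.

I am not an expert in MongoDB aggregation, but I tried something like this to get the data for the past 3 months: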

db.collection.aggregate({$group:{_id:"$_id",count:{$sum:1}}},{$match:{"log.ts":{"$gte":new Date("2014-09-01")}}})

It gives me this error:

exception: sharded pipeline failed on shard DSink9: { errmsg: \"exception: aggregation result exceeds maximum document size (16MB)\", code: 16389

Can anyone help me?

Update

I was able to write a query, but it gives me a syntax error.

db.collection.aggregate(
{$unwind:"$log"},
{$project:{log:"$log.ts"}},
{$match:{log:{"$gte" : new Date("2014-09-01"),"$lt" : new Date("2014-11-04")}}},
{$project:{_id:{val:{"$_id"}}}},
{$group:{_id:"$_id",sum:{$sum:1}}})

Answer 1

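You are exceeding the maximum document size in the result, but that generally indicates you are "doing it wrong", particularly given your example of searching for "mongo" in your stored data between two dates: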

db.collection.aggregate([
   // Always match first, it reduces the workload and can use an index here only.
   { "$match": { 
       "_id": "mongo",
       "log.ts": {
           "$gte": new Date("2014-09-01"), "$lt": new Date("2014-11-04")
       }
   }},

   // Unwind the array to de-normalize as documents
   { "$unwind": "$log" },

   // Get the count within the range, so match first to "filter"
   { "$match": { 
       "log.ts": {
           "$gte": new Date("2014-09-01"), "$lt": new Date("2014-11-04")
       }
   }},

   // Group the count on `_id`
   { "$group": {
       "_id": "$_id",
       "count": { "$sum": 1 }
   }}
]);
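
The pipeline above counts searches for the single term "mongo". For the top 50 terms the question asks about, one possible variation drops the `_id` condition and appends `$sort` and `$limit` stages. This is only a sketch under assumptions: `buildTopSearchesPipeline` is a hypothetical helper name (not a MongoDB API), the date range is copied from the question, and the final call assumes a 2.6 or later shell where `aggregate` returns a cursor.

```javascript
// Hypothetical helper (not a MongoDB API): builds a pipeline that counts
// log entries per search term within [from, to) and keeps the top `limit`.
function buildTopSearchesPipeline(from, to, limit) {
  var range = { "$gte": from, "$lt": to };
  return [
    // Coarse pre-filter that can use an index on log.ts;
    // the exact per-entry filtering happens after $unwind.
    { "$match": { "log.ts": range } },
    // One document per log entry
    { "$unwind": "$log" },
    // Drop the entries that fall outside the range
    { "$match": { "log.ts": range } },
    // Count the remaining entries per search term
    { "$group": { "_id": "$_id", "count": { "$sum": 1 } } },
    // Highest counts first, then keep only the top `limit`
    { "$sort": { "count": -1 } },
    { "$limit": limit }
  ];
}

var pipeline = buildTopSearchesPipeline(
  new Date("2014-09-01"), new Date("2014-11-04"), 50
);

// `db` exists only inside the mongo shell; the guard lets the file load elsewhere.
if (typeof db !== "undefined") {
  db.collection.aggregate(pipeline).forEach(printjson);
}
```

Because `$limit` caps the output at 50 small documents, the result stays far below the 16MB document limit no matter how many log entries are scanned.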

Answer 2

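Your aggregation result exceeds MongoDB's maximum document size. You can use the allowDiskUse option, which works around this. In the MongoDB shell version 2.6, this will not throw an exception; see the documentation for aggregate. You can also optimize your query to reduce the size of the pipeline result; for that, see the question "aggregation result".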
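
As a rough illustration of passing that option, the 2.6 shell accepts an options document as the second argument to `aggregate`. The sketch below is not taken from the linked pages: the pipeline is a simplified stand-in, and the `batchSize` of 100 is an arbitrary value chosen for the example. Note that `allowDiskUse` lets memory-heavy stages write to temporary files, while returning results through a cursor is what removes the need to fit the whole result into one 16MB document.

```javascript
// Options document for aggregate(); both keys are available from MongoDB 2.6.
var options = {
  allowDiskUse: true,         // let large stages spill to temporary files
  cursor: { batchSize: 100 }  // return results in batches (100 is arbitrary)
};

// Simplified stand-in pipeline: count log entries per search term.
var pipeline = [
  { "$unwind": "$log" },
  { "$group": { "_id": "$_id", "count": { "$sum": 1 } } }
];

// `db` exists only inside the mongo shell; the guard lets the file load elsewhere.
if (typeof db !== "undefined") {
  db.collection.aggregate(pipeline, options).forEach(printjson);
}
```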

Count and Aggregate in MongoDB
