What should the MongoDB query be to count occurrences?

Time: 2015-12-23 04:20:18

Tags: php mongodb

Sample records in the collection:

(doc 1)

[{
   "_id": ObjectId("567941aaf0058ed6755ab3dc"),
   "hash_count": NumberInt(7),
   "time": [
     NumberInt(1450787170),
     NumberInt(1450787292),
     NumberInt(1450787307),
     NumberInt(1450787333),
     NumberInt(1450787615) 
  ],
   "word": "batman" 
},

(doc 2)

   {
       "_id": ObjectId("567941aaf0058ed6755ab3dc"),
       "hash_count": NumberInt(7),

   "time": [
     NumberInt(1450787170),
     NumberInt(1450787292),
     NumberInt(1450787307),
     NumberInt(1450787333),
     NumberInt(1450787354),
     NumberInt(1450787526),
     NumberInt(1450787615) 
  ],
   "word": "apple" 
}]

The data is stored using PHP. I want to count, for each word, the entries whose time lies between 1450787307 and 1450787615.

Expected answer:

apple=5
batman=3 

What should the query be?

I ran this command:

{
aggregate : "hashtags",       
pipeline:

[
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$unwind:"$time"},
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$group:{"_id":"$word","count":{$sum:1}}}
]

}

It gave this result:

Response from server:
{
   "result": [

  ],
   "ok": 1 
}

3 answers:

Answer 0: (score: 3)

Since you are using an older version of MongoDB, you cannot take advantage of the array aggregation operators introduced in 3.2.

You have to aggregate as follows:

db.collection.aggregate([
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$unwind:"$time"},
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$group:{"_id":"$word","count":{$sum:1}}}
])

Translated to PHP:

// NumberInt() is a mongo shell helper, not a PHP function; plain PHP integers are used here.
$result = $c->aggregate([
[ '$match' => [ 'time' => [ '$gte' => 1450787307, 
                            '$lte' => 1450787615 ] ] ],
[ '$unwind' => '$time' ],
[ '$match' => [ 'time' => [ '$gte' => 1450787307, 
                            '$lte' => 1450787615 ] ] ],
[ '$group' => [ '_id' => '$word', 'count' => [ '$sum' => 1 ] ] ]
]);
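
To see what each stage of this pipeline contributes, here is a minimal sketch in plain JavaScript that imitates the four stages on the two sample documents from the question. It does not talk to MongoDB; sampleDocs and countInRange are hypothetical names used only for illustration. The first filter imitates how a range condition on an array field matches a document (some element satisfies $gte and some element, possibly a different one, satisfies $lte), which is why the second $match after $unwind is still needed.

```javascript
// Sample documents from the question, reduced to the fields the pipeline uses.
const sampleDocs = [
  { word: "batman", time: [1450787170, 1450787292, 1450787307, 1450787333, 1450787615] },
  { word: "apple", time: [1450787170, 1450787292, 1450787307, 1450787333,
                          1450787354, 1450787526, 1450787615] },
];

// Hypothetical helper: imitates $match, $unwind, $match, $group in memory.
function countInRange(docs, from, to) {
  const counts = {};
  docs
    // First $match: on an array field, the two bounds may be satisfied by different elements.
    .filter((d) => d.time.some((t) => t >= from) && d.time.some((t) => t <= to))
    // $unwind: one row per array element.
    .flatMap((d) => d.time.map((t) => ({ word: d.word, time: t })))
    // Second $match: keep only the elements that are really inside the range.
    .filter((row) => row.time >= from && row.time <= to)
    // $group by word with $sum: 1.
    .forEach((row) => { counts[row.word] = (counts[row.word] || 0) + 1; });
  return counts;
}

console.log(JSON.stringify(countInRange(sampleDocs, 1450787307, 1450787615)));
// {"batman":3,"apple":5}
```

These are the counts the question expects (apple=5, batman=3).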

In version 3.2 you can combine $filter and $size to get the same counts with less work, because no $unwind stage is needed.

db.collection.aggregate([
{$match:{"time":{$gte:NumberInt(1450787307), 
                 $lte:NumberInt(1450787615)}}},
{$project:{"_id":0,"word":1,
           "count":{$size:{$filter:
                               {"input":"$time",
                                "as":"t",
                                "cond":{$and:[
                                     {$gte:["$$t",NumberInt(1450787307)]},
                                     {$lte:["$$t",NumberInt(1450787615)]}]}
                                }
                           }
                    }
}}
])
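
As a rough illustration of what $filter plus $size computes, the sketch below does the same thing per document in plain JavaScript, without MongoDB. hashtagDocs and projectCounts are hypothetical names. One thing it makes visible: $project produces one row per document, so this only equals the $group result as long as each word appears in a single document, as in the sample data.

```javascript
// Sample documents from the question, reduced to the fields the pipeline uses.
const hashtagDocs = [
  { word: "batman", time: [1450787170, 1450787292, 1450787307, 1450787333, 1450787615] },
  { word: "apple", time: [1450787170, 1450787292, 1450787307, 1450787333,
                          1450787354, 1450787526, 1450787615] },
];

// Hypothetical helper: imitates the $match + $project($size($filter)) pipeline in memory.
function projectCounts(docs, from, to) {
  return docs
    // $match: pre-filter documents (bounds may be met by different array elements).
    .filter((d) => d.time.some((t) => t >= from) && d.time.some((t) => t <= to))
    // $project: keep word, and count the array elements that pass the $filter condition.
    .map((d) => ({
      word: d.word,
      count: d.time.filter((t) => t >= from && t <= to).length, // $size of the $filter output
    }));
}

console.log(JSON.stringify(projectCounts(hashtagDocs, 1450787307, 1450787615)));
// [{"word":"batman","count":3},{"word":"apple","count":5}]
```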

Answer 1: (score: 1)

OK, after many attempts I arrived at this answer, and it is correct. It uses 1450787615 as the lower bound and 1450855155 as the upper bound.

db.hashtags.aggregate([
    {
        "$match": {
            "time": {
                "$gte": 1450787615, "$lte": 1450855155  
            }
        }
    },
    { "$unwind": "$time" },
    {
        "$match": {
            "time": {
                "$gte": 1450787615, "$lte": 1450855155  
            }
        }
    },
    {
        "$group": {
            "_id": "$word",
            "count": {
                "$sum": 1
            }
        }
    }
])

The result looks like this:

{
    "result" : [ 
        {
            "_id" : "batman",
            "count" : 3
        }, 
        {
            "_id" : "dear",
            "count" : 1
        }, 
        {
            "_id" : "ghost",
            "count" : 1
        }
    ],
    "ok" : 1
}

Answer 2: (score: 0)

db.collection.find({time:{$gt: 1450787307, $lt: 1450787615}}); 

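This first gives you a cursor over all documents that match the time range you specified. Once you have it, you can iterate over the cursor and print each word, with some loop logic to count the matching occurrences in each document. I have only used MongoDB casually, so there may be a more efficient way to do this.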

Reference: https://docs.mongodb.org/v3.0/reference/method/db.collection.find/
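
The loop logic this answer describes might look like the sketch below. It is plain JavaScript over an in-memory array that stands in for the documents returned by the cursor; fetchedDocs, countPerWord and the inclusive flag are hypothetical names, not part of any MongoDB API. Note that $gt/$lt exclude the endpoints, while the counts expected in the question (apple=5, batman=3) only come out with inclusive bounds ($gte/$lte).

```javascript
// Stand-in for the documents a find() cursor would return (sample data from the question).
const fetchedDocs = [
  { word: "batman", time: [1450787170, 1450787292, 1450787307, 1450787333, 1450787615] },
  { word: "apple", time: [1450787170, 1450787292, 1450787307, 1450787333,
                          1450787354, 1450787526, 1450787615] },
];

// Hypothetical helper: client-side counting of the timestamps inside the range.
function countPerWord(docs, from, to, inclusive) {
  const inRange = inclusive
    ? (t) => t >= from && t <= to // like $gte / $lte
    : (t) => t > from && t < to;  // like $gt / $lt
  const counts = {};
  for (const doc of docs) {
    // find() returns whole documents, so the array still has to be filtered here.
    const n = doc.time.filter(inRange).length;
    counts[doc.word] = (counts[doc.word] || 0) + n;
  }
  return counts;
}

console.log(JSON.stringify(countPerWord(fetchedDocs, 1450787307, 1450787615, false)));
// {"batman":1,"apple":3}
console.log(JSON.stringify(countPerWord(fetchedDocs, 1450787307, 1450787615, true)));
// {"batman":3,"apple":5}
```

Compared with the aggregation answers, this approach transfers every matching document in full and does the counting on the client.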