MongoDB query for calculating percentages

Time: 2021-03-20 10:41:26

Tags: mongodb pymongo percentage

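I'm new to MongoDB and I'm a bit stuck on this query. Any help or guidance would be greatly appreciated. I can't get the percentage computed the way I need: something is wrong with my pipeline, and the values the percentage depends on are not being computed correctly. Below I include my failed attempt along with the desired output.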

A single entry in the collection looks like this:

_id : ObjectId("602fb382f060fff5419fd0d1")
time : "2019/05/02 00:00:00"
station_id : 3544
station_name : "Underhill Ave & Pacific St"
station_status : "In Service"
latitude : 40.6804836
longitude : -73.9646795
zipcode : 11238
borough : "Brooklyn"
neighbourhood : "Prospect Heights"
available_bikes : 5
available_docks : 21

The query I need to solve is:

Given a station_id (e.g., 522) and a num_hours (e.g., 3) passed as parameters:

 - Consider only the measurements where the station_status = “In Service”.
 - Consider only the measurements for that concrete “station_id”.
 - Compute the percentage of measurements with available_bikes = 0 for each hour of the day (e.g., for the period [8am, 9am) the percentage is 15.06% and for the period [9am, 10am) the percentage is 27.32%).
 - Sort the percentage results in decreasing order.
 - Return the top “num_hours” documents.

The desired output is:

--- DOCUMENT 0 INFO ---
---------------------------------
hour : 19
percentage : 65.37
total_measurements : 283
zero_bikes_measurements : 185
---------------------------------
--- DOCUMENT 1 INFO ---
---------------------------------
hour : 21
percentage : 64.79
total_measurements : 284
zero_bikes_measurements : 184
---------------------------------
--- DOCUMENT 2 INFO ---
---------------------------------
hour : 00
percentage : 63.73
total_measurements : 284
zero_bikes_measurements : 181

My attempt is:

    command_1 = {"$match": {"station_status": "In Service", "station_id": station_id, "available_bikes": 0}}
    my_query.append(command_1)

    command_2 = {"$group": {"_id": "null", "total_measurements": {"$sum": 1}}}
    my_query.append(command_2)  

    command_3 = {"$project": {"_id": 0,
                              "station_id": 1,
                              "station_status": 1,
                              "hour": {"$substr": ["$time", 11, 2]},
                              "available_bikes": 1,
                              "total_measurements": {"$sum": 1}
                              }
                 }
    my_query.append(command_3)

    command_4 = {"$group": {"_id": "$hour", "zero_bikes_measurements": {"$sum": 1}}}
    my_query.append(command_4)

    command_5 = {"$project": {"percent": {
                                  "$multiply": [{"$divide": ["$total_measurements", "$zero_bikes_measurements"]},
                                                100]}}}

    my_query.append(command_5)

1 answer:

Answer 0 (score: 0)

I've had a look at this, and I'll offer some sincere advice:

Don't try to do this in an aggregation query. Just go back to basics: use find() to pull out the numbers, then do the calculation in Python.
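A minimal sketch of that find()-based approach, assuming `collection` is a pymongo Collection holding documents shaped like the sample above. The function names `fetch_measurements` and `top_zero_bike_hours` are hypothetical, chosen only for illustration. The hour is sliced out of the `time` string at the same offset the question uses with `$substr`.

```python
from collections import Counter


def fetch_measurements(collection, station_id):
    """Return the in-service measurements of one station.

    `collection` is assumed to be a pymongo Collection; only the two
    fields needed for the calculation are projected.
    """
    query = {"station_status": "In Service", "station_id": station_id}
    projection = {"_id": 0, "time": 1, "available_bikes": 1}
    return collection.find(query, projection)


def top_zero_bike_hours(measurements, num_hours):
    """Rank hours of the day by the share of measurements with no bikes."""
    totals = Counter()  # all measurements per hour
    zeros = Counter()   # measurements with available_bikes == 0 per hour
    for doc in measurements:
        hour = doc["time"][11:13]  # "2019/05/02 00:00:00" -> "00"
        totals[hour] += 1
        if doc["available_bikes"] == 0:
            zeros[hour] += 1

    results = [
        {
            "hour": hour,
            "percentage": round(100 * zeros[hour] / total, 2),
            "total_measurements": total,
            "zero_bikes_measurements": zeros[hour],
        }
        for hour, total in totals.items()
    ]
    # Highest percentage first, then keep only the requested number of hours.
    results.sort(key=lambda r: r["percentage"], reverse=True)
    return results[:num_hours]
```

With a live connection, the call would look something like `top_zero_bike_hours(fetch_measurements(collection, 522), 3)`. Because the second function accepts any iterable of dictionaries, it can also be tried out on a small hand-written list before touching the database.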

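If you do want to stick with an aggregation query, note that your $match stage filters on available_bikes equal to 0. That means you never get the total number of measurements, so you can never work out the percentage. Also, once your first $group has run you have "lost" your projection, so at that point in the pipeline all you have is total_measurements and nothing else (comment out commands 3 to 5 to see what I mean).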
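For anyone who prefers to stay with aggregation, here is one possible pipeline that avoids both problems. It is a sketch under assumptions, not a tested fix for the asker's data: `build_pipeline` is a hypothetical helper name, and the hour is extracted with `$substr` as in the question. The `$match` no longer filters on available_bikes, a single `$group` per hour counts both the total and the zero-bike measurements via `$cond`, and the ratio is zero_bikes_measurements divided by total_measurements.

```python
def build_pipeline(station_id, num_hours):
    """Build a pipeline ranking hours of the day by zero-bike percentage."""
    return [
        # Keep all in-service measurements of the station, including those
        # with bikes available, so the per-hour total can still be counted.
        {"$match": {"station_status": "In Service", "station_id": station_id}},
        # One group per hour: count every measurement, and separately count
        # the ones where available_bikes is 0.
        {"$group": {
            "_id": {"$substr": ["$time", 11, 2]},
            "total_measurements": {"$sum": 1},
            "zero_bikes_measurements": {
                "$sum": {"$cond": [{"$eq": ["$available_bikes", 0]}, 1, 0]}
            },
        }},
        # Expose the hour and derive the percentage from the two counters.
        {"$project": {
            "_id": 0,
            "hour": "$_id",
            "total_measurements": 1,
            "zero_bikes_measurements": 1,
            "percentage": {
                "$multiply": [
                    {"$divide": ["$zero_bikes_measurements", "$total_measurements"]},
                    100,
                ]
            },
        }},
        {"$sort": {"percentage": -1}},
        {"$limit": num_hours},
    ]
```

Assuming `collection` is a pymongo Collection, the pipeline would be run with `collection.aggregate(build_pipeline(522, 3))`. The percentage comes back unrounded; rounding to two decimals can be done in Python when printing the results.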
