Using the Citi Bike trip data: https://s3.amazonaws.com/tripdata/index.html
tripduration starttime stoptime start_station_id start_station_name start_station_latitude start_station_longitude end_station_id end_station_name end_station_latitude end_station_longitude bikeid usertype birth_year gender
461 2016-02-01 00:00:08 2016-02-01 00:07:49 480 W 53 St & 10 Ave 40.766697 -73.990617 524 W 43 St & 6 Ave 40.755273 -73.983169 23292 Subscriber 1966.0 1
297 2016-02-01 00:00:56 2016-02-01 00:05:53 463 9 Ave & W 16 St 40.742065 -74.004432 380 W 4 St & 7 Ave S 40.734011 -74.002939 15329 Subscriber 1977.0 1
280 2016-02-01 00:01:00 2016-02-01 00:05:40 3134 3 Ave & E 62 St 40.763126 -73.965269 3141 1 Ave & E 68 St 40.765005 -73.958185 22927 Subscriber 1987.0 1
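I am grouping the trips by hour with groupby, and I want hours that have no data to be included with a count of zero.
I used the following code: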
bikes_parked = df.groupby(['end_station_name',pd.Grouper(key='stoptime',freq='H')]).size().reset_index()
bikes_parked.rename(columns={0: 'bikes_parked'},inplace=True)
This counts the number of bikes parked per hour, but hours with no data are skipped.
Output:
end_station_name stoptime bikes_parked
0 1 Ave & E 15 St 2016-02-01 00:00:00 1
1 1 Ave & E 15 St 2016-02-01 05:00:00 1
2 1 Ave & E 15 St 2016-02-01 06:00:00 3
I want the output to also include stoptime hours 01, 02, 03 and 04, with bikes_parked set to 0.
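Answer 0 (score: 0)
As mentioned in the comments, the solution goes like this:
1) Create a DataFrame that covers the entire range of hours, with bikes_parked=0 everywhere
2) Update this DataFrame with the relevant data from the grouped table, using: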
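# df is the zero-filled DataFrame from step 1; it and bikes_parked must both be indexed by (end_station_name, stoptime), not by the default RangeIndex
df.loc[bikes_parked.index, 'bikes_parked'] = bikes_parked.bikes_parked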
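The two steps above can be sketched roughly as follows. This is a minimal illustration, not the answerer's exact code: the five trips are made-up sample rows shaped like the question's output, and the names hourly, counts, full_index and grid are hypothetical. It assumes the counts keep their (end_station_name, stoptime) MultiIndex, so that the labels line up with the zero-filled grid.

```python
import pandas as pd

# Made-up sample trips (assumption): one station, arrivals in hours 00, 05 and 06.
df = pd.DataFrame({
    "end_station_name": ["1 Ave & E 15 St"] * 5,
    "stoptime": pd.to_datetime([
        "2016-02-01 00:07:49",
        "2016-02-01 05:12:00",
        "2016-02-01 06:01:00",
        "2016-02-01 06:20:00",
        "2016-02-01 06:45:00",
    ]),
})

# Count arrivals per station and hour, keeping the (station, hour) MultiIndex
# (no reset_index() yet, so the labels can be matched against the grid below).
hourly = df["stoptime"].dt.floor("h")
counts = df.groupby(["end_station_name", hourly]).size()

# Step 1: a grid with every station/hour combination, all set to zero.
hours = pd.date_range(hourly.min(), hourly.max(), freq="h")
full_index = pd.MultiIndex.from_product(
    [df["end_station_name"].unique(), hours],
    names=["end_station_name", "stoptime"],
)
grid = pd.DataFrame({"bikes_parked": 0}, index=full_index)

# Step 2: overwrite the zeros wherever the grouped table has a count.
grid.loc[counts.index, "bikes_parked"] = counts

result = grid.reset_index()
print(result)
```

With this sample, result has one row per hour from 00:00 to 06:00, and the hours 01 through 04 carry bikes_parked equal to 0. If the intermediate grid is not needed, counts.reindex(full_index, fill_value=0) should give the same counts in a single call.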