I have this DataFrame:
date station count
2015-01-01 13:00:00 A 4
2015-01-01 14:00:00 B 2
2015-01-02 15:00:00 A 7
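For simplicity, assume station takes only two values: A and B.
My goal is to generate a row with a count of 0 for every date, hour, and station combination that has no data.
For example, the code would generate: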
date station count
2015-01-01 00:00:00 A 0
2015-01-01 00:00:00 B 0
This is what I tried:
# generate 0 values (no transaction) for each hour at each station
df_trans = df_trans.set_index(['date', 'station'])
(date_index, station_index) = df_trans.index.levels
# generate a range of all dates & hours
all_dates = pd.date_range('2014-01-09', '2015-12-08', freq='H')
new_index = pd.MultiIndex.from_product([all_dates, station_index])
df_trans = df_trans.reindex(new_index)
df_trans = df_trans['net_rate'].fillna(0)
But the resulting DataFrame is not hourly.
Output (the dates have no hour component):
net_rate
2014-01-09 2 0.0
3 0.0
4 0.0
Answer 0 (score: 1)
For me it works fine. A small improvement is to pass the parameter fill_value=0 to reindex:
new_index = pd.MultiIndex.from_product([all_dates, station_index], names=('date', 'station'))
df_trans = df_trans.reindex(new_index, fill_value=0)
print (df_trans.head(10))
count
date station
2014-01-09 00:00:00 A 0
B 0
2014-01-09 01:00:00 A 0
B 0
2014-01-09 02:00:00 A 0
B 0
2014-01-09 03:00:00 A 0
B 0
2014-01-09 04:00:00 A 0
B 0
print (df_trans[df_trans['count'] != 0])
count
date station
2015-01-01 13:00:00 A 4
2015-01-01 14:00:00 B 2
2015-01-02 15:00:00 A 7
print (df_trans.index.levels)
[[2014-01-09 00:00:00, 2014-01-09 01:00:00, 2014-01-09 02:00:00, 2014-01-09 03:00:00,
2014-01-09 04:00:00, 2014-01-09 05:00:00, 2014-01-09 06:00:00, 2014-01-09 07:00:00,
2014-01-09 08:00:00, 2014-01-09 09:00:00, 2014-01-09 10:00:00, 2014-01-09 11:00:00,
2014-01-09 12:00:00, 2014-01-09 13:00:00, 2014-01-09 14:00:00, 2014-01-09 15:00:00,
2014-01-09 16:00:00, 2014-01-09 17:00:00, 2014-01-09 18:00:00, 2014-01-09 19:00:00,
2014-01-09 20:00:00, 2014-01-09 21:00:00, 2014-01-09 22:00:00, 2014-01-09 23:00:00,
2014-01-10 00:00:00, 2014-01-10 01:00:00, 2014-01-10 02:00:00, 2014-01-10 03:00:00,
2014-01-10 04:00:00, 2014-01-10 05:00:00, 2014-01-10 06:00:00, 2014-01-10 07:00:00,
2014-01-10 08:00:00, 2014-01-10 09:00:00, 2014-01-10 10:00:00, 2014-01-10 11:00:00,
2014-01-10 12:00:00, 2014-01-10 13:00:00, 2014-01-10 14:00:00, 2014-01-10 15:00:00,
2014-01-10 16:00:00, 2014-01-10 17:00:00, 2014-01-10 18:00:00, 2014-01-10 19:00:00,
2014-01-10 20:00:00, 2014-01-10 21:00:00, 2014-01-10 22:00:00, 2014-01-10 23:00:00,
2014-01-11 00:00:00, 2014-01-11 01:00:00, 2014-01-11 02:00:00, 2014-01-11 03:00:00,
2014-01-11 04:00:00, 2014-01-11 05:00:00, 2014-01-11 06:00:00, 2014-01-11 07:00:00,
2014-01-11 08:00:00, 2014-01-11 09:00:00, 2014-01-11 10:00:00, 2014-01-11 11:00:00,
2014-01-11 12:00:00, 2014-01-11 13:00:00, 2014-01-11 14:00:00, 2014-01-11 15:00:00,
2014-01-11 16:00:00, 2014-01-11 17:00:00, 2014-01-11 18:00:00, 2014-01-11 19:00:00,
2014-01-11 20:00:00, 2014-01-11 21:00:00, 2014-01-11 22:00:00, 2014-01-11 23:00:00,
2014-01-12 00:00:00, 2014-01-12 01:00:00, 2014-01-12 02:00:00, 2014-01-12 03:00:00,
2014-01-12 04:00:00, 2014-01-12 05:00:00, 2014-01-12 06:00:00, 2014-01-12 07:00:00,
2014-01-12 08:00:00, 2014-01-12 09:00:00, 2014-01-12 10:00:00, 2014-01-12 11:00:00,
2014-01-12 12:00:00, 2014-01-12 13:00:00, 2014-01-12 14:00:00, 2014-01-12 15:00:00,
2014-01-12 16:00:00, 2014-01-12 17:00:00, 2014-01-12 18:00:00, 2014-01-12 19:00:00,
2014-01-12 20:00:00, 2014-01-12 21:00:00, 2014-01-12 22:00:00, 2014-01-12 23:00:00,
2014-01-13 00:00:00, 2014-01-13 01:00:00, 2014-01-13 02:00:00, 2014-01-13 03:00:00, ...], ['A', 'B']]
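For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built from the three sample rows in the question. The date range is an assumption: it is narrowed to the two days present in the sample so the result stays small, and the hourly step is written as pd.Timedelta(hours=1) rather than a frequency string. The variable names result and flat are only illustrative.

```python
import pandas as pd

# Sample rows from the question
df_trans = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-01-01 13:00:00",
        "2015-01-01 14:00:00",
        "2015-01-02 15:00:00",
    ]),
    "station": ["A", "B", "A"],
    "count": [4, 2, 7],
})

df_trans = df_trans.set_index(["date", "station"])
station_index = df_trans.index.levels[1]

# Assumed range: only the two days covered by the sample, one step per hour
all_dates = pd.date_range(
    "2015-01-01 00:00:00", "2015-01-02 23:00:00", freq=pd.Timedelta(hours=1)
)

# Every (hour, station) pair; pairs missing from the data get count 0
new_index = pd.MultiIndex.from_product(
    [all_dates, station_index], names=["date", "station"]
)
result = df_trans.reindex(new_index, fill_value=0)

# Flat layout with date, station and count as ordinary columns
flat = result.reset_index()
print(flat.head())
```

Calling reset_index at the end gives the flat date / station / count layout shown in the question, with one row per hour and station.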