Pandas: generate missing dates and hours with a count of 0

Time: 2017-11-20 13:52:54

Tags: python pandas dataframe time-series

I have this dataframe:

date                   station  count
2015-01-01 13:00:00      A        4
2015-01-01 14:00:00      B        2
2015-01-02 15:00:00      A        7

For simplicity, assume that station has only 2 values: A and B.

My goal is to generate a row with a count of 0 for every date, every hour, and every station that has no data.

For example, the code would generate:

date                   station  count
2015-01-01 00:00:00      A        0
2015-01-01 00:00:00      B        0

This is what I tried:

# generate 0 values (no transaction) for each hour at each station
df_trans = df_trans.set_index(['date', 'station'])

(date_index, station_index) = df_trans.index.levels

# generate a range of all dates & hours
all_dates = pd.date_range('2014-01-09', '2015-12-08', freq='H')

new_index = pd.MultiIndex.from_product([all_dates, station_index])

df_trans = df_trans.reindex(new_index)

df_trans = df_trans['net_rate'].fillna(0)

But the resulting dataframe is not hourly.

Output (no hours in the dates):

               net_rate
2014-01-09 2        0.0
           3        0.0
           4        0.0

1 answer:

Answer 0: (score: 1)

For me it works fine. A small improvement is to pass fill_value=0 to reindex, so the missing rows are filled in one step:

new_index = pd.MultiIndex.from_product([all_dates, station_index], names=('date', 'station'))

df_trans = df_trans.reindex(new_index, fill_value=0)

print (df_trans.head(10))
                             count
date                station       
2014-01-09 00:00:00 A            0
                    B            0
2014-01-09 01:00:00 A            0
                    B            0
2014-01-09 02:00:00 A            0
                    B            0
2014-01-09 03:00:00 A            0
                    B            0
2014-01-09 04:00:00 A            0
                    B            0
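
The steps above can be reproduced end to end with the three sample rows from the question. This is a minimal sketch, not the asker's actual data: the two-day range and the variable names all_hours, stations and full are assumptions made for illustration, and the hourly frequency is passed as pd.Timedelta(hours=1) rather than a string alias so that it does not depend on the pandas version.

```python
import pandas as pd

# The three sample rows from the question.
df_trans = pd.DataFrame({
    "date": pd.to_datetime([
        "2015-01-01 13:00:00",
        "2015-01-01 14:00:00",
        "2015-01-02 15:00:00",
    ]),
    "station": ["A", "B", "A"],
    "count": [4, 2, 7],
})

df_trans = df_trans.set_index(["date", "station"])

# Unique station labels taken from the second index level.
stations = df_trans.index.levels[1]

# Assumed range: every hour of the two days covered by the sample.
all_hours = pd.date_range(
    "2015-01-01 00:00:00", "2015-01-02 23:00:00", freq=pd.Timedelta(hours=1)
)

# One row per (hour, station) pair; pairs without data get count 0.
new_index = pd.MultiIndex.from_product(
    [all_hours, stations], names=["date", "station"]
)
full = df_trans.reindex(new_index, fill_value=0)

print(full.head())
print(full[full["count"] != 0])
```

Because fill_value=0 is applied during the reindex, no NaN values are introduced and no separate fillna call is needed.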

print (df_trans[df_trans['count'] != 0])
                             count
date                station       
2015-01-01 13:00:00 A            4
2015-01-01 14:00:00 B            2
2015-01-02 15:00:00 A            7

print (df_trans.index.levels)
[[2014-01-09 00:00:00, 2014-01-09 01:00:00, 2014-01-09 02:00:00, 2014-01-09 03:00:00, 
  2014-01-09 04:00:00, 2014-01-09 05:00:00, 2014-01-09 06:00:00, 2014-01-09 07:00:00, 
  2014-01-09 08:00:00, 2014-01-09 09:00:00, 2014-01-09 10:00:00, 2014-01-09 11:00:00, 
  2014-01-09 12:00:00, 2014-01-09 13:00:00, 2014-01-09 14:00:00, 2014-01-09 15:00:00, 
  2014-01-09 16:00:00, 2014-01-09 17:00:00, 2014-01-09 18:00:00, 2014-01-09 19:00:00, 
  2014-01-09 20:00:00, 2014-01-09 21:00:00, 2014-01-09 22:00:00, 2014-01-09 23:00:00, 
  2014-01-10 00:00:00, 2014-01-10 01:00:00, 2014-01-10 02:00:00, 2014-01-10 03:00:00, 
  2014-01-10 04:00:00, 2014-01-10 05:00:00, 2014-01-10 06:00:00, 2014-01-10 07:00:00, 
  2014-01-10 08:00:00, 2014-01-10 09:00:00, 2014-01-10 10:00:00, 2014-01-10 11:00:00, 
  2014-01-10 12:00:00, 2014-01-10 13:00:00, 2014-01-10 14:00:00, 2014-01-10 15:00:00, 
  2014-01-10 16:00:00, 2014-01-10 17:00:00, 2014-01-10 18:00:00, 2014-01-10 19:00:00, 
  2014-01-10 20:00:00, 2014-01-10 21:00:00, 2014-01-10 22:00:00, 2014-01-10 23:00:00, 
  2014-01-11 00:00:00, 2014-01-11 01:00:00, 2014-01-11 02:00:00, 2014-01-11 03:00:00, 
  2014-01-11 04:00:00, 2014-01-11 05:00:00, 2014-01-11 06:00:00, 2014-01-11 07:00:00, 
  2014-01-11 08:00:00, 2014-01-11 09:00:00, 2014-01-11 10:00:00, 2014-01-11 11:00:00, 
  2014-01-11 12:00:00, 2014-01-11 13:00:00, 2014-01-11 14:00:00, 2014-01-11 15:00:00, 
  2014-01-11 16:00:00, 2014-01-11 17:00:00, 2014-01-11 18:00:00, 2014-01-11 19:00:00, 
  2014-01-11 20:00:00, 2014-01-11 21:00:00, 2014-01-11 22:00:00, 2014-01-11 23:00:00, 
  2014-01-12 00:00:00, 2014-01-12 01:00:00, 2014-01-12 02:00:00, 2014-01-12 03:00:00, 
  2014-01-12 04:00:00, 2014-01-12 05:00:00, 2014-01-12 06:00:00, 2014-01-12 07:00:00, 
  2014-01-12 08:00:00, 2014-01-12 09:00:00, 2014-01-12 10:00:00, 2014-01-12 11:00:00, 
  2014-01-12 12:00:00, 2014-01-12 13:00:00, 2014-01-12 14:00:00, 2014-01-12 15:00:00, 
  2014-01-12 16:00:00, 2014-01-12 17:00:00, 2014-01-12 18:00:00, 2014-01-12 19:00:00, 
  2014-01-12 20:00:00, 2014-01-12 21:00:00, 2014-01-12 22:00:00, 2014-01-12 23:00:00, 
  2014-01-13 00:00:00, 2014-01-13 01:00:00, 2014-01-13 02:00:00, 2014-01-13 03:00:00, ...], ['A', 'B']]
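
Printing the index levels, as above, shows that the hours are present. A more compact check is to compare consecutive timestamps directly, which avoids relying on how the frame is displayed: the rows printed in the question all belong to the first timestamp, and pandas may show a midnight timestamp without its time part. The sketch below is an illustration only; the short date range and the station ids 2, 3, 4 (taken from the question's printed output) are assumptions.

```python
import pandas as pd

# Assumed short range, hourly, both endpoints included.
all_dates = pd.date_range("2014-01-09", "2014-01-10", freq=pd.Timedelta(hours=1))
stations = [2, 3, 4]  # hypothetical station ids

idx = pd.MultiIndex.from_product([all_dates, stations], names=["date", "station"])

# Distinct timestamps of the first level, in order of appearance.
timestamps = idx.get_level_values("date").unique()

# Differences between consecutive timestamps; all equal to 1 hour if hourly.
deltas = timestamps[1:] - timestamps[:-1]
is_hourly = bool((deltas == pd.Timedelta(hours=1)).all())

print(timestamps[:3])
print(is_hourly)
```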