Fill in missing time slots for multiple users

Date: 2019-05-28 10:01:23

Tags: python pandas

I am trying to fill in the missing locations in a CSV file in which the date and time are stored as strings.

My input is:

time_slot              User       Location
2017-10-26 00:00:00      1           156 
2017-10-26 10:00:00      1           55
2017-10-26 12:00:00      1           848
2017-10-27 02:00:00      1           79
2017-10-27 16:00:00      1           846
2017-10-27 23:00:00      1           648
2017-10-26 00:00:00      2           75
2017-10-26 02:00:00      2           32
2017-10-26 10:00:00      2           18
2017-10-27 01:00:00      2           874
2017-10-27 04:00:00      2           46
2017-10-27 18:00:00      2           96
2017-10-26 07:00:00      3           25
2017-10-26 09:00:00      3           463
2017-10-26 14:00:00      3           85
2017-10-27 06:00:00      3           95
2017-10-27 23:00:00      3           12

The output should be:

time_slot              User       Location
2017-10-26 00:00:00      1           156
...                      1           156
2017-10-26 09:00:00      1           156
2017-10-26 10:00:00      1           55
2017-10-26 11:00:00      1           55
2017-10-26 12:00:00      1           848
...                      1           848 (for all slots in between)
2017-10-26 23:00:00      1           848
...                      1           848
2017-10-27 02:00:00      1           79
...                      1           79
2017-10-27 16:00:00      1           846
...                      1           846 (same as above)
2017-10-27 23:00:00      1           648
2017-10-26 00:00:00      2           75
2017-10-26 02:00:00      2           32
2017-10-26 10:00:00      2           18
2017-10-27 01:00:00      2           874
2017-10-27 04:00:00      2           46
2017-10-27 18:00:00      2           96
2017-10-26 07:00:00      3           25
2017-10-26 09:00:00      3           463
2017-10-26 14:00:00      3           85
2017-10-27 06:00:00      3           95
2017-10-27 23:00:00      3           12

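The datetime frequency is 1 hour. Instead of filling the missing time slots with 0, I want to fill them with the location from the previous time slot.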

1 answer:

Answer 0 (score: 2)

Use DataFrame.groupby together with asfreq, forward-filling within each user's group:

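# Assumes df has time_slot as a DatetimeIndex (see the details further down).
# For each user, expand Location to an hourly frequency and forward-fill the gaps.
# Note: pandas 2.2+ deprecates the 'H' alias in favor of lowercase 'h'.
df1 = (df.groupby('User')['Location']
        .apply(lambda x: x.asfreq(freq='H', method='ffill'))
        .reset_index())
print(df1.head(10))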
   User           time_slot  Location
0     1 2017-10-26 00:00:00       156
1     1 2017-10-26 01:00:00       156
2     1 2017-10-26 02:00:00       156
3     1 2017-10-26 03:00:00       156
4     1 2017-10-26 04:00:00       156
5     1 2017-10-26 05:00:00       156
6     1 2017-10-26 06:00:00       156
7     1 2017-10-26 07:00:00       156
8     1 2017-10-26 08:00:00       156
9     1 2017-10-26 09:00:00       156
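
To make the behaviour easier to check, here is a minimal self-contained sketch of the same groupby plus asfreq approach on a small subset of the question's data. The names `filled` and `rows_per_user` are illustrative only, and `pd.offsets.Hour()` is used in place of the `'H'` string purely so that the sketch does not depend on which frequency alias a given pandas version prefers.

```python
import pandas as pd

# Small subset of the question's data; time_slot is parsed and set as the index.
df = pd.DataFrame(
    {
        "time_slot": pd.to_datetime(
            [
                "2017-10-26 00:00:00",
                "2017-10-26 10:00:00",
                "2017-10-26 12:00:00",
                "2017-10-26 00:00:00",
                "2017-10-26 02:00:00",
            ]
        ),
        "User": [1, 1, 1, 2, 2],
        "Location": [156, 55, 848, 75, 32],
    }
).set_index("time_slot")

# Expand each user's Location to hourly slots, carrying the last known value forward.
filled = (
    df.groupby("User")["Location"]
    .apply(lambda s: s.asfreq(pd.offsets.Hour(), method="ffill"))
    .reset_index()
)

# Each user is filled only between that user's own first and last time slot.
rows_per_user = filled.groupby("User").size()
print(rows_per_user)
```

Because asfreq runs separately on each group, values are never carried over from one user to the next, and no rows are created before a user's first slot or after the last one.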

Details: the solution relies on time_slot being the DatetimeIndex of df:

print (df.index)
DatetimeIndex(['2017-10-26 00:00:00', '2017-10-26 10:00:00',
               '2017-10-26 12:00:00', '2017-10-27 02:00:00',
               '2017-10-27 16:00:00', '2017-10-27 23:00:00',
               '2017-10-26 00:00:00', '2017-10-26 02:00:00',
               '2017-10-26 10:00:00', '2017-10-27 01:00:00',
               '2017-10-27 04:00:00', '2017-10-27 18:00:00',
               '2017-10-26 07:00:00', '2017-10-26 09:00:00',
               '2017-10-26 14:00:00', '2017-10-27 06:00:00',
               '2017-10-27 23:00:00'],
              dtype='datetime64[ns]', name='time_slot', freq=None)
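
Since the question starts from a CSV file with the timestamps stored as strings, the index shown above has to be built first. A minimal sketch of one way to do that with `pd.read_csv`, assuming a comma-separated file with the header `time_slot,User,Location` (the real delimiter and file path are not given in the question, so `csv_text` below is a hypothetical stand-in for the file):

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the comma delimiter is an assumption.
csv_text = """time_slot,User,Location
2017-10-26 00:00:00,1,156
2017-10-26 10:00:00,1,55
2017-10-26 00:00:00,2,75
2017-10-26 02:00:00,2,32
"""

# parse_dates converts the string column to datetime64,
# index_col makes it the index that asfreq needs.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["time_slot"],
    index_col="time_slot",
)

print(df.index)
```

If the DataFrame is already loaded with string timestamps, converting the column with `pd.to_datetime` and then calling `set_index('time_slot')` should lead to the same kind of index.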