我正在尝试将日期和时间作为字符串的CSV文件中缺少的位置。
我的输入是:
time_slot User Location
2017-10-26 00:00:00 1 156
2017-10-26 10:00:00 1 55
2017-10-26 12:00:00 1 848
2017-10-27 02:00:00 1 79
2017-10-27 16:00:00 1 846
2017-10-27 23:00:00 1 648
2017-10-26 00:00:00 2 75
2017-10-26 02:00:00 2 32
2017-10-26 10:00:00 2 18
2017-10-27 01:00:00 2 874
2017-10-27 04:00:00 2 46
2017-10-27 18:00:00 2 96
2017-10-26 07:00:00 3 25
2017-10-26 09:00:00 3 463
2017-10-26 14:00:00 3 85
2017-10-27 06:00:00 3 95
2017-10-27 23:00:00 3 12
输出应为
time_slot User Location
2017-10-26 00:00:00 1 156
.
.
.
.
2017-10-26 09:00:00 1 156
2017-10-26 10:00:00 1 55
2017-10-26 11:00:00 1 55
2017-10-26 12:00:00 1 848
.
. 848 for all slots in between
.
2017-10-26 24:00:00 1 848
.
. 848
2017-10-27 02:00:00 1 79
.
. 79
.
2017-10-27 16:00:00 1 846
846
Same as above
2017-10-27 23:00:00 1 648
2017-10-26 00:00:00 2 75
2017-10-26 02:00:00 2 32
2017-10-26 10:00:00 2 18
2017-10-27 01:00:00 2 874
2017-10-27 04:00:00 2 46
2017-10-27 18:00:00 2 96
2017-10-26 07:00:00 3 25
2017-10-26 09:00:00 3 463
2017-10-26 14:00:00 3 85
2017-10-27 06:00:00 3 95
2017-10-27 23:00:00 3 12
日期时间频率为1小时。我们没有填充丢失的时隙中的0,而是填充了先前时隙的位置点
答案 0 :(得分:2)
在DataFrame.asfreq
中使用DataFrame.groupby
:
df1 = (df.groupby('User')['Location']
.apply(lambda x: x.asfreq(freq='H',method='ffill'))
.reset_index())
print (df1.head(10))
User time_slot Location
0 1 2017-10-26 00:00:00 156
1 1 2017-10-26 01:00:00 156
2 1 2017-10-26 02:00:00 156
3 1 2017-10-26 03:00:00 156
4 1 2017-10-26 04:00:00 156
5 1 2017-10-26 05:00:00 156
6 1 2017-10-26 06:00:00 156
7 1 2017-10-26 07:00:00 156
8 1 2017-10-26 08:00:00 156
9 1 2017-10-26 09:00:00 156
详细信息:
print (df.index)
DatetimeIndex(['2017-10-26 00:00:00', '2017-10-26 10:00:00',
'2017-10-26 12:00:00', '2017-10-27 02:00:00',
'2017-10-27 16:00:00', '2017-10-27 23:00:00',
'2017-10-26 00:00:00', '2017-10-26 02:00:00',
'2017-10-26 10:00:00', '2017-10-27 01:00:00',
'2017-10-27 04:00:00', '2017-10-27 18:00:00',
'2017-10-26 07:00:00', '2017-10-26 09:00:00',
'2017-10-26 14:00:00', '2017-10-27 06:00:00',
'2017-10-27 23:00:00'],
dtype='datetime64[ns]', name='time_slot', freq=None)