I have a time series of location changes, for example:
08-09-2018 17:00:00, user_1, home
08-09-2018 18:30:00, user_2, home
08-09-2018 18:40:00, user_1, recreation center
I need to create "buckets" (every 15 minutes in this example) and fill each bucket with whatever was in the previous bucket, like this:
08-09-2018 17:00:00, user_1, home
08-09-2018 17:15:00, user_1, home
08-09-2018 17:30:00, user_1, home
08-09-2018 17:45:00, user_1, home
08-09-2018 18:00:00, user_1, home
08-09-2018 18:15:00, user_1, home
08-09-2018 18:30:00, user_1, home
08-09-2018 18:30:00, user_2, home
08-09-2018 18:45:00, user_1, recreation center
08-09-2018 18:45:00, user_2, home
08-09-2018 19:00:00, user_1, recreation center
08-09-2018 19:00:00, user_2, home
From there I will create dummy variables from the location names, but I know how to do that :) If it helps, feel free to group it like this:
pd.crosstab([locationDf.date, locationDf.user], locationDf.location)
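To show what that grouping produces, here is the crosstab call applied to the three raw rows. Each (date, user) pair becomes a row and each location a column of counts, which behave as 0/1 dummy variables as long as every pair has exactly one location. The layout of locationDf (columns date, user, location) is an assumption; the question does not show how the frame is built.

```python
import pandas as pd

# Assumed layout of locationDf: the three observations from the question.
# Dates are month-first (08-09-2018 is 9 August 2018).
locationDf = pd.DataFrame({
    "date": pd.to_datetime(
        ["08-09-2018 17:00:00", "08-09-2018 18:30:00", "08-09-2018 18:40:00"],
        format="%m-%d-%Y %H:%M:%S",
    ),
    "user": ["user_1", "user_2", "user_1"],
    "location": ["home", "home", "recreation center"],
})

# Rows: (date, user) pairs; columns: one per location; cells: counts (0 or 1 here).
dummies = pd.crosstab([locationDf.date, locationDf.user], locationDf.location)
print(dummies)
```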
How do I do the first part?
I could do something like this:
for user, user_loc_dc in locDf.groupby('user'):
    user_loc_dc.resample('15T').agg('max').ffill()  # then just append these
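A minimal sketch of how that per-user loop could produce the expected output, assuming locDf has the columns date, user and location (the column names, the sample frame and the 19:00 end time are assumptions taken from the example above). Instead of resample('15T').agg('max'), which puts the 18:40 change into the 18:30 bucket, it reindexes each user's locations onto one shared 15-minute grid with method='ffill', so each bucket holds the last location known at or before that bucket's time:

```python
import pandas as pd

# Assumed sample data; dates are month-first (08-09-2018 is 9 August 2018).
locDf = pd.DataFrame({
    "date": pd.to_datetime(
        ["08-09-2018 17:00:00", "08-09-2018 18:30:00", "08-09-2018 18:40:00"],
        format="%m-%d-%Y %H:%M:%S",
    ),
    "user": ["user_1", "user_2", "user_1"],
    "location": ["home", "home", "recreation center"],
})

# Shared 15-minute grid; the 19:00 end point is an assumption matching the example.
buckets = pd.date_range("2018-08-09 17:00:00", "2018-08-09 19:00:00", freq="15min")

parts = []
for user, user_loc_dc in locDf.groupby("user"):
    # Index this user's locations by time (reindex/ffill needs a sorted index).
    loc = user_loc_dc.set_index("date")["location"].sort_index()
    # Last known location at or before each bucket time.
    loc = loc.reindex(buckets, method="ffill")
    # Drop buckets before the user's first observation and tag rows with the user.
    parts.append(loc.dropna().rename_axis("date").reset_index().assign(user=user))

filled = (pd.concat(parts, ignore_index=True)[["date", "user", "location"]]
          .sort_values(["date", "user"], ignore_index=True))
print(filled)
```

Appending the per-user pieces and concatenating once at the end is what the "just append these" comment suggests; pd.concat on a list is cheaper than growing a DataFrame row by row.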
Answer (score: 1):
Use Series.resample() and ffill():
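import pandas as pd

# Month-first dates: 08-09-2018 is 9 August 2018
dates = [pd.Timestamp('08-09-2018 17:00:00'), pd.Timestamp('08-09-2018 18:30:00'), pd.Timestamp('08-09-2018 18:40:00'), pd.Timestamp('08-09-2018 19:00:00')]
data = [['user_1', 'home'], ['user_2', 'home'], ['user_1', 'recreation center'], ['user_2', 'home']]
# Upsample to 15-minute buckets, forward-filling the last known row ('15T' = 15 minutes; newer pandas prefers '15min')
resampled = pd.Series(data, dates).resample('15T').ffill()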
Output:
2018-08-09 17:00:00 [user_1, home]
2018-08-09 17:15:00 [user_1, home]
2018-08-09 17:30:00 [user_1, home]
2018-08-09 17:45:00 [user_1, home]
2018-08-09 18:00:00 [user_1, home]
2018-08-09 18:15:00 [user_1, home]
2018-08-09 18:30:00 [user_2, home]
2018-08-09 18:45:00 [user_1, recreation center]
2018-08-09 19:00:00 [user_2, home]
Freq: 15T, dtype: object
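This output keeps a single row per bucket (for example, user_1 is missing at 18:30 and user_2 at 18:45), so it is not quite the per-user table asked for. A sketch of one vectorized alternative, swapping in pd.merge_asof instead of resample: build every (bucket, user) pair and look up each user's last location at or before the bucket time. The sample frame and the 19:00 end time are assumptions mirroring the question:

```python
import pandas as pd

# Assumed sample data; dates are month-first (08-09-2018 is 9 August 2018).
locDf = pd.DataFrame({
    "date": pd.to_datetime(
        ["08-09-2018 17:00:00", "08-09-2018 18:30:00", "08-09-2018 18:40:00"],
        format="%m-%d-%Y %H:%M:%S",
    ),
    "user": ["user_1", "user_2", "user_1"],
    "location": ["home", "home", "recreation center"],
})

# 15-minute buckets from the first observation to an assumed end time of 19:00.
buckets = pd.date_range(locDf["date"].min().floor("15min"),
                        pd.Timestamp("2018-08-09 19:00:00"), freq="15min")

# One row per (bucket, user) pair.
grid = pd.MultiIndex.from_product(
    [buckets, locDf["user"].unique()], names=["date", "user"]
).to_frame(index=False)

# merge_asof needs both sides sorted on the "on" key; "backward" picks the
# latest observation for the same user at or before each bucket time.
filled = pd.merge_asof(
    grid.sort_values("date"),
    locDf.sort_values("date"),
    on="date", by="user", direction="backward",
)

# Buckets before a user's first observation have no match (NaN); drop them.
filled = filled.dropna(subset=["location"]).sort_values(["date", "user"], ignore_index=True)
print(filled)
```

With the sample data this yields the 12 rows from the question; merge_asof avoids the Python-level loop, which matters once there are many users.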