Here is a time series that we can interpret as state transitions:
import pandas as pd
states = list('abc')
transition_times = [
pd.to_datetime('00:01:00'),
pd.to_datetime('00:03:10'),
pd.to_datetime('00:05:00'),
]
df = pd.DataFrame({'state': states}, index=transition_times);df
Out:
state
2018-01-11 00:01:00 a
2018-01-11 00:03:10 b
2018-01-11 00:05:00 c
Suppose I now want a minute-indexed timeline of the system state from 00:01:00 to 00:05:00. My idea was:
df.resample('1min').ffill()
Out:
state
2018-01-11 00:01:00 a
2018-01-11 00:02:00 a
2018-01-11 00:03:00 a <- I would expect 'b' here !
2018-01-11 00:04:00 b
2018-01-11 00:05:00 c
As noted in the comment, can someone explain why the [3 min, 4 min[ bin is filled with 'a'?
I can reveal the gaps I expect to be filled with:
df.resample('1min').max()
Out:
state
2018-01-11 00:01:00 a
2018-01-11 00:02:00 NaN
2018-01-11 00:03:00 b
2018-01-11 00:04:00 NaN
2018-01-11 00:05:00 c
and get the desired result with:
df.resample('1min').max().ffill()
Out:
state
2018-01-11 00:01:00 a
2018-01-11 00:02:00 a
2018-01-11 00:03:00 b
2018-01-11 00:04:00 b
2018-01-11 00:05:00 c
Many thanks!
Quentin
Answer 0 (score: 3)
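.ffill (i.e. forward fill) carries the last observation forward, filling each subsequent entry until a new observation replaces it. Stepping forward through your timestamps: at the 00:03:00 timestamp the 'b' observation has not happened yet, so the last valid observation ('a') is used.
The opposite of this method is .bfill (backfill), which works in reverse order and fills backward, for example: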
In: df.resample('1min').bfill()
Out:
state
2018-01-11 00:01:00 a
2018-01-11 00:02:00 b
2018-01-11 00:03:00 b
2018-01-11 00:04:00 c
2018-01-11 00:05:00 c
More details in the documentation: pandas.DataFrame.fillna
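One way to check the forward-fill explanation above is to build the minute grid by hand and, for each grid timestamp, look up the most recent observation at or before it. The sketch below does this with reindex(method='ffill') and compares it with resample('1min').ffill(). The date is pinned to 2018-01-11 and the names grid, as_of and resampled are introduced here only for illustration; treat it as a rough equivalence for this data, not a statement about how pandas implements resampling internally.

```python
import pandas as pd

# Same three transitions as in the question, with the date pinned (an
# assumption) so the result does not depend on the day the snippet is run.
times = pd.to_datetime([
    '2018-01-11 00:01:00',
    '2018-01-11 00:03:10',
    '2018-01-11 00:05:00',
])
df = pd.DataFrame({'state': list('abc')}, index=times)

# Minute grid covering the observations.
grid = pd.date_range('2018-01-11 00:01:00', '2018-01-11 00:05:00', freq='1min')

# For each grid timestamp, take the latest observation at or before it.
# At 00:03:00 that is still 'a', because 'b' only arrives at 00:03:10.
as_of = df.reindex(grid, method='ffill')

# The call from the question, for comparison.
resampled = df.resample('1min').ffill()

print(as_of)
print(resampled)
```

Both frames hold the state that was valid at the start of each minute, which is why 00:03:00 shows 'a' and not 'b'.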
Answer 1 (score: 1)
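In this case, the forward fill is not actually overwriting an existing value. A forward fill replaces every NaN with the previous valid value. In principle, 00:03:00 really is NaN (we have no entry at that timestamp), so it is correctly filled with the previous value a.
I believe the confusion is about how the "aggregation" or downsampling happens in the last step (i.e. how several values falling within minute 3 are combined). In this case, the value at the exact minute (00 seconds) is taken.
The following may explain the process.
I added one entry to your dataframe: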
states = list('abcd')
transition_times = [
pd.to_datetime('00:01:00'),
pd.to_datetime('00:03:10'),
pd.to_datetime('00:03:30'),
pd.to_datetime('00:05:00'),
]
df = pd.DataFrame({'state': states}, index=transition_times);df
Here is a manual illustration:
df.resample("10s").asfreq()
# state
# 2018-01-11 00:01:00 a
# 2018-01-11 00:01:10 NaN
# 2018-01-11 00:01:20 NaN
# 2018-01-11 00:01:30 NaN
# 2018-01-11 00:01:40 NaN
# 2018-01-11 00:01:50 NaN
# 2018-01-11 00:02:00 NaN
# 2018-01-11 00:02:10 NaN
# 2018-01-11 00:02:20 NaN
# 2018-01-11 00:02:30 NaN
# 2018-01-11 00:02:40 NaN
# 2018-01-11 00:02:50 NaN
# 2018-01-11 00:03:00 NaN
# 2018-01-11 00:03:10 b
# 2018-01-11 00:03:20 NaN
# 2018-01-11 00:03:30 c
# 2018-01-11 00:03:40 NaN
# 2018-01-11 00:03:50 NaN
# 2018-01-11 00:04:00 NaN
# 2018-01-11 00:04:10 NaN
# 2018-01-11 00:04:20 NaN
# 2018-01-11 00:04:30 NaN
# 2018-01-11 00:04:40 NaN
# 2018-01-11 00:04:50 NaN
# 2018-01-11 00:05:00 d
# Forward fill
df_ffill = df.resample("10s").asfreq().ffill()
# state
# 2018-01-11 00:01:00 a
# 2018-01-11 00:01:10 a
# 2018-01-11 00:01:20 a
# 2018-01-11 00:01:30 a
# 2018-01-11 00:01:40 a
# 2018-01-11 00:01:50 a
# 2018-01-11 00:02:00 a
# 2018-01-11 00:02:10 a
# 2018-01-11 00:02:20 a
# 2018-01-11 00:02:30 a
# 2018-01-11 00:02:40 a
# 2018-01-11 00:02:50 a
# 2018-01-11 00:03:00 a
# 2018-01-11 00:03:10 b
# 2018-01-11 00:03:20 b
# 2018-01-11 00:03:30 c
# 2018-01-11 00:03:40 c
# 2018-01-11 00:03:50 c
# 2018-01-11 00:04:00 c
# 2018-01-11 00:04:10 c
# 2018-01-11 00:04:20 c
# 2018-01-11 00:04:30 c
# 2018-01-11 00:04:40 c
# 2018-01-11 00:04:50 c
# 2018-01-11 00:05:00 d
# Manual Downsample
df_ffill[df_ffill.index.second == 0]
# state
# 2018-01-11 00:01:00 a
# 2018-01-11 00:02:00 a
# 2018-01-11 00:03:00 a
# 2018-01-11 00:04:00 c
# 2018-01-11 00:05:00 d
# ----------------------------------------
df.resample("1min").ffill()
# state
# 2018-01-11 00:01:00 a
# 2018-01-11 00:02:00 a
# 2018-01-11 00:03:00 a
# 2018-01-11 00:04:00 c
# 2018-01-11 00:05:00 d
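If the goal is instead the timeline from the question, where a minute is labelled with the state reached inside that minute, one possible approach is to aggregate first and fill afterwards: keep the last observation in each 1-minute bin, then forward-fill the empty bins. This is only a sketch under that assumption. It uses .last() rather than the .max() from the question, since .max() depends on the alphabetical order of the state labels. The date is pinned to 2018-01-11, and last_per_bin and timeline are illustrative names.

```python
import pandas as pd

# Same four transitions as above, with the date pinned (an assumption) so the
# result does not depend on the day the snippet is run.
transition_times = pd.to_datetime([
    '2018-01-11 00:01:00',
    '2018-01-11 00:03:10',
    '2018-01-11 00:03:30',
    '2018-01-11 00:05:00',
])
df = pd.DataFrame({'state': list('abcd')}, index=transition_times)

# Step 1: downsample, keeping the last state observed inside each 1-minute bin.
# Bins without any observation come back as missing values.
last_per_bin = df.resample('1min').last()

# Step 2: carry the last known state forward into the empty bins.
timeline = last_per_bin.ffill()

print(timeline)
```

With two transitions inside minute 3 ('b' at 00:03:10 and 'c' at 00:03:30), the 00:03:00 row ends up as 'c': each label now describes the state at the end of its bin, whereas resample("1min").ffill() describes the state at the start of it.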