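I am using Pandas to process a large time series dataset. I want to add a row between two rows of the DataFrame whenever the difference between their consecutive index values is greater than 5.
Actual: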
            a  result
Date
1497544649  1     1.0
1497544652  9     1.0
1497544661  9     NaN
Expected:
            a  result
Date
1497544649  1     1.0
1497544652  9     1.0
1497544657  9       0
1497544661  9     NaN
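I use diff() on the index to get the difference between two consecutive index values, but I am not sure how to insert a record when that difference is greater than 5.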
import pandas as pd

df = pd.DataFrame([{"Date": 1497544649, "a": 1, "result": 1},
                   {"Date": 1497544652, "a": 9, "result": 1},
                   {"Date": 1497544661, "a": 9, "result": 1}])
df.set_index("Date", inplace=True)
df.index.to_series().diff().fillna(0).to_frame("diff")
Any pointers on how to achieve this would be appreciated.
Thanks
Answer 0 (score: 0)
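You are off to a good start. Add a diff column to make filtering easier.
Then get the index labels of the rows that match your rule and insert your rows.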
df['diff'] = df.index.to_series().diff().fillna(0)  # gap to the previous index value
matches = df[df['diff'] > 5].index.tolist()
for i in matches:
    diff = df.loc[i, 'diff']
    interval = int(round(diff / 2))  # pick an index somewhere in the middle
    # insert a row before the matched index; the values are for a, result, diff
    df.loc[i - interval] = [0, 0, diff - interval]
    df.loc[i, 'diff'] = interval  # may not need to update the interval
df.sort_index(inplace=True)  # new rows are appended at the end, so sort by index
del df['diff']  # the helper column is no longer needed
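If the loop gets slow on a large dataset, the same idea can probably be done without iterating row by row. Below is a minimal sketch, not a definitive solution: the function name `insert_gap_rows` and its parameters `max_gap` and `result_value` are hypothetical. It assumes one new row per gap, placed at the later index minus half the gap (integer division), that `a` is carried forward from the previous row, and that `result` is set to 0. The expected output in the question is consistent with these assumptions but does not confirm them.

```python
import pandas as pd


def insert_gap_rows(df, max_gap=5, result_value=0):
    """Return a copy of df with one extra row inside every index gap > max_gap.

    Hypothetical helper: assumes a sorted numeric index and the columns
    "a" and "result" from the question.
    """
    gaps = df.index.to_series().diff()
    wide = gaps[gaps > max_gap]
    if wide.empty:
        return df.copy()

    # New label = later index value minus half of the gap (integer division).
    half = (wide.to_numpy() // 2).astype("int64")
    new_index = pd.Index(wide.index.to_numpy() - half, name=df.index.name)

    # reindex adds the new labels as all-NaN rows; union keeps the index sorted.
    out = df.reindex(df.index.union(new_index))

    # Assumption: carry "a" forward from the previous row, set "result" explicitly.
    out.loc[new_index, "a"] = out["a"].ffill().loc[new_index]
    out.loc[new_index, "result"] = result_value
    return out


df = pd.DataFrame(
    {"a": [1, 9, 9], "result": [1.0, 1.0, None]},
    index=pd.Index([1497544649, 1497544652, 1497544661], name="Date"),
)
filled = insert_gap_rows(df)
print(filled)
```

Note that `reindex` introduces NaN for the new rows, so an integer column such as `a` is upcast to float; cast it back afterwards if that matters. Only the newly inserted labels are filled, so a NaN that was already in the data stays untouched.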