Question

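I use Pandas to process a large time-series dataset. I want to insert rows between the rows of the DataFrame wherever the difference between two consecutive index values is greater than 5.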

Actual:

            a  result
Date                 
1497544649  1     1.0
1497544652  9     1.0
1497544661  9     NaN

Expected:

            a  result
Date                 
1497544649  1     1.0
1497544652  9     1.0
1497544657  9     0
1497544661  9     NaN

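I use diff() on the index to get the difference between two consecutive index values, but I am not sure how to insert a record when that difference is greater than 5.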

import pandas as pd

df = pd.DataFrame([{"Date": 1497544649,"a":1, "result": 1}, 
                   {"Date": 1497544652,"a": 9, "result": 1},
                   {"Date": 1497544661,"a": 9, "result": 1}])
df.set_index("Date", inplace=True)

df.index.to_series().diff().fillna(0).to_frame("diff")

Any pointers on how to achieve this would be appreciated.

Thanks

Answer 1

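You're off to a good start. Add a diff column to make filtering easier.

Get the index labels of the DataFrame that match your rule, then insert your rows.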

df['diff'] = df.index.to_series().diff().fillna(0)  # gap to the previous index value

matches = df[df['diff'] > 5].index.tolist()  # labels whose gap exceeds 5

for i in matches:
    diff = df.loc[i, 'diff']
    interval = int(round(diff / 2))  # offset to a label near the middle of the gap
    df.loc[i - interval] = [0, 0, diff - interval]  # insert row before matched index (a, result, diff)
    df.loc[i, 'diff'] = interval  # may not need to update the interval

df = df.sort_index()  # new rows are appended at the end, so sort by index

del df['diff']  # the helper column can be removed
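
For reference, the same idea can be sketched as a self-contained function that builds the filler rows separately and concatenates them, instead of growing the frame inside a loop. This is only a sketch under a few assumptions: the index is unique and sorted, `insert_gap_rows`, `max_gap` and `fill_result` are hypothetical names made up for this example (not pandas API), and the new row copies `a` from the preceding row while `result` is set to 0, which matches the expected output in the question but is not stated there explicitly.

```python
import numpy as np
import pandas as pd


def insert_gap_rows(df, max_gap=5, fill_result=0):
    """Insert one row near the middle of every index gap wider than max_gap.

    Hypothetical helper for illustration; assumes a unique, sorted integer index.
    """
    gaps = df.index.to_series().diff()   # NaN for the first row
    wide = gaps[gaps > max_gap]          # only gaps that need a filler row
    if wide.empty:
        return df.copy()

    # New label sits round(gap / 2) before the row that closes the gap.
    new_labels = [int(label - round(gap / 2)) for label, gap in wide.items()]

    # Assumption: the filler row copies the values of the preceding row.
    prev_positions = df.index.get_indexer(wide.index) - 1
    filler = df.iloc[prev_positions].copy()
    filler.index = pd.Index(new_labels, name=df.index.name)
    filler["result"] = fill_result

    return pd.concat([df, filler]).sort_index()


# Data from the "Actual" table in the question.
df = pd.DataFrame(
    {"a": [1, 9, 9], "result": [1.0, 1.0, np.nan]},
    index=pd.Index([1497544649, 1497544652, 1497544661], name="Date"),
)
out = insert_gap_rows(df)
print(out)
```

Because the original frame is never modified, the helper diff column does not need to be added and removed again.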

Insert rows in a Pandas DataFrame based on a condition
