Adding missing time steps to a pandas DataFrame

Time: 2019-11-26 22:15:05

Tags: python pandas

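I have a pandas DataFrame with a timestamp column. For the sake of argument, assume that the distance between the timestamps of the first and second rows is always the correct spacing, and that all other rows should be spaced equally. I would like to know how to add a blank row for every missing time step from the second row through the last row, including the case where several consecutive rows are missing. I want to keep all the data I already have and only add blank rows where timestamps are missing; in those rows, every column except the timestamp should be empty.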

1 answer:

Answer 0 (score: 0)

I think a code snippet like this should work:

import pandas as pd
import numpy as np

# Create a sample dataframe
df = pd.DataFrame(columns=['ts', 'Col2', 'Col3'])
df['ts'] = [pd.Timestamp('2017-01-24 13:03:12.000000'), pd.Timestamp('2017-01-24 13:50:12.000000'), pd.Timestamp('2017-01-24 16:07:40.000000')]
diff = df.ts.iloc[1] - df.ts.iloc[0]  # expected step: difference between the first and second timestamps

print(df)  # Dataframe before changes

# Walk through every row from the third one onward, not just the first gap
index = 2
while index < len(df):
    n_diff = df.ts.iloc[index] - df.ts.iloc[index - 1]
    if n_diff > diff:
        # Gap is larger than one step: insert a blank row one step after the previous row
        new_ts = df.ts.iloc[index - 1] + diff
        row = pd.DataFrame({"ts": new_ts, "Col2": np.nan, "Col3": np.nan}, index=[index])
        df = pd.concat([df.iloc[:index], row, df.iloc[index:]]).reset_index(drop=True)
    # Move on; a freshly inserted row is compared with its successor on the next pass
    index += 1

print(df)  # Dataframe after adding new rows

The output is:

                   ts Col2 Col3
0 2017-01-24 13:03:12  NaN  NaN
1 2017-01-24 13:50:12  NaN  NaN
2 2017-01-24 16:07:40  NaN  NaN

                   ts Col2 Col3
0 2017-01-24 13:03:12  NaN  NaN
1 2017-01-24 13:50:12  NaN  NaN
2 2017-01-24 14:37:12  NaN  NaN
3 2017-01-24 15:24:12  NaN  NaN
4 2017-01-24 16:07:40  NaN  NaN
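
Rebuilding the DataFrame with pd.concat on every inserted row can get slow on long frames. As a rough sketch of the same idea, the missing timestamps can be collected first and concatenated once. The function name fill_missing_steps and its ts_col parameter are made up for this illustration, and the sketch assumes the rows are already sorted by timestamp and that the first two rows define the step, as stated in the question:

```python
import pandas as pd


def fill_missing_steps(df, ts_col="ts"):
    """Return a copy of df with a blank row for every missing time step.

    Hypothetical helper: assumes df is sorted by ts_col and that the
    spacing between the first two rows is the correct step.
    """
    ts = df[ts_col].tolist()
    if len(ts) < 3:
        return df.copy()

    step = ts[1] - ts[0]
    if step <= pd.Timedelta(0):
        raise ValueError("the first two timestamps must be strictly increasing")

    # Collect every timestamp that is missing between consecutive rows,
    # starting from the second row.
    missing = []
    for prev, cur in zip(ts[1:], ts[2:]):
        t = prev + step
        while t < cur:
            missing.append(t)
            t += step

    if not missing:
        return df.copy()

    # Rows built from timestamps only; all other columns become NaN on concat.
    filler = pd.DataFrame({ts_col: missing})
    combined = pd.concat([df, filler], ignore_index=True)
    # mergesort is stable, so rows with equal timestamps keep their order.
    return combined.sort_values(ts_col, kind="mergesort").reset_index(drop=True)


sample = pd.DataFrame({
    "ts": pd.to_datetime([
        "2017-01-24 13:03:12",
        "2017-01-24 13:50:12",
        "2017-01-24 16:07:40",
    ]),
    "Col2": [1, 2, 3],
})
filled = fill_missing_steps(sample)
print(filled)
```

For the sample timestamps used in the answer, this yields the same five timestamps as the loop version, with the existing Col2 values kept and NaN in the two inserted rows.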