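I have a pandas DataFrame with a timestamp column. For the sake of argument, assume that the interval between the timestamps of the first and second rows is always the correct one, and that all other rows should be spaced by that same interval. How can I add a blank row for every missing time step from the second row through the last row, including several consecutive rows where more than one step is missing? I want to keep all the data I already have and only add blank rows where timestamps are missing; in those rows, every column except the timestamp should be empty.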
Answer 0 (score: 0)
I think a snippet like this should work:
import pandas as pd
import numpy as np

# Let's create a sample dataframe
df = pd.DataFrame(columns=['ts', 'Col2', 'Col3'])
df.ts = [pd.Timestamp('2017-01-24 13:03:12.000000'),
         pd.Timestamp('2017-01-24 13:50:12.000000'),
         pd.Timestamp('2017-01-24 16:07:40.000000')]

# The difference between the first and second timestamps is the expected step
diff = df.ts.iloc[1] - df.ts.iloc[0]
print(df)  # Dataframe before changes

# Fill the gap in front of the row at position `index`
index = 2
n_diff = df.ts.iloc[index] - df.ts.iloc[index - 1]
while n_diff > diff:
    # Insert a blank row one step after the previous timestamp
    new_ts = df.ts.iloc[index - 1] + diff
    row = pd.DataFrame({"ts": new_ts, "Col2": np.nan, "Col3": np.nan}, index=[index + 1])
    df = pd.concat([df.iloc[:index], row, df.iloc[index:]]).reset_index(drop=True)
    index += 1
    n_diff = df.ts.iloc[index] - df.ts.iloc[index - 1]
print(df)  # Dataframe after adding new rows
The output is:
                   ts Col2 Col3
0 2017-01-24 13:03:12  NaN  NaN
1 2017-01-24 13:50:12  NaN  NaN
2 2017-01-24 16:07:40  NaN  NaN
                   ts Col2 Col3
0 2017-01-24 13:03:12  NaN  NaN
1 2017-01-24 13:50:12  NaN  NaN
2 2017-01-24 14:37:12  NaN  NaN
3 2017-01-24 15:24:12  NaN  NaN
4 2017-01-24 16:07:40  NaN  NaN
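The loop above fills only the gap in front of a single row. To cover every gap from the second row to the last, as the question asks, one possible sketch collects all missing timestamps first and concatenates them once. The function name `fill_missing_steps`, its `ts_col` parameter, and the sample values in `Col2` and `Col3` are hypothetical and chosen only for illustration. It assumes at least two rows and unique timestamps, and it applies the same rule as the loop: a blank row is added while the previous timestamp plus one step is still earlier than the next existing timestamp.

```python
import pandas as pd


def fill_missing_steps(df, ts_col="ts"):
    """Return a copy of df with blank rows inserted for missing time steps.

    Hypothetical helper: the step is taken from the first two rows, and
    existing rows are never moved or dropped.
    """
    df = df.sort_values(ts_col).reset_index(drop=True)
    step = df[ts_col].iloc[1] - df[ts_col].iloc[0]

    # Walk every pair of neighbours from the second row onward and
    # collect the timestamps that are missing between them.
    missing = []
    for prev, nxt in zip(df[ts_col].iloc[1:-1], df[ts_col].iloc[2:]):
        t = prev + step
        while t < nxt:
            missing.append(t)
            t += step

    if not missing:
        return df

    # Rows built from the timestamp column alone get NaN everywhere else.
    blanks = pd.DataFrame({ts_col: missing})
    out = pd.concat([df, blanks], ignore_index=True)
    return out.sort_values(ts_col).reset_index(drop=True)


# Same timestamps as the sample above, with made-up values to show
# that existing data is preserved.
df = pd.DataFrame({
    "ts": pd.to_datetime(["2017-01-24 13:03:12",
                          "2017-01-24 13:50:12",
                          "2017-01-24 16:07:40"]),
    "Col2": [1.0, 2.0, 3.0],
    "Col3": ["a", "b", "c"],
})
filled = fill_missing_steps(df)
print(filled)
```

Building the missing rows in one pass avoids calling `pd.concat` inside a loop, which copies the whole frame on every insertion.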