Question

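I would like to use the interpolate function, but only between known data values in a pandas DataFrame column. The problem is that the first and last values in the column are often NaN, and sometimes there can be many rows before the first non-NaN value: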

      col 1    col 2
 0    NaN      NaN
 1    NaN      NaN
...
1000   1       NaN
1001  NaN       1   <-----
1002   3       NaN  <----- only want to fill in these 'in between value' rows
1003   4        3
...
3999  NaN      NaN
4000  NaN      NaN

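I am tying together a dataset that is updated 'on event', but separately for each column, and is indexed by Timestamp. This means there are often rows where no data is recorded for some columns, hence a lot of NaNs!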

Answer 1

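I select the rows between the labels returned by idxmin and idxmax for each column, then forward-fill (ffill) that slice. Note that idxmin and idxmax return the labels of the smallest and largest values, so they bound the known data here only because the values increase down each column; first_valid_index and last_valid_index are the general way to find those bounds.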

print(df)
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001    NaN      1
#1002      3    NaN
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN

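# idxmin()/idxmax() mark the first and last known rows here only because
# each column's values happen to increase from top to bottom.
# Assign to one column at a time so the other column is not filled as a side effect.
first, last = df['col 1'].idxmin(), df['col 1'].idxmax()
df.loc[first:last, 'col 1'] = df.loc[first:last, 'col 1'].ffill()
first, last = df['col 2'].idxmin(), df['col 2'].idxmax()
df.loc[first:last, 'col 2'] = df.loc[first:last, 'col 2'].ffill()
print(df)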
#      col 1  col 2
#0       NaN    NaN
#1       NaN    NaN
#1000      1    NaN
#1001      1      1
#1002      3      1
#1003      4      3
#3999    NaN    NaN
#4000    NaN    NaN
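
When the known values are not sorted, idxmin and idxmax no longer point at the first and last known rows. The sketch below is one way to forward-fill only the inside gaps in that case. The sample Series and the helper name ffill_inside are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the known values are unsorted, so idxmin()/idxmax()
# would select rows 3..5 instead of the first and last known rows (1..7).
s = pd.Series(
    [np.nan, 5.0, np.nan, 2.0, np.nan, 7.0, np.nan, 3.0, np.nan],
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8],
    name="col 1",
)

def ffill_inside(series):
    """Forward-fill NaN only between the first and last known values."""
    first = series.first_valid_index()
    last = series.last_valid_index()
    out = series.copy()
    if first is None:  # all-NaN column: nothing to fill
        return out
    # Label-based slicing with .loc includes both end points.
    out.loc[first:last] = series.loc[first:last].ffill()
    return out

filled = ffill_inside(s)
print(filled)
```

The leading and trailing NaN stay untouched because they fall outside the first:last slice.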

Added a different solution, thanks to HStro.

# Use a single .loc call instead of chained indexing so the assignment reliably updates df.
first, last = df['col 1'].first_valid_index(), df['col 1'].last_valid_index()
df.loc[first:last, 'col 1'] = df.loc[first:last, 'col 1'].astype(float).interpolate()
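
As a possible alternative to slicing by hand, interpolate accepts a limit_area argument. The sketch below assumes pandas 0.23 or later, where that argument is available, and rebuilds the question's sample data with the elided rows left out. It is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (rows hidden behind "..." are omitted).
df = pd.DataFrame(
    {
        "col 1": [np.nan, np.nan, 1, np.nan, 3, 4, np.nan, np.nan],
        "col 2": [np.nan, np.nan, np.nan, 1, np.nan, 3, np.nan, np.nan],
    },
    index=[0, 1, 1000, 1001, 1002, 1003, 3999, 4000],
)

# limit_area="inside" fills only NaN that have a valid value on both sides,
# so leading and trailing NaN in each column are left alone.
# method="linear" treats rows as equally spaced and ignores the index values.
result = df.interpolate(method="linear", limit_area="inside")
print(result)
```

This handles every column in one call, each with its own first and last known value.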

Pandas: interpolation where the first and last data points in a column are NaN
