Question

I have a df like

d = {'col1': [np.nan, np.nan, 1],
     'col2': [1, 1, 2],
     'col3': [2, 2, 3],
     'col4': [np.nan, 3, np.nan]}
df = pd.DataFrame(data=d)

and would like to extrapolate along the rows to fill any trailing NaN.

Expected output:

d2 = {'col1': [np.nan, np.nan, 1],
      'col2': [1, 1, 2],
      'col3': [2, 2, 3],
      'col4': [3, 3, 4]}
df2 = pd.DataFrame(data=d2)

Edit: The slope is different for each row. I tried df.interpolate(method='linear'), but that still leaves me with trailing NaNs.

Answer 1

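DataFrame.interpolate in pandas, which is largely a wrapper around scipy interpolation functions, has many keywords that let you tune the interpolation. You can use a spline: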

import numpy as np
import pandas as pd

d = {'col1': [np.nan, np.nan, 1, 5, 9, np.nan],
     'col2': [1, 1, 2, 5, 8, np.nan],
     'col3': [2, 2, 3, 4, 5, np.nan],
     'col4': [np.nan, 3, np.nan, 5, 6, np.nan]}
df = pd.DataFrame(data=d)

# a first-order spline (method="spline" requires scipy), filling NaNs in both directions
df = df.interpolate(method="spline", order=1, limit_direction="both")
print(df)

Output:

   col1  col2  col3  col4
0  -7.0   1.0   2.0   2.0
1  -3.0   1.0   2.0   3.0
2   1.0   2.0   3.0   4.0
3   5.0   5.0   4.0   5.0
4   9.0   8.0   5.0   6.0
5  13.0   8.8   5.6   7.0

Edit
There may be a more elegant solution in pandas, but this is one way to solve the problem:

d = {'col1 Mar': [np.nan, np.nan, 1],
     'col2 Jun': [1, 1, 2],
     'col3 Sep': [2, 2, 3],
     'col4 Dec': [np.nan, 3, np.nan]}
df = pd.DataFrame(data=d)
print(df)

#store temporarily the column index
col_index = df.columns
#transcribe month into a number that reflects the time distance
df.columns = [3, 6, 9, 12]
#interpolate over rows
df = df.interpolate(method="spline", order=1, limit_direction="both", axis=1, downcast="infer")
#assign back the original index
df.columns = col_index
print(df)

Output:

   col1 Mar  col2 Jun  col3 Sep  col4 Dec
0       NaN         1         2       NaN
1       NaN         1         2       3.0
2       1.0         2         3       NaN
   col1 Mar  col2 Jun  col3 Sep  col4 Dec
0         0         1         2         3
1         0         1         2         3
2         1         2         3         4

If the column index were provided as datetime objects, it might be possible to use it directly, but I'm not sure.
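The spline call with limit_direction="both" also fills leading NaNs, whereas the expected output in the question keeps them. If only trailing NaNs should be filled, one possible sketch is to fit a straight line per row with numpy and extend it past the last valid value. This swaps the spline for a plain least-squares line via np.polyfit; the helper name fill_trailing_linear is hypothetical, and the columns are assumed to be equally spaced.

```python
import numpy as np
import pandas as pd


def fill_trailing_linear(row: pd.Series) -> pd.Series:
    """Fill NaNs after the last valid value by extending a least-squares line."""
    y = row.to_numpy(dtype=float)
    x = np.arange(len(y), dtype=float)  # assumption: equally spaced columns
    valid = ~np.isnan(y)
    if valid.sum() < 2:
        return row  # not enough points to define a slope
    last_valid = np.flatnonzero(valid)[-1]
    slope, intercept = np.polyfit(x[valid], y[valid], deg=1)
    out = y.copy()
    trailing = x > last_valid  # only positions after the last valid value
    out[trailing] = slope * x[trailing] + intercept
    return pd.Series(out, index=row.index)


d = {'col1': [np.nan, np.nan, 1],
     'col2': [1, 1, 2],
     'col3': [2, 2, 3],
     'col4': [np.nan, 3, np.nan]}
df = pd.DataFrame(data=d)

# apply the helper to every row; leading and interior NaNs are left as they are
result = df.apply(fill_trailing_linear, axis=1)
print(result)
```

For unequally spaced columns, x could be replaced by numeric positions such as the month numbers used above.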

Edit 2: As expected, you can also interpolate using datetime objects as column names:

CSV file

Mar 2014, Jun 2014, Sep 2014, Mar 2015
nan, 1, 2, nan
nan, 1, 2, 4
1, 2, 3, nan

Code:

#read CSV file (a regex separator needs the python parser engine)
df = pd.read_csv("test.txt", sep=r',\s*', engine="python")
#convert column names to datetime objects
df.columns = pd.to_datetime(df.columns)
#interpolate over rows
df = df.interpolate(method="spline", order=1, limit_direction="both", axis=1, downcast="infer")
print(df)

Output:

   2014-03-01  2014-06-01  2014-09-01  2015-03-01
0    0.000000         1.0         2.0    3.967391
1   -0.016457         1.0         2.0    4.000000
2    1.000000         2.0         3.0    4.967391

The results are no longer nice round numbers, because the gaps between the column dates differ in their number of days.
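As a rough check of that explanation, the day gaps between the column dates can be computed directly. The sketch below assumes the column labels parse with the format "%b %Y" and redoes the extrapolation for the first row by hand, using the slope between the Jun 2014 and Sep 2014 values.

```python
import numpy as np
import pandas as pd

# column labels from the CSV file, parsed as the first day of each month
cols = pd.to_datetime(["Mar 2014", "Jun 2014", "Sep 2014", "Mar 2015"], format="%b %Y")

# gaps between consecutive columns, in days
gaps = np.diff(cols.values) / np.timedelta64(1, "D")
print(gaps)

# first row: value 1 in Jun 2014 and 2 in Sep 2014
slope_per_day = (2 - 1) / gaps[1]
extrapolated = 2 + slope_per_day * gaps[2]  # extend the line to Mar 2015
print(round(extrapolated, 6))
```

The last gap spans six months rather than three, so the extended line lands just short of 4 instead of on a whole number.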
