Question

I have a dataframe with lists of x and y coordinates. I'm trying to run the stats.linregress function on it, but the whole thing is eluding me.

The dataframe looks like this:

   x1  x2  x3  x4  y1  y2  y3  y4
0   6   5   4   1   2   3   7   6
1   5   5   4   9   4   3   8   2

My code is as follows:

#slope,_,_,_,_=stats.linregress([-7,55,12,-38],[5,40,-10,-20]) #tested:works 

df.loc[:,'slope1'] = df[['x1','x2','y1','y2']].apply(lambda x: stats.linregress([x[0],x[1]],[x[2],x[3]])[0])
df.loc[:,'slope2'] = df[['x3','x4','y3','y4']].apply(lambda x: stats.linregress([x[0],x[1]],[x[2],x[3]])[0])

# not working until linregress above works:
#df['angle'] = np.arctan((df['slope1'] - df['slope2']) / (1 + (df['slope1'] * df['slope2'])))

This produces:

   x1  x2  x3  x4  y1  y2  y3  y4  slope1  slope2
0   6   5   4   1   2   3   7   6     NaN     NaN
1   5   5   4   9   4   3   8   2     NaN     NaN

How should I apply a function to the dataframe columns so that it returns something other than NaN?

Answer 1

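I think you need to pass axis=1 so that the function processes each row (by default, apply uses axis=0 and passes each column to the function instead):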

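import numpy as np
from scipy import stats

# x holds one row: the first two values are x coordinates, the last two are y coordinates
f = lambda x: stats.linregress([x.iloc[0], x.iloc[1]], [x.iloc[2], x.iloc[3]])[0]
# axis=1 applies f to each row instead of each column
df['slope1'] = df[['x1','x2','y1','y2']].apply(f, axis=1)
df['slope2'] = df[['x3','x4','y3','y4']].apply(f, axis=1)

# angle between the two lines, computed from their slopes
df['angle'] = np.arctan((df['slope1'] - df['slope2']) / (1 + (df['slope1'] * df['slope2'])))
print(df)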
   x1  x2  x3  x4  y1  y2  y3  y4  slope1    slope2     angle
0   6   5   4   1   2   3   7   6    -1.0  0.333333 -1.107149
1   5   5   4   9   4   3   8   2     NaN -1.200000       NaN
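
As a side note, each regression here uses only two points, so the fitted line is simply the line through them and the slope can be computed directly as (y_b - y_a) / (x_b - x_a). The sketch below is not the linregress approach above but a vectorized alternative that needs only pandas and NumPy. The helper name two_point_slope is made up for illustration, and the sample frame is copied from the question. It returns NaN for a vertical segment (identical x values), which may be useful because, as far as I know, some SciPy versions raise a ValueError in that case instead of returning NaN.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'x1': [6, 5], 'x2': [5, 5], 'x3': [4, 4], 'x4': [1, 9],
    'y1': [2, 4], 'y2': [3, 3], 'y3': [7, 8], 'y4': [6, 2],
})

def two_point_slope(x_a, x_b, y_a, y_b):
    """Hypothetical helper: slope of the line through (x_a, y_a) and (x_b, y_b).

    Works on whole columns at once; a vertical segment (x_a == x_b) gives NaN.
    """
    dx = (x_b - x_a).astype(float)
    dy = (y_b - y_a).astype(float)
    # Replace zero run with NaN so the division yields NaN instead of inf
    return dy / dx.where(dx != 0)

df['slope1'] = two_point_slope(df['x1'], df['x2'], df['y1'], df['y2'])
df['slope2'] = two_point_slope(df['x3'], df['x4'], df['y3'], df['y4'])

# Same angle formula as in the answer above
df['angle'] = np.arctan((df['slope1'] - df['slope2']) / (1 + df['slope1'] * df['slope2']))
print(df)
```

On the sample data this should give the same slope1, slope2 and angle columns as the apply-based version, without calling a Python function once per row.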
