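I have a dataframe that holds lists of x and y coordinates. I'm trying to run the stats linear regression function (stats.linregress) on it, but the whole process eludes me.
The dataframe looks like this: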
   x1  x2  x3  x4  y1  y2  y3  y4
0   6   5   4   1   2   3   7   6
1   5   5   4   9   4   3   8   2
My code is as follows:
#slope,_,_,_,_=stats.linregress([-7,55,12,-38],[5,40,-10,-20]) #tested:works
df.loc[:,'slope1'] = df[['x1','x2','y1','y2']].apply(lambda x: stats.linregress([x[0],x[1]],[x[2],x[3]])[0])
df.loc[:,'slope2'] = df[['x3','x4','y3','y4']].apply(lambda x: stats.linregress([x[0],x[1]],[x[2],x[3]])[0])
# not working until linregress above works:
#df['angle'] = np.arctan((df['slope1'] - df['slope2']) / (1 + (df['slope1'] * df['slope2'])))
This produces:
   x1  x2  x3  x4  y1  y2  y3  y4  slope1  slope2
0   6   5   4   1   2   3   7   6     NaN     NaN
1   5   5   4   9   4   3   8   2     NaN     NaN
How should I apply a function to the dataframe columns so that it returns something other than NaN?
Answer 0 (score: 2)
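I think you need to pass axis=1 so that the function is applied to each row. Without it, apply hands the function one column at a time, and the result is indexed by column names rather than by the row index, so nothing lines up when it is assigned back and you get NaN: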
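import numpy as np
from scipy import stats
# Each row arrives as a Series holding x_a, x_b, y_a, y_b in that order; [0] picks the slope
f = lambda x: stats.linregress([x.iloc[0], x.iloc[1]], [x.iloc[2], x.iloc[3]])[0]
# axis=1 applies f to each row instead of each column
df['slope1'] = df[['x1','x2','y1','y2']].apply(f, axis=1)
df['slope2'] = df[['x3','x4','y3','y4']].apply(f, axis=1)
# Angle between the two lines, computed from their slopes
df['angle'] = np.arctan((df['slope1'] - df['slope2']) / (1 + (df['slope1'] * df['slope2'])))
print(df)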
   x1  x2  x3  x4  y1  y2  y3  y4  slope1    slope2     angle
0   6   5   4   1   2   3   7   6    -1.0  0.333333 -1.107149
1   5   5   4   9   4   3   8   2     NaN -1.200000       NaN
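As a side note, with only two points per line the regression slope reduces to (y_b - y_a) / (x_b - x_a), so the same columns can probably be computed without apply at all. Below is a minimal sketch under that assumption; the helper name two_point_slope is made up for illustration and is not part of pandas or SciPy. It reuses the sample data from the question and returns NaN for a vertical segment (identical x values), which matches the output shown above. As far as I know, newer SciPy releases raise an error in that case instead of returning NaN, so this may also be the more robust route.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "x1": [6, 5], "x2": [5, 5], "x3": [4, 4], "x4": [1, 9],
    "y1": [2, 4], "y2": [3, 3], "y3": [7, 8], "y4": [6, 2],
})

def two_point_slope(x_a, x_b, y_a, y_b):
    """Slope of the line through (x_a, y_a) and (x_b, y_b), element-wise.

    Hypothetical helper for illustration: vertical segments (x_a == x_b)
    yield NaN rather than inf.
    """
    dx = (x_b - x_a).astype(float)
    dy = (y_b - y_a).astype(float)
    # Turn zero run into NaN so the division does not produce +/-inf
    return dy / dx.replace(0.0, np.nan)

df["slope1"] = two_point_slope(df["x1"], df["x2"], df["y1"], df["y2"])
df["slope2"] = two_point_slope(df["x3"], df["x4"], df["y3"], df["y4"])

# Same angle formula as in the answer: tan(theta) = (m1 - m2) / (1 + m1 * m2)
df["angle"] = np.arctan(
    (df["slope1"] - df["slope2"]) / (1 + df["slope1"] * df["slope2"])
)
print(df)
```

This only works because each line is defined by exactly two points. With three or more points per line you would need an actual least-squares fit, in which case the apply(f, axis=1) approach is the simpler choice.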