Applying a formula across pandas rows (regression line)

Date: 2017-12-04 14:07:53

Tags: python pandas scipy

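I'm trying to apply a formula across the rows of a DataFrame to get the trend of the numbers in each row.

The following example works up to the part that uses .apply: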

import numpy as np
import pandas as pd
import scipy.stats

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues = list(range(1, len(df.columns) + 1))

def calc_slope(row):
    # row is used as an integer position here
    return scipy.stats.linregress(df.iloc[row, :], y=axisvalues)

calc_slope(1)  # this works

df["New"] = df.apply(calc_slope, axis=1)  # this fails: "too many values to unpack"

Thanks for your help.

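1 answer:

Answer 0 (score: 2)

I think you need to return just one attribute of the regression result, e.g. its slope. With apply(..., axis=1) each row is passed to the function as a Series, so pass it to linregress directly instead of indexing df again: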

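def calc_slope(row):
    # row is a Series holding one row of df; linregress returns a result with several fields
    a = scipy.stats.linregress(row, y=axisvalues)
    return a.slope  # keep only the slope, a single scalar per row

df["slope"] = df.apply(calc_slope, axis=1)
print(df)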
          A         B         C         D     slope
0  0.444640  0.024624 -0.016216  0.228935 -2.553465
1  1.226611  1.962481  1.103834  0.645562 -1.455239
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679
5  0.611214 -0.258453  0.053452  1.223544  0.841865
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503
7  0.368278  1.314202  0.748189  2.116189  1.350132
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447
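For a fixed orientation, the slope that linregress reports is the ordinary least-squares slope, sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2). A minimal sketch, assuming the same orientation as the answer (x = the row's values, y = axisvalues), computes it for every row at once with NumPy broadcasting and cross-checks it against the apply version. The column names slope_vec and slope_apply are only for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues = np.arange(1, len(df.columns) + 1)

# Same orientation as the answer: x = the row's values, y = axisvalues
x = df.to_numpy()                # shape (n_rows, n_cols)
y = axisvalues.astype(float)     # shape (n_cols,), broadcast against every row

# OLS slope per row: sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean()
df["slope_vec"] = (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)

# Cross-check against linregress applied row by row (only on the original columns)
df["slope_apply"] = df[list('ABCD')].apply(
    lambda row: stats.linregress(row, y=axisvalues).slope, axis=1
)
print(df[["slope_vec", "slope_apply"]])
```

The vectorized form avoids calling a Python function once per row, which may matter for large frames; apply with linregress is still the simpler choice when you also want rvalue, pvalue and stderr.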

To get all attributes, convert the namedtuple to a dict and then to a Series. The output is a new DataFrame, so join it to the original if necessary:

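import numpy as np
import pandas as pd
import scipy.stats

np.random.seed(1997)

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
axisvalues = list(range(1, len(df.columns) + 1))

def calc_slope(row):
    a = scipy.stats.linregress(row, y=axisvalues)
    # namedtuple -> dict -> Series; the dict keys become the new column names
    return pd.Series(a._asdict())

print(df.apply(calc_slope, axis=1))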
      slope  intercept    rvalue    pvalue    stderr
0 -2.553465   2.935355 -0.419126  0.580874  3.911302
1 -1.455239   4.296670 -0.615324  0.384676  1.318236
2 -0.718621   2.523733 -0.395862  0.604138  1.178774
3 -0.575519   2.578530 -0.956682  0.043318  0.123843
4  0.075679   2.539066  0.127254  0.872746  0.417101
5  0.841865   2.156991  0.425333  0.574667  1.266674
6 -0.652503   2.504915 -0.561947  0.438053  0.679154
7  1.350132   0.965285  0.794704  0.205296  0.729193
8 -0.987341   1.647104 -0.593680  0.406320  0.946311
9 -1.610447   2.639780 -0.828856  0.171144  0.768641
df = df.join(df.apply(calc_slope,axis=1))
print (df)
          A         B         C         D     slope  intercept    rvalue  \
0  0.444640  0.024624 -0.016216  0.228935 -2.553465   2.935355 -0.419126   
1  1.226611  1.962481  1.103834  0.645562 -1.455239   4.296670 -0.615324   
2 -0.259415  0.971097  0.124538 -0.704115 -0.718621   2.523733 -0.395862   
3  1.938422  1.787310 -0.619745 -2.560187 -0.575519   2.578530 -0.956682   
4 -0.986231 -1.942930  2.677379 -1.813071  0.075679   2.539066  0.127254   
5  0.611214 -0.258453  0.053452  1.223544  0.841865   2.156991  0.425333   
6  0.685435  0.962880 -1.517077 -0.101108 -0.652503   2.504915 -0.561947   
7  0.368278  1.314202  0.748189  2.116189  1.350132   0.965285  0.794704   
8 -0.322053 -1.135443 -0.161071 -1.836761 -0.987341   1.647104 -0.593680   
9  0.798461  0.461736 -0.665127 -0.247887 -1.610447   2.639780 -0.828856   

     pvalue    stderr  
0  0.580874  3.911302  
1  0.384676  1.318236  
2  0.604138  1.178774  
3  0.043318  0.123843  
4  0.872746  0.417101  
5  0.574667  1.266674  
6  0.438053  0.679154  
7  0.205296  0.729193  
8  0.406320  0.946311  
9  0.171144  0.768641
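Because _asdict() is an underscore-prefixed namedtuple method, and the result object returned by linregress may carry more fields in some SciPy versions, a sketch that names the wanted fields explicitly can give more predictable columns. It also uses the opposite orientation, x = column position and y = the row's values, i.e. it fits value = slope * position + intercept, which is one common reading of "the trend of the numbers in a row"; whether that is what the question intends is an assumption. The helper row_trend, the FIELDS list and the trend_ prefix are hypothetical names chosen for this sketch.

```python
import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(1997)
df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
positions = np.arange(1, len(df.columns) + 1)  # 1, 2, 3, 4 for columns A..D

# Hypothetical: the subset of linregress result fields to keep, in column order
FIELDS = ["slope", "intercept", "rvalue", "pvalue", "stderr"]

def row_trend(row):
    # Hypothetical helper: regress the row's values (y) on column position (x)
    res = stats.linregress(positions, row.to_numpy())
    # Returning a Series makes apply(axis=1) build one column per field
    return pd.Series({name: getattr(res, name) for name in FIELDS})

# add_prefix avoids clashes with existing column names before the join
trend = df.apply(row_trend, axis=1).add_prefix("trend_")
df = df.join(trend)
print(df)
```

With this orientation a positive trend_slope means the values tend to increase from column A to column D.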