I have a df like this:
Allotment Year NDVI A_Annex Bachelor
A_Annex 1984 1.0 0.40 0.60
A_Annex 1984 1.5 0.56 0.89
A_Annex 1984 2.0 0.78 0.76
A_Annex 1985 3.4 0.89 0.54
A_Annex 1985 1.6 0.98 0.66
A_Annex 1986 2.5 1.10 0.44
A_Annex 1986 1.7 0.87 0.65
Bachelor 1984 8.9 0.40 0.60
Bachelor 1984 6.5 0.56 0.89
Bachelor 1984 4.2 0.78 0.76
Bachelor 1985 2.4 0.89 0.54
Bachelor 1985 1.7 0.98 0.66
Bachelor 1986 8.9 1.10 0.44
Bachelor 1986 9.6 0.87 0.65
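I want to run a regression based on a groupby. For each unique Allotment, I want to regress its NDVI values against the column associated with that allotment. So for the A_Annex allotment I want to regress NDVI against the A_Annex column, and then do the same for Bachelor. Basically, I want to match each column with its associated Allotment, then regress the corresponding NDVI values on the values in that column.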
I can do this for a single allotment like this:
stat=merge.groupby(['Allotment']).apply(lambda x: sp.stats.linregress(x['A_Annex'], x['NDVI']))
But then I have to keep changing the x column in sp.stats.linregress(x['A_Annex'], x['NDVI']) for every allotment, and I would like to avoid that.
Answer 0 (score: 1)
Are you looking for something like this?
r = {annex: pd.ols(x=group['A_Annex'], y=group['NDVI'])
for annex, group in df.groupby('Allotment')}
>>> r
{'A_Annex':
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 7
Number of Degrees of Freedom: 2
R-squared: 0.3774
Adj R-squared: 0.2529
Rmse: 0.6785
F-stat (1, 5): 3.0307, p-value: 0.1422
Degrees of Freedom: model 1, resid 5
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x 1.9871 1.1415 1.74 0.1422 -0.2501 4.2244
intercept 0.3731 0.9454 0.39 0.7094 -1.4798 2.2260
---------------------------------End of Summary---------------------------------,
'Bachelor':
-------------------------Summary of Regression Analysis-------------------------
Formula: Y ~ <x> + <intercept>
Number of Observations: 7
Number of Degrees of Freedom: 2
R-squared: 0.0650
Adj R-squared: -0.1220
Rmse: 3.4787
F-stat (1, 5): 0.3478, p-value: 0.5810
Degrees of Freedom: model 1, resid 5
-----------------------Summary of Estimated Coefficients------------------------
Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5%
--------------------------------------------------------------------------------
x -3.4511 5.8522 -0.59 0.5810 -14.9213 8.0191
intercept 8.7796 4.8467 1.81 0.1298 -0.7200 18.2792
---------------------------------End of Summary---------------------------------}
You can then extract the model parameters as follows:
>>> {k: r[k].sm_ols.params for k in r}
{'A_Annex': array([ 1.9871432 , 0.37310585]),
'Bachelor': array([-3.45111992, 8.77960702])}
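Two caveats about the snippet above. pd.ols was removed from pandas in version 0.20, so it will not run on a current install. It also hard-codes group['A_Annex'], so the Bachelor group is regressed on the A_Annex column rather than on its own column. Below is a minimal sketch of one way around both. It assumes every value in Allotment has a column of the same name, and it swaps in scipy.stats.linregress (the function the question already uses) for pd.ols. The names df and results are illustrative, and the frame is rebuilt from the sample data in the question.

```python
import pandas as pd
from scipy import stats

# Sample data copied from the question
df = pd.DataFrame({
    "Allotment": ["A_Annex"] * 7 + ["Bachelor"] * 7,
    "Year": [1984, 1984, 1984, 1985, 1985, 1986, 1986] * 2,
    "NDVI": [1.0, 1.5, 2.0, 3.4, 1.6, 2.5, 1.7,
             8.9, 6.5, 4.2, 2.4, 1.7, 8.9, 9.6],
    "A_Annex": [0.40, 0.56, 0.78, 0.89, 0.98, 1.10, 0.87] * 2,
    "Bachelor": [0.60, 0.89, 0.76, 0.54, 0.66, 0.44, 0.65] * 2,
})

# The group key doubles as the name of the predictor column,
# so no column name has to be edited by hand for each allotment.
results = {
    name: stats.linregress(group[name], group["NDVI"])
    for name, group in df.groupby("Allotment")
}

# Each value is a result object with slope, intercept, rvalue, pvalue, stderr
for name, res in results.items():
    print(name, res.slope, res.intercept, res.rvalue ** 2, res.pvalue)
```

For A_Annex this fits the same model as the summary above, so the slope and intercept should agree with it. The Bachelor numbers will differ, because here NDVI is regressed on the Bachelor column.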