我想使用pandas和groupby运行OLS回归。
我正在尝试以下代码:
import pandas as pd
from pandas.stats.api import ols
df=pd.read_csv(r'F:\File.csv')
result=df.groupby(['FID']).apply(lambda x: ols(y=df[x['MEAN']], x=df[x['Accum_Prcp'],x['Accum_HDD']]))
print result
但这会返回:
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1150, in _convert_to_indexer
raise KeyError('%s not in index' % objarr[mask])
KeyError: '[ 0.84978328 0.72115778 0.53965104 0.52955655 0.73372541 0.64617074\n 0.60040938 0.7147218 0.65533535 0.57980322 0.57382068 0.56543435\n 0.70740831 0.9245337 0.54859569 0.6789395 0.7086157 0.3835853\n 0.54924104 0.80813778 0.83758118 0.22673391 0.26594087 0.63650468\n 0.89889911 0.38324657 0.30235986 0.62922678 0.55219822 0.55950705\n 0.71137557 0.53631811 0.70158798 0.87116361 0.93751381 0.91125518\n 0.80020908 0.75301262 0.82391046 0.77483673 0.63069573 0.44954455\n 0.83578862 0.56338649 0.64236039 0.93270243 0.93077291 0.83847668\n 0.8268959 0.85400317 0.74319769 0.94803537 0.97484929 0.45366017\n 0.80823694 0.82028051 0.63960395 0.63015722 0.73132888 0.55570184\n 0.83265402 0.75009687 0.58207032 0.92064804 0.91058008 0.86726397\n 0.89204098 0.95573514 0.75704367 0.80786363 0.87448548 0.7553715\n 0.88965962 0.82828493 0.82423891 0.81034742 0.90104876 0.78875473\n 0.97369268] not in index'
我的语法有什么不对吗?
没有groupby这样做会是这样的:
result = ols(y=df['MEAN'], x=df[['Accum_HDD','Accum_Prcp']])
并且工作正常。
我的数据框看起来像这样:
FID Image_Date MEAN Accum_Prcp Accum_HDD
1 19920506 2.0 500.0 1000.0
1 19930506 1.7 450.0 1050.0
2 19920506 2.7 456.0 992.0
2 19930506 1.9 376.0 800.0
答案 0 :(得分:1)
尝试:
grps=df.groupby(['FID'])
for fid, grp in grps:
ols(y=grp.loc[:, 'MEAN'], x=grp.loc[:, ['Accum_Prcp', 'Accum_HDD']])