OLS regression using groupby

Time: 2016-03-30 23:19:45

Tags: python pandas statsmodels

I want to run an OLS regression for each group using pandas and groupby.

I am trying the following code:

import pandas as pd
from pandas.stats.api import ols

df=pd.read_csv(r'F:\File.csv')
result=df.groupby(['FID']).apply(lambda x: ols(y=df[x['MEAN']], x=df[x['Accum_Prcp'],x['Accum_HDD']]))
print result

But this returns:

File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\indexing.py", line 1150, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])

    KeyError: '[ 0.84978328  0.72115778  0.53965104  0.52955655  0.73372541  0.64617074\n  0.60040938  0.7147218   0.65533535  0.57980322  0.57382068  0.56543435\n  0.70740831  0.9245337   0.54859569  0.6789395   0.7086157   0.3835853\n  0.54924104  0.80813778  0.83758118  0.22673391  0.26594087  0.63650468\n  0.89889911  0.38324657  0.30235986  0.62922678  0.55219822  0.55950705\n  0.71137557  0.53631811  0.70158798  0.87116361  0.93751381  0.91125518\n  0.80020908  0.75301262  0.82391046  0.77483673  0.63069573  0.44954455\n  0.83578862  0.56338649  0.64236039  0.93270243  0.93077291  0.83847668\n  0.8268959   0.85400317  0.74319769  0.94803537  0.97484929  0.45366017\n  0.80823694  0.82028051  0.63960395  0.63015722  0.73132888  0.55570184\n  0.83265402  0.75009687  0.58207032  0.92064804  0.91058008  0.86726397\n  0.89204098  0.95573514  0.75704367  0.80786363  0.87448548  0.7553715\n  0.88965962  0.82828493  0.82423891  0.81034742  0.90104876  0.78875473\n  0.97369268] not in index'

Is something wrong with my syntax?

Without groupby, the call looks like this:

result = ols(y=df['MEAN'], x=df[['Accum_HDD','Accum_Prcp']])

and it works fine.

My dataframe looks like this:

FID  Image_Date   MEAN  Accum_Prcp   Accum_HDD
1     19920506     2.0   500.0        1000.0
1     19930506     1.7   450.0        1050.0
2     19920506     2.7   456.0        992.0
2     19930506     1.9   376.0        800.0 

1 answer:

Answer 0 (score: 1)

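Inside the lambda, x is already the sub-frame for one FID, so df[x['MEAN']] passes the MEAN values to df as if they were column labels, which is what raises the KeyError. Select the columns by name from each group instead. Try: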

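grps = df.groupby(['FID'])
results = {}  # one fitted model per FID
for fid, grp in grps:
    # grp is the sub-frame for this FID, so select columns from grp, not df
    results[fid] = ols(y=grp.loc[:, 'MEAN'], x=grp.loc[:, ['Accum_Prcp', 'Accum_HDD']])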