Saving the results of a pandas ols regression to a DataFrame

Date: 2016-10-14 13:29:30

Tags: python-2.7 pandas

I am running a regression on a grouped DataFrame, like this:

import pandas as pd
from pandas.stats.api import ols

df=pd.read_csv(r'C:\path_to_file.csv') #path to original file

#groupby POINTID
list1=[]
for i, grp in df.groupby('POINTID'):
    result = ols(y=grp['Date'], x=grp['SWIR32']) #run regression
    #turn regression parameters into a dataframe
    frame=pd.DataFrame({'POINTID':i, 'R2': result.r2, 'pvalue': result.p_value[1], 'rmse': result.rmse})
    list1.append(frame)
final_frame=pd.concat(list1)

But this returns:

ValueError: If using all scalar values, you must pass an index

When I change the DataFrame creation line to:

frame=pd.DataFrame({'R2': result.r2, 'pvalue': result.p_value[1] , 'rmse': result.rmse}, index=i)

it returns:

TypeError: len() of unsized object

Basically, I just want to save the POINTID, r2, RMSE, and p-value to a single DataFrame.

1 Answer:

Answer 0 (score: 1)

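Use pd.Series instead. A dict containing only scalar values gives pd.DataFrame no way to infer an index, and index=i fails because i is a single value rather than a list-like. pd.Series uses the dict keys as its index, so concatenating the Series along axis=1 and transposing gives one row per POINTID: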

import pandas as pd
from pandas.stats.api import ols

df=pd.read_csv(r'C:\path_to_file.csv') #path to original file

#groupby POINTID
list1=[]
for i, grp in df.groupby('POINTID'):
    result = ols(y=grp['Date'], x=grp['SWIR32']) #run regression
    #turn regression parameters into a series
    frame=pd.Series({'POINTID':i, 'R2': result.r2, 'pvalue': result.p_value[1], 'rmse': result.rmse})
    list1.append(frame)
final_frame=pd.concat(list1, axis=1).T
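
As an alternative to concatenating and transposing, the per-group results can be collected as one dict per group and turned into a DataFrame in a single call, which also lets pandas infer a dtype for each column. The sketch below is only an illustration under stated assumptions: the data is made up, and fit_stats is a hypothetical helper that fits a line with numpy.polyfit in place of pandas.stats.api.ols, which is not available in later pandas releases. Its rmse is the plain root of the mean squared residual, which may not match the definition ols used, and the p-value is left out because numpy alone does not provide it (statsmodels or scipy.stats.linregress could probably supply it).

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the CSV in the question.
df = pd.DataFrame({
    'POINTID': [1, 1, 1, 1, 2, 2, 2, 2],
    'SWIR32': [0.10, 0.20, 0.30, 0.40, 0.15, 0.25, 0.35, 0.45],
    'Date': [1.0, 2.1, 2.9, 4.2, 2.0, 1.8, 3.1, 2.9],
})


def fit_stats(grp):
    """Hypothetical helper: least-squares line of Date on SWIR32."""
    x = grp['SWIR32'].to_numpy(dtype=float)
    y = grp['Date'].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    ss_res = float(np.sum(resid ** 2))           # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    return {
        'R2': 1.0 - ss_res / ss_tot,
        # Assumption: rmse = sqrt(mean squared residual).
        'rmse': float(np.sqrt(ss_res / len(y))),
    }


# One dict of scalars per group; no index is needed at this stage.
records = []
for point_id, grp in df.groupby('POINTID'):
    row = {'POINTID': point_id}
    row.update(fit_stats(grp))
    records.append(row)

# A list of dicts becomes one row per dict.
final_frame = pd.DataFrame(records)

# If a single-row DataFrame per group is still wanted, the index must be
# list-like: index=[0] works where a bare scalar index does not.
single = pd.DataFrame(records[0], index=[0])
```

Here final_frame has one row per POINTID with the columns POINTID, R2, and rmse, so no transpose is required.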