I'm running a regression on each group of a grouped DataFrame, like this:
import pandas as pd
from pandas.stats.api import ols  # pandas.stats was removed in pandas 0.20; needs an older pandas
df = pd.read_csv(r'C:\path_to_file.csv')  # path to original file
# group by POINTID
list1 = []
for i, grp in df.groupby('POINTID'):
    result = ols(y=grp['Date'], x=grp['SWIR32'])  # run regression
    # turn regression parameters into a DataFrame
    frame = pd.DataFrame({'POINTID': i, 'R2': result.r2, 'pvalue': result.p_value[1], 'rmse': result.rmse})
    list1.append(frame)
final_frame = pd.concat(list1)
But this raises:
ValueError: If using all scalar values, you must pass an index
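This error comes from the DataFrame constructor itself: when every value in the dict is a scalar, pandas cannot tell how many rows to build. A minimal reproduction with made-up values, plus one common fix (wrapping each scalar in a one-element list so the frame has exactly one row):

```python
import pandas as pd

# Every value is a scalar, so pandas cannot infer the number of rows.
try:
    pd.DataFrame({'POINTID': 7, 'R2': 0.9})
except ValueError as err:
    print(err)

# Wrapping each scalar in a one-element list gives a single-row frame.
row = pd.DataFrame({'POINTID': [7], 'R2': [0.9]})
print(row.shape)  # (1, 2)
```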
When I change the line that creates the DataFrame to:
frame=pd.DataFrame({'R2': result.r2, 'pvalue': result.p_value[1] , 'rmse': result.rmse}, index=i)
I get:
TypeError: len() of unsized object
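The second attempt fails for a related reason: index must be list-like, while the group key i is a single scalar. A sketch of the fix, assuming i is a scalar POINTID (the key 7 and the statistics below are hypothetical), is to pass index=[i]; the scalars are then broadcast into that one row:

```python
import pandas as pd

i = 7  # hypothetical scalar group key, like a POINTID from groupby
stats = {'R2': 0.9, 'pvalue': 0.01, 'rmse': 1.5}  # hypothetical statistics

# A one-element list is a valid index; the scalar values fill that single row.
frame = pd.DataFrame(stats, index=[i])
frame.index.name = 'POINTID'
print(frame)
```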
Basically, I just want to save POINTID, R2, RMSE and the p-value into one DataFrame.
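Since the pandas.stats module (and with it ols) was removed in pandas 0.20, the sketch below swaps in scipy.stats.linregress, which is a different function and not a drop-in replacement. It collects one plain dict per group and builds the DataFrame once at the end. The sample data is hypothetical, the reported p-value is linregress's two-sided test on the slope, and rmse here is the unadjusted root mean squared residual, which may not match how ols defined it:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data standing in for the CSV (same column names).
df = pd.DataFrame({
    'POINTID': [1, 1, 1, 1, 2, 2, 2, 2],
    'SWIR32': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    'Date': [1.0, 2.1, 2.9, 4.2, 3.0, 2.5, 2.2, 1.4],
})

rows = []
for point_id, grp in df.groupby('POINTID'):
    fit = stats.linregress(grp['SWIR32'], grp['Date'])  # x first, then y
    predicted = fit.intercept + fit.slope * grp['SWIR32']
    resid = grp['Date'] - predicted
    rows.append({
        'POINTID': point_id,
        'R2': fit.rvalue ** 2,
        'pvalue': fit.pvalue,  # two-sided p-value for the slope
        'rmse': np.sqrt(np.mean(resid ** 2)),  # unadjusted RMSE
    })

# One row per POINTID, built in a single constructor call.
final_frame = pd.DataFrame(rows)
print(final_frame)
```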
Answer
Use pd.Series instead:
import pandas as pd
from pandas.stats.api import ols  # pandas.stats was removed in pandas 0.20; needs an older pandas
df = pd.read_csv(r'C:\path_to_file.csv')  # path to original file
# group by POINTID
list1 = []
for i, grp in df.groupby('POINTID'):
    result = ols(y=grp['Date'], x=grp['SWIR32'])  # run regression
    # a Series can be built from scalars alone, so no index is needed
    frame = pd.Series({'POINTID': i, 'R2': result.r2, 'pvalue': result.p_value[1], 'rmse': result.rmse})
    list1.append(frame)
# each Series becomes a column; .T turns them into one row per POINTID
final_frame = pd.concat(list1, axis=1).T
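The answer works because a Series, unlike a DataFrame, can be built from scalars alone. pd.concat(..., axis=1) places each Series in its own column and .T transposes the result so that each group becomes a row. A small sketch with made-up values, comparing that with passing the list straight to pd.DataFrame:

```python
import pandas as pd

# Hypothetical per-group results, shaped like the answer's pd.Series objects.
list1 = [
    pd.Series({'POINTID': 1, 'R2': 0.95, 'pvalue': 0.02, 'rmse': 0.3}),
    pd.Series({'POINTID': 2, 'R2': 0.80, 'pvalue': 0.10, 'rmse': 0.5}),
]

# Each Series becomes a column; .T turns those columns into rows.
via_concat = pd.concat(list1, axis=1).T

# The constructor treats each Series in the list as one row directly.
via_ctor = pd.DataFrame(list1)

print(via_concat)
print(via_ctor)
```

Because a Series holds a single dtype, POINTID comes out as a float column in both frames here; passing a list of plain dicts to pd.DataFrame instead keeps it as an integer column.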