I am trying to resolve this error:
ValueError: can not merge DataFrame with instance of type <class 'pandas.core.groupby.DataFrameGroupBy'>
I want to merge two data frames created by agg, as follows.
First, I created grouped data from the main df:
resi_all_nooutliers_bysector = df_resi_rawdata_nooutliers.groupby(['postcode_sector'])
resi_flats_nooutliers_bysector = df_resi_rawdata_nooutliers.loc[df_resi_rawdata_nooutliers['propertytype']=='F'].groupby(['postcode_sector'])
Then I ran the statistics I wanted:
resi_flats_nooutliers_bysector['updatedprice_calculated'].agg([np.mean, np.median, np.max, 'count'])
resi_all_nooutliers_bysector['updatedprice_calculated'].agg([np.mean, np.median, np.max, 'count'])
Then I tried to merge them:
df_resi_nooutliers_bysector = pd.merge(resi_all_nooutliers_bysector,
resi_flats_nooutliers_bysector,
on=['postcode_sector'],how='left',
suffixes=('_allprop', '_flats'))
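and got the error in the title.
Answer 0: (score: 0)
This worked for me. The error occurs because pd.merge was given the groupby objects themselves rather than the aggregated results. Save the agg output into data frames, making sure the index stays on the original grouping key (the postcode_sector column):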
# as_index is an argument of groupby(), not of agg(); leaving it out keeps postcode_sector as the index
df1 = resi_flats_nooutliers_bysector['updatedprice_calculated'].agg([np.mean, np.median, np.max, 'count'])
df2 = resi_all_nooutliers_bysector['updatedprice_calculated'].agg([np.mean, np.median, np.max, 'count'])
type(resi_flats_nooutliers_bysector)  # the groupby object itself is not a DataFrame
df1.head(10)
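To illustrate the difference between the two kinds of object, here is a minimal sketch on a small made-up frame. The values are hypothetical and only the column names come from the question; the string aggregation names are used here in place of the NumPy functions purely for brevity. It shows that groupby returns a DataFrameGroupBy object, while agg returns an ordinary DataFrame indexed by postcode_sector.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
toy = pd.DataFrame({
    "postcode_sector": ["AB1 1", "AB1 1", "AB1 2"],
    "propertytype": ["F", "D", "F"],
    "updatedprice_calculated": [100.0, 300.0, 200.0],
})

# groupby() only describes the grouping; nothing is aggregated yet.
grouped = toy.groupby("postcode_sector")

# agg() performs the aggregation and returns a regular DataFrame.
stats = grouped["updatedprice_calculated"].agg(["mean", "median", "max", "count"])

print(type(grouped).__name__)   # the kind of object pd.merge rejected
print(type(stats).__name__)     # the kind of object pd.merge accepts
print(stats.index.name)         # the grouping key becomes the index
```

Only the second object can be passed to pd.merge or DataFrame.merge.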
Then join on the index:
merge_test = df2.merge(df1, left_index=True, right_index=True,
                       suffixes=('_allprop', '_flats'))
merge_test.head(10)
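For completeness, the whole workflow can be sketched end to end on toy data. The frame below is hypothetical (only the column names come from the question), and how="left" is taken from the question's original merge call rather than from the answer above, so that sectors without any flats are kept with missing values.

```python
import pandas as pd

# Hypothetical toy data; "AB1 3" deliberately has no flats.
toy = pd.DataFrame({
    "postcode_sector": ["AB1 1", "AB1 1", "AB1 2", "AB1 3"],
    "propertytype": ["F", "D", "F", "D"],
    "updatedprice_calculated": [100.0, 300.0, 200.0, 400.0],
})

funcs = ["mean", "median", "max", "count"]

# Aggregate first: each result is a DataFrame indexed by postcode_sector.
all_stats = toy.groupby("postcode_sector")["updatedprice_calculated"].agg(funcs)
flat_stats = (
    toy.loc[toy["propertytype"] == "F"]
    .groupby("postcode_sector")["updatedprice_calculated"]
    .agg(funcs)
)

# Merge the aggregated frames on their shared index.
merged = all_stats.merge(
    flat_stats,
    how="left",               # keep every sector from all_stats
    left_index=True,
    right_index=True,
    suffixes=("_allprop", "_flats"),
)
print(merged)
```

Because both frames share the same column names, the suffixes distinguish the all-property statistics from the flat-only ones.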