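Compute min, max, mean, and median over grouped columns of a pandas DataFrame and join the results

Date: 2019-08-06 00:23:56

Tags: python python-3.x pandas

I have a pandas DataFrame and want to compute the min, max, mean, and median of one column, grouped by columns A, B, and C. I then want to merge the results back into the original DataFrame. For the median, the code below works: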

pandas_df: pd.DataFrame = my_pandas_sql.pull_data_from_mysqldb(query=sql_string)
median_px = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])[['Px/SQM']].apply(np.median)
median_px.name = 'Median Px/SQM'
result_median_df = pandas_df.join(median_px, on=['ZIP', 'Updated', 'Buy/Rent'], how="left")
result_median_df.to_csv(path_or_buf='median.csv')

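But when I try to compute the min and max and add them to the DataFrame, I get the following error:

ValueError: columns overlap but no suffix specified: Index(['Px/SQM'], dtype='object')

The code used for min or max: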

pandas_df: pd.DataFrame = my_pandas_sql.pull_data_from_mysqldb(query=sql_string)
min_px = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])[['Px/SQM']].apply(np.min)
min_px.name = 'Min Px/SQM'
result_min_df = pandas_df.join(min_px, on=['ZIP', 'Updated', 'Buy/Rent'], how="left")
result_min_df.to_csv(path_or_buf='min_px.csv')

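I tried using a suffix, which works, but I would like to use my own full column names. Or do I have to rename the columns afterwards?

Also, I believe there is a way to pass the functions as a list, [np.min, np.mean, np.median, np.max], and rename the columns using agg, but I could not get it to work.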

pandas_df: pd.DataFrame = my_pandas_sql.pull_data_from_mysqldb(query=sql_string)
min_px = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])[['Px/SQM']].apply(np.min)
min_px.name = 'Min Px/SQM'
result_min_df = pandas_df.join(min_px, on=['ZIP', 'Updated', 'Buy/Rent'], how="left", lsuffix="_min")
result_min_df.to_csv(path_or_buf='min_px.csv')
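One way to get a custom column name without suffixes, sketched on made-up data: select 'Px/SQM' with single brackets so the aggregation returns a Series, rename that Series, then join it. The sample values below are hypothetical and only stand in for the MySQL query result.

```python
import pandas as pd

# Hypothetical sample data standing in for the MySQL query result.
pandas_df = pd.DataFrame({
    'ZIP': ['1000', '1000', '2000', '2000'],
    'Updated': ['2019-08-01'] * 4,
    'Buy/Rent': ['Buy', 'Buy', 'Rent', 'Rent'],
    'Px/SQM': [10.0, 30.0, 5.0, 7.0],
})
keys = ['ZIP', 'Updated', 'Buy/Rent']

# Single brackets select a Series, so min() returns a Series indexed by the keys.
# rename() gives that Series the final column name before the join.
min_px = pandas_df.groupby(keys)['Px/SQM'].min().rename('Min Px/SQM')

# The joined column is called 'Min Px/SQM', so it cannot clash with 'Px/SQM'.
result_min_df = pandas_df.join(min_px, on=keys, how='left')
print(result_min_df)
```

With this shape no suffix is needed, because the name is set before the join instead of after it.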



After receiving a great answer, just a comment.
I had been trying to use the code below, which triggered a lot of warnings and was slower than the proposed solution:

df1 = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent']).agg(
    {'Px/SQM': {'Min': np.min, 'Max': np.max, 'Mean': np.mean, 'Median': np.median}}
).reset_index()
df3 = pd.merge(pandas_df, df1, on=['ZIP', 'Updated', 'Buy/Rent'], how='left')
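For reference, renaming through a nested dict in agg was deprecated and was removed in pandas 1.0, which is a plausible source of the warnings. A minimal sketch of the same idea with named aggregation (available from pandas 0.25) might look like the following; the sample data and the output column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the MySQL query result.
pandas_df = pd.DataFrame({
    'ZIP': ['1000', '1000', '2000', '2000'],
    'Updated': ['2019-08-01'] * 4,
    'Buy/Rent': ['Buy', 'Buy', 'Rent', 'Rent'],
    'Px/SQM': [10.0, 30.0, 5.0, 7.0],
})
keys = ['ZIP', 'Updated', 'Buy/Rent']

# Named aggregation: each output column maps to (input column, function).
# The dict is unpacked with ** because the names contain spaces and slashes.
stats = pandas_df.groupby(keys).agg(**{
    'Min Px/SQM': ('Px/SQM', 'min'),
    'Max Px/SQM': ('Px/SQM', 'max'),
    'Mean Px/SQM': ('Px/SQM', 'mean'),
    'Median Px/SQM': ('Px/SQM', 'median'),
}).reset_index()

# Left merge brings the per-group statistics back onto every original row.
df3 = pd.merge(pandas_df, stats, on=keys, how='left')
print(df3)
```

This computes all four statistics in one pass and names the columns up front, so no rename is needed after the merge.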

1 answer:

Answer 0: (score: 0)

Whenever you need to add columns to the original DataFrame, you can use transform:

g=pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])['Px/SQM']

pandas_df['Max']=g.transform('max')
pandas_df['Min']=g.transform('min')
pandas_df['Median']=g.transform(np.median)
pandas_df['Mean']=g.transform('mean')
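To see why no join is needed with this approach, here is a small self-contained sketch on made-up data. transform returns one value per original row, aligned to the original index, so the result can be assigned directly as a new column. The sample values and the string function names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the MySQL query result.
pandas_df = pd.DataFrame({
    'ZIP': ['1000', '1000', '2000', '2000'],
    'Updated': ['2019-08-01'] * 4,
    'Buy/Rent': ['Buy', 'Buy', 'Rent', 'Rent'],
    'Px/SQM': [10.0, 30.0, 5.0, 7.0],
})

g = pandas_df.groupby(['ZIP', 'Updated', 'Buy/Rent'])['Px/SQM']

# Each transform call broadcasts the group statistic back to every row of its group.
for func in ['min', 'max', 'mean', 'median']:
    pandas_df[func.capitalize()] = g.transform(func)

print(pandas_df)
```

The new columns can be given any name at assignment time, for example 'Min Px/SQM' instead of 'Min'.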