I have a df "data" like the following:
Name Quality city
Tom High A
nick Medium B
krish Low A
Jack High A
Kevin High B
Phil Medium B
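I want to group it by city, create new columns from the values in the "Quality" column, and calculate the average (each count as a percentage of the city's total), like this: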
city High Medium Low High_Avg Medium_AVG Low_avg
A 2 0 1 66.66 0 33.33
B 1 1 0 50 50 0
I tried the following script, but I know it is completely wrong:
data_average = data_df.groupby(['city'], as_index=False).count()
Answer 0 (score: 1)
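Get the frequency counts, divide each row by its row total (the sum across the Quality columns), and finally concatenate the two dataframes into one: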
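import pandas as pd

# count how often each Quality value occurs in each city
result = pd.crosstab(df.city, df.Quality)
# divide each row by its total and scale to a percentage
averages = result.div(result.sum(axis=1), axis=0).mul(100).round(2).add_suffix("_Avg")
# combine the dataframes
pd.concat((result, averages), axis=1)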
Quality High Low Medium High_Avg Low_Avg Medium_Avg
city
A 2 1 0 66.67 33.33 0.00
B 1 0 2 33.33 0.00 66.67
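
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample frame from the question, and it makes two assumptions that are not in the answer above: it uses the `normalize="index"` option of `pd.crosstab` to get the row shares directly, and it reindexes the columns with a hypothetical `order` list so they come out as High, Medium, Low, as in the question's desired layout.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["Tom", "nick", "krish", "Jack", "Kevin", "Phil"],
    "Quality": ["High", "Medium", "Low", "High", "High", "Medium"],
    "city": ["A", "B", "A", "A", "B", "B"],
})

# Assumed column order (the question lists High, Medium, Low)
order = ["High", "Medium", "Low"]

# Frequency of each Quality value per city
counts = pd.crosstab(df["city"], df["Quality"]).reindex(columns=order, fill_value=0)

# Row-wise shares, scaled to percentages
percentages = (
    pd.crosstab(df["city"], df["Quality"], normalize="index")
    .reindex(columns=order, fill_value=0)
    .mul(100)
    .round(2)
    .add_suffix("_Avg")
)

# Counts and percentages side by side, one row per city
summary = pd.concat([counts, percentages], axis=1)
print(summary)
```

Note that in the sample data city B has one High and two Medium rows, so B comes out as 33.33 and 66.67, in line with the answer's output rather than the B row of the expected table in the question.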