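I have a pandas DataFrame with two IDs, a count, and an average. How can I group by both IDs and compute the count-weighted average, so that from the following dataset: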
id1 id2 count average
Person A class 1 200 0.2
Person A class 1 400 0.4
Person B class 2 800 0.6
Person C class 2 200 0.4
Person B class 3 800 0.6
Person A class 4 400 0.2
Person B class 2 100 0.5
I get the following result (rows may be in any order):
id1 id2 count average
Person A class 1 600 0.33
Person B class 2 900 0.59
Person C class 2 200 0.4
Person B class 3 800 0.6
Person A class 4 400 0.2
For reference:
import pandas as pd

df = pd.DataFrame({"id1" : ["Person A","Person A","Person B","Person C","Person B","Person A","Person B"],
                   "id2" : ["class 1","class 1","class 2","class 2","class 3","class 4","class 2"],
                   "count" : [200, 400, 800, 200, 800, 400, 100],
                   "average" : [0.2, 0.4, 0.6, 0.4, 0.6, 0.2, 0.5]})
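For clarity, the expected numbers follow from weighting each row's average by its count. Writing c_i for the count and a_i for the average of row i, the weighted average of a group g is shown below, together with the arithmetic for the two groups that contain more than one row:

```latex
\bar{a}_g = \frac{\sum_{i \in g} c_i \, a_i}{\sum_{i \in g} c_i}

\text{Person A, class 1: } \frac{200 \cdot 0.2 + 400 \cdot 0.4}{200 + 400} = \frac{40 + 160}{600} = \frac{200}{600} \approx 0.33

\text{Person B, class 2: } \frac{800 \cdot 0.6 + 100 \cdot 0.5}{800 + 100} = \frac{480 + 50}{900} = \frac{530}{900} \approx 0.59
```

Groups with a single row keep their original average, and the count column of the result is simply the sum of c_i per group.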
Answer 0 (score: 2)
Use GroupBy.sum and GroupBy.apply:
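# Overwrite 'average' with count * average (the per-row weighted value); note this modifies df in place
df['average'] = df['count'].mul(df['average'])
grps = df.groupby(['id1', 'id2'], sort=False)  # sort=False keeps groups in order of first appearance
g1 = grps['count'].sum()  # total count per group
# Weighted average per group: sum(count * average) / sum(count); selecting the two columns keeps the group keys out of x
g2 = grps[['count', 'average']].apply(lambda x: x['average'].sum() / x['count'].sum())
dfn = pd.concat([g1, g2.rename('average').round(2)], axis=1).reset_index()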
id1 id2 count average
0 Person A class 1 600 0.33
1 Person B class 2 900 0.59
2 Person C class 2 200 0.40
3 Person B class 3 800 0.60
4 Person A class 4 400 0.20
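A minimal sketch of the same computation without GroupBy.apply, assuming the df built in the question: since the weighted average is just the ratio of two group sums, both sums can be taken with GroupBy.sum and divided directly, which avoids calling a Python lambda once per group. It also stores the products in a new column (the name weighted is an illustrative choice, not anything from the answer) instead of overwriting average.

```python
import pandas as pd

df = pd.DataFrame({"id1": ["Person A", "Person A", "Person B", "Person C", "Person B", "Person A", "Person B"],
                   "id2": ["class 1", "class 1", "class 2", "class 2", "class 3", "class 4", "class 2"],
                   "count": [200, 400, 800, 200, 800, 400, 100],
                   "average": [0.2, 0.4, 0.6, 0.4, 0.6, 0.2, 0.5]})

# Put count * average in a new, illustratively named column so 'average' stays intact
grps = df.assign(weighted=df["count"] * df["average"]).groupby(["id1", "id2"], sort=False)

# Per group: total count, and sum(count * average) / sum(count)
counts = grps["count"].sum()
averages = grps["weighted"].sum() / counts

dfn = pd.concat([counts, averages.rename("average").round(2)], axis=1).reset_index()
print(dfn)
```

The two aggregated Series share the same (id1, id2) index, so the division aligns group by group and the result has the same count and average values as the apply-based version.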
Answer 1 (score: 2)
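import numpy as np
df.groupby(['id1', 'id2'])[['average', 'count']].apply(lambda x: np.average(x['average'], weights=x['count']))
Note that x.count refers to the DataFrame.count method, not the count column, so attribute access does not work for that column. Either rename the column (for example to countx) and use x.countx, or use bracket indexing (x['count']) as in the line above.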
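The np.average approach returns only the weighted averages. A sketch of one way to also get the summed count in the same result, assuming the question's df: have the applied function return a small Series per group (the helper name weighted_row is hypothetical).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id1": ["Person A", "Person A", "Person B", "Person C", "Person B", "Person A", "Person B"],
                   "id2": ["class 1", "class 1", "class 2", "class 2", "class 3", "class 4", "class 2"],
                   "count": [200, 400, 800, 200, 800, 400, 100],
                   "average": [0.2, 0.4, 0.6, 0.4, 0.6, 0.2, 0.5]})

def weighted_row(g):
    # Hypothetical helper: np.average computes sum(weights * values) / sum(weights)
    return pd.Series({"count": g["count"].sum(),
                      "average": np.average(g["average"], weights=g["count"])})

# Selecting the two value columns keeps the grouping keys out of g
res = (df.groupby(["id1", "id2"], sort=False)[["count", "average"]]
         .apply(weighted_row)
         .reset_index())
# The per-group Series is float64, so cast the summed counts back to integers
res["count"] = res["count"].astype(int)
print(res)
```

Returning a Series from the applied function makes each of its labels a column of the result, which is why count comes back as float and needs the final cast.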
Answer 2 (score: 1)
You can first turn the average column into count * average, then group, sum, and divide by the summed count:
df.assign(average=lambda x: x['count'].mul(x['average'])).groupby(['id1', 'id2']).sum().assign(average=lambda x: x['average'] / x['count']).reset_index()
id1 id2 count average
0 Person A class 1 600 0.333333
1 Person A class 4 400 0.200000
2 Person B class 2 900 0.588889
3 Person B class 3 800 0.600000
4 Person C class 2 200 0.400000
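If this comes up repeatedly, the same assign, group, sum, divide pattern can be wrapped in a small reusable function. This is a sketch only: the name weighted_mean, its parameters, and the temporary column _wv are all hypothetical choices, and it uses pandas named aggregation in place of a plain .sum().

```python
import pandas as pd

def weighted_mean(frame, by, value, weight):
    """Hypothetical helper: total weight and weight-averaged value per group."""
    out = (frame.assign(_wv=frame[value] * frame[weight])  # temporary weighted column
                .groupby(by, sort=False)
                .agg(**{weight: (weight, "sum"), "_wv": ("_wv", "sum")}))
    out[value] = out["_wv"] / out[weight]  # sum(weight * value) / sum(weight)
    return out.drop(columns="_wv").reset_index()

df = pd.DataFrame({"id1": ["Person A", "Person A", "Person B", "Person C", "Person B", "Person A", "Person B"],
                   "id2": ["class 1", "class 1", "class 2", "class 2", "class 3", "class 4", "class 2"],
                   "count": [200, 400, 800, 200, 800, 400, 100],
                   "average": [0.2, 0.4, 0.6, 0.4, 0.6, 0.2, 0.5]})

r = weighted_mean(df, ["id1", "id2"], value="average", weight="count")
print(r)
```

Because only the listed columns are aggregated, other columns in the frame do not need to be numeric, and sort=False keeps the groups in order of first appearance, as in Answer 0.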