I have a data frame summarized by col1, col2 and col3 with a count, and I have applied different weights to that count.
The dataset looks like this:
# Current result
col1 col2 col3 Count Weightage_count
---------------------------------------------
1: A S1 X110 2 2
2: A S1 X150 2 0.5
3: A S2 X212 2 1
4: A S2 X200 1 0.5
5: A S2 X211 1 0.25
6: B S3 X311 4 4
7: C S4 X222 3 1.5
import pandas as pd

# Sample data reproducing the table above
data = {'Col1': ['A','A','A','A','A','B','C'],
        'Col2': ['S1','S1','S2','S2','S2','S3','S4'],
        'Col3': ['X110','X150','X212','X200','X211','X311','X222'],
        'Count': [2,2,2,1,1,4,3],
        'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5]}
df = pd.DataFrame(data)
I want to calculate a result grouped by Col1 and Col2.
Expected result:
Col1 Col2 Result
-------------------
1 A S1 0.625
2 A S2 0.5
3 B S3 1
4 C S4 0.5
Answer 0 (score: 2)
First aggregate with sum, then divide the summed columns inside DataFrame.eval:
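# Sum only the numeric columns, and keep the original df intact
# so the alternatives below can still start from it.
out = (df.groupby(['Col1','Col2'])[['Count','Weightage_count']]
         .sum()
         .eval('Weightage_count / Count')
         .reset_index(name='Result'))
print(out)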
Col1 Col2 Result
0 A S1 0.6250
1 A S2 0.4375
2 B S3 1.0000
3 C S4 0.5000
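It may not be obvious why reset_index accepts name= here. A minimal sketch of the intermediate steps, assuming the same sample data as in the question (the variable names sums, ratio and result are only illustrative): eval on a single expression gives back a Series indexed by the group keys, and Series.reset_index takes a name for the value column.

```python
import pandas as pd

data = {'Col1': ['A', 'A', 'A', 'A', 'A', 'B', 'C'],
        'Col2': ['S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S4'],
        'Col3': ['X110', 'X150', 'X212', 'X200', 'X211', 'X311', 'X222'],
        'Count': [2, 2, 2, 1, 1, 4, 3],
        'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5]}
df = pd.DataFrame(data)

# Step 1: per-group sums of the two numeric columns.
sums = df.groupby(['Col1', 'Col2'])[['Count', 'Weightage_count']].sum()

# Step 2: eval on one expression returns a Series, not a DataFrame.
ratio = sums.eval('Weightage_count / Count')
print(type(ratio).__name__)

# Step 3: Series.reset_index(name=...) labels the value column
# and turns the group keys back into ordinary columns.
result = ratio.reset_index(name='Result')
print(result)
```

For the A/S2 group this works out to (1 + 0.5 + 0.25) / (2 + 1 + 1) = 1.75 / 4 = 0.4375, which is the value shown in the output above.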
Or divide with Series.div, using DataFrame.pop to remove the source columns after processing:
# Assign to a new name so the original df stays available
out = df.groupby(['Col1','Col2'], as_index=False)[['Count','Weightage_count']].sum()
out['new'] = out.pop('Weightage_count').div(out.pop('Count'))
print(out)
Col1 Col2 new
0 A S1 0.6250
1 A S2 0.4375
2 B S3 1.0000
3 C S4 0.5000
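To illustrate what pop does in the snippet above, here is a small sketch on a hypothetical two-row frame of per-group sums (the values are the A/S1 and A/S2 sums from this data): pop returns the column as a Series and removes it from the frame in place, so only the key columns and the new ratio remain.

```python
import pandas as pd

# Hypothetical frame holding already-summed values for two groups.
sums = pd.DataFrame({'Col1': ['A', 'A'],
                     'Col2': ['S1', 'S2'],
                     'Count': [4, 4],
                     'Weightage_count': [2.5, 1.75]})

# pop hands back the column and drops it from the frame.
weights = sums.pop('Weightage_count')
counts = sums.pop('Count')
print(list(sums.columns))  # only the key columns are left at this point

# Element-wise division, aligned on the row index.
sums['new'] = weights.div(counts)
print(sums)
```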
If you also need to keep the summed columns:
# Assign to a new name so the original df stays available
out = df.groupby(['Col1','Col2'])[['Count','Weightage_count']].sum()
out['new'] = out['Weightage_count'].div(out['Count'])
print(out)
Count Weightage_count new
Col1 Col2
A S1 4 2.50 0.6250
S2 4 1.75 0.4375
B S3 4 4.00 1.0000
C S4 3 1.50 0.5000
Answer 1 (score: 1)
Use GroupBy.agg:
In [438]: x = df.groupby(['Col1', 'Col2']).agg({'Weightage_count': 'sum', 'Count': 'sum'})
In [439]: x['Result'] = x.Weightage_count/x.Count
In [440]: x
Out[440]:
Weightage_count Count Result
Col1 Col2
A S1 2.50 4 0.6250
S2 1.75 4 0.4375
B S3 4.00 4 1.0000
C S4 1.50 3 0.5000
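The same two steps can probably be written as one chain. This is only a sketch, assuming pandas 0.25 or later (where named aggregation is available) and the sample data from the question; assign is used here in place of the separate column assignment.

```python
import pandas as pd

data = {'Col1': ['A', 'A', 'A', 'A', 'A', 'B', 'C'],
        'Col2': ['S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S4'],
        'Col3': ['X110', 'X150', 'X212', 'X200', 'X211', 'X311', 'X222'],
        'Count': [2, 2, 2, 1, 1, 4, 3],
        'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5]}
df = pd.DataFrame(data)

x = (df.groupby(['Col1', 'Col2'])
       # Named aggregation: output_column=(input_column, function)
       .agg(Weightage_count=('Weightage_count', 'sum'),
            Count=('Count', 'sum'))
       # Add the ratio of the two summed columns as a new column.
       .assign(Result=lambda d: d['Weightage_count'] / d['Count']))
print(x)
```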
Answer 2 (score: 1)
You can also use pipe:
In [4]: group = df.groupby(['Col1', 'Col2'])
In [5]: group.pipe(lambda df: df.Weightage_count.sum()/df.Count.sum())
Out[5]:
Col1 Col2
A S1 0.6250
S2 0.4375
B S3 1.0000
C S4 0.5000
dtype: float64
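Note that pipe on a GroupBy passes the GroupBy object itself to the function, not a DataFrame, which is why .sum() there yields per-group sums. A minimal sketch with a named function in place of the lambda (ratio_of_sums is a hypothetical helper name, and the data is the sample from the question):

```python
import pandas as pd

data = {'Col1': ['A', 'A', 'A', 'A', 'A', 'B', 'C'],
        'Col2': ['S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S4'],
        'Count': [2, 2, 2, 1, 1, 4, 3],
        'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5]}
df = pd.DataFrame(data)


def ratio_of_sums(grouped):
    """Hypothetical helper: receives the GroupBy object from pipe."""
    # Each .sum() returns a Series indexed by the group keys;
    # the division aligns the two Series on that index.
    return grouped['Weightage_count'].sum() / grouped['Count'].sum()


result = df.groupby(['Col1', 'Col2']).pipe(ratio_of_sums)
print(result)
```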
If you want the result column to have a name, you can use the rename method:
In [13]: group.pipe(lambda df: df.Weightage_count.sum()/df.Count.sum()).rename('Result').reset_index()
Out[13]:
Col1 Col2 Result
0 A S1 0.6250
1 A S2 0.4375
2 B S3 1.0000
3 C S4 0.5000