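How to group by and compute a value from other columns in pandas

Time: 2021-03-25 06:51:25

Tags: python pandas

I have a DataFrame summarized by Col1, Col2, Col3 and Count, and I have applied different weights to that count.

The dataset looks like this: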


# Current result
    Col1 Col2  Col3   Count   Weightage_count
---------------------------------------------
 1:  A    S1   X110     2          2
 2:  A    S1   X150     2          0.5
 3:  A    S2   X212     2          1
 4:  A    S2   X200     1          0.5
 5:  A    S2   X211     1          0.25
 6:  B    S3   X311     4          4
 7:  C    S4   X222     3          1.5


import pandas as pd

data = {'Col1': ['A', 'A', 'A', 'A', 'A', 'B', 'C'],
        'Col2': ['S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S4'],
        'Col3': ['X110', 'X150', 'X212', 'X200', 'X211', 'X311', 'X222'],
        'Count': [2, 2, 2, 1, 1, 4, 3],
        'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5]}

df = pd.DataFrame(data)

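I want to compute a result for each combination of Col1 and Col2.

  • Result = (sum of Weightage_count within each Col1, Col2 group) / (sum of Count within the same group)

Expected result: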

    Col1  Col2  Result
-------------------
1   A     S1     0.625
2   A     S2     0.5
3   B     S3     1 
4   C     S4     0.5
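
A hand computation may help make the formula concrete. The sketch below applies it to two groups by filtering the rows directly; `ratio_for` is a hypothetical helper introduced only for illustration, and Col3 is left out because the formula does not use it.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'A', 'A', 'B', 'C'],
    'Col2': ['S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S4'],
    'Count': [2, 2, 2, 1, 1, 4, 3],
    'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5],
})

def ratio_for(frame, col1, col2):
    """Hypothetical helper: apply the formula to one (Col1, Col2) group."""
    rows = frame[(frame['Col1'] == col1) & (frame['Col2'] == col2)]
    return rows['Weightage_count'].sum() / rows['Count'].sum()

a_s1 = ratio_for(df, 'A', 'S1')  # (2 + 0.5) / (2 + 2)
a_s2 = ratio_for(df, 'A', 'S2')  # (1 + 0.5 + 0.25) / (2 + 1 + 1)
print(a_s1, a_s2)
```

This gives 0.625 for (A, S1) and 0.4375 for (A, S2). The 0.5 listed for (A, S2) in the expected result does not follow from the stated formula, which is why the answers report 0.4375 for that group.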

3 answers:

Answer 0 (score: 2)

First aggregate with sum, then divide the columns with DataFrame.eval:

df = (df.groupby(['Col1','Col2'])
        .sum()
        .eval('Weightage_count / Count')
        .reset_index(name='Result'))
print (df)
  Col1 Col2  Result
0    A   S1  0.6250
1    A   S2  0.4375
2    B   S3  1.0000
3    C   S4  0.5000

Or divide with Series.div, using DataFrame.pop to drop the columns once they have been used:

df = df.groupby(['Col1','Col2'], as_index=False)[['Count','Weightage_count']].sum()
df['new'] = df.pop('Weightage_count').div(df.pop('Count'))
print (df)
  Col1 Col2     new
0    A   S1  0.6250
1    A   S2  0.4375
2    B   S3  1.0000
3    C   S4  0.5000

If the summed columns should also be kept:

df = df.groupby(['Col1','Col2'])[['Count','Weightage_count']].sum()
df['new'] = df['Weightage_count'].div(df['Count'])
print (df)
           Count  Weightage_count     new
Col1 Col2                                
A    S1        4             2.50  0.6250
     S2        4             1.75  0.4375
B    S3        4             4.00  1.0000
C    S4        3             1.50  0.5000

Answer 1 (score: 1)

Use GroupBy.agg:

In [438]: x = df.groupby(['Col1', 'Col2']).agg({'Weightage_count': 'sum', 'Count': 'sum'})

In [439]: x['Result'] = x.Weightage_count/x.Count

In [440]: x
Out[440]: 
           Weightage_count  Count  Result
Col1 Col2                                
A    S1               2.50      4  0.6250
     S2               1.75      4  0.4375
B    S3               4.00      4  1.0000
C    S4               1.50      3  0.5000
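
A possible variant of the same idea uses named aggregation (keyword arguments of the form `new_name=(column, func)`, available in pandas 0.25 and later) so that the summed columns get explicit names. The names `weight_sum` and `count_sum` below are assumptions chosen for illustration, and `as_index=False` is used only to keep Col1 and Col2 as ordinary columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'A', 'A', 'B', 'C'],
    'Col2': ['S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S4'],
    'Col3': ['X110', 'X150', 'X212', 'X200', 'X211', 'X311', 'X222'],
    'Count': [2, 2, 2, 1, 1, 4, 3],
    'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5],
})

# Sum both columns per (Col1, Col2) group under explicit output names
x = (df.groupby(['Col1', 'Col2'], as_index=False)
       .agg(weight_sum=('Weightage_count', 'sum'),
            count_sum=('Count', 'sum')))

# Ratio of the two group totals
x['Result'] = x['weight_sum'] / x['count_sum']
print(x[['Col1', 'Col2', 'Result']])
```

Because only the two named columns are aggregated, Col3 never enters the computation.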

Answer 2 (score: 1)

You can also use pipe:

In [4]: group = df.groupby(['Col1', 'Col2'])

In [5]: group.pipe(lambda df: df.Weightage_count.sum()/df.Count.sum())
Out[5]: 
Col1  Col2
A     S1      0.6250
      S2      0.4375
B     S3      1.0000
C     S4      0.5000
dtype: float64

To give the result a name, use the rename method:

In [13]: group.pipe(lambda df: df.Weightage_count.sum()/df.Count.sum()).rename('Result').reset_index()
Out[13]: 
  Col1 Col2  Result
0    A   S1  0.6250
1    A   S2  0.4375
2    B   S3  1.0000
3    C   S4  0.5000
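
One detail that is easy to miss: GroupBy.pipe passes the GroupBy object itself to the function, so the lambda's `df` argument above is a grouped object rather than a DataFrame. The sketch below spells this out with a named function; `weighted_ratio` is a hypothetical name used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Col1': ['A', 'A', 'A', 'A', 'A', 'B', 'C'],
    'Col2': ['S1', 'S1', 'S2', 'S2', 'S2', 'S3', 'S4'],
    'Col3': ['X110', 'X150', 'X212', 'X200', 'X211', 'X311', 'X222'],
    'Count': [2, 2, 2, 1, 1, 4, 3],
    'Weightage_count': [2, 0.5, 1, 0.5, 0.25, 4, 1.5],
})

def weighted_ratio(grouped):
    """Hypothetical helper: `grouped` is a DataFrameGroupBy, not a DataFrame."""
    # Each .sum() returns a Series indexed by (Col1, Col2); dividing aligns on that index
    return grouped['Weightage_count'].sum() / grouped['Count'].sum()

result = (df.groupby(['Col1', 'Col2'])
            .pipe(weighted_ratio)
            .rename('Result')
            .reset_index())
print(result)
```

Writing the function separately makes it reusable with any grouping that exposes the same two columns.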