Aggregating a DataFrame in Python

Date: 2017-06-09 14:56:15

Tags: python pandas aggregate

I have a dataframe like this:

col               prev_col      prev_table   prev_tol  table        count
C_ShipPostalCode  C_PostalCode  T_customers  0         T_orders     2
C_ShipPostalCode  C_PostalCode  T_customers  2         T_orders     2
C_ShipPostalCode  C_PostalCode  T_customers  0         T_orders     1
C_ShipPostalCode  C_PostalCode  T_customers  0         T_orders     1
C_ShipPostalCode  C_PostalCode  T_customers  0         T_orders     2
C_ShipPostalCode  C_PostalCode  T_customers  0         T_orders     1
C_ShipPostalCode  C_PostalCode  T_customers  1         T_orders     1
C_SupplierID      C_UnitPrice   T_products   1         T_suppliers  3
C_SupplierID      C_UnitPrice   T_products   2         T_suppliers  2

I want to transform this dataframe into the following:

col               prev_col      prev_table   table        total  prev_tol  count
C_ShipPostalCode  C_PostalCode  T_customers  T_orders     6      3         10
C_SupplierID      C_UnitPrice   T_products   T_suppliers  2      3         5

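As you can see, I want to group by four columns, but I also want to bring in prev_tol and count from the original dataframe, summed within each group.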
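For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame. The values are copied from the table above; the variable name `df` is an assumption, chosen to match the name used in the answer.

```python
import pandas as pd

# Sample rows copied from the question's table.
rows = [
    ("C_ShipPostalCode", "C_PostalCode", "T_customers", 0, "T_orders", 2),
    ("C_ShipPostalCode", "C_PostalCode", "T_customers", 2, "T_orders", 2),
    ("C_ShipPostalCode", "C_PostalCode", "T_customers", 0, "T_orders", 1),
    ("C_ShipPostalCode", "C_PostalCode", "T_customers", 0, "T_orders", 1),
    ("C_ShipPostalCode", "C_PostalCode", "T_customers", 0, "T_orders", 2),
    ("C_ShipPostalCode", "C_PostalCode", "T_customers", 0, "T_orders", 1),
    ("C_ShipPostalCode", "C_PostalCode", "T_customers", 1, "T_orders", 1),
    ("C_SupplierID", "C_UnitPrice", "T_products", 1, "T_suppliers", 3),
    ("C_SupplierID", "C_UnitPrice", "T_products", 2, "T_suppliers", 2),
]
columns = ["col", "prev_col", "prev_table", "prev_tol", "table", "count"]
df = pd.DataFrame(rows, columns=columns)

# Note: the 'count' column must be accessed as df["count"], because
# df.count is already a DataFrame method.
print(df)
```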

1 Answer:

Answer 0: (score: 1)

Try this:

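# Group by the key columns; sum prev_tol and count, and count rows via 'table'.
df_out = df.groupby(['col', 'prev_col', 'prev_table']).agg(
    {'prev_tol': 'sum', 'table': ['count', 'max'], 'count': 'sum'}
).reset_index()
# Flatten the two-level column names, e.g. ('prev_tol', 'sum') -> 'prev_tol_sum'.
df_out.columns = df_out.columns.map('_'.join)
print(df_out)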

Output:

               col_     prev_col_  prev_table_  prev_tol_sum  count_sum  \
0  C_ShipPostalCode  C_PostalCode  T_customers             3         10   
1      C_SupplierID   C_UnitPrice   T_products             3          5   

   table_count    table_max  
0            7     T_orders  
1            2  T_suppliers
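If the exact column names and order from the question are wanted, one possible variant is to group by all four columns and use named aggregation. This is only a sketch, assuming pandas 0.25 or later (where named aggregation is available); the sample frame is rebuilt here so the snippet runs on its own, and `total` is taken to mean the number of rows per group. With the nine sample rows, `total` comes out as 7 for the first group, matching `table_count` above rather than the 6 shown in the question.

```python
import pandas as pd

# Sample data from the question, rebuilt so this snippet is self-contained.
df = pd.DataFrame({
    "col": ["C_ShipPostalCode"] * 7 + ["C_SupplierID"] * 2,
    "prev_col": ["C_PostalCode"] * 7 + ["C_UnitPrice"] * 2,
    "prev_table": ["T_customers"] * 7 + ["T_products"] * 2,
    "prev_tol": [0, 2, 0, 0, 0, 0, 1, 1, 2],
    "table": ["T_orders"] * 7 + ["T_suppliers"] * 2,
    "count": [2, 2, 1, 1, 2, 1, 1, 3, 2],
})

# Group by all four key columns and name each output column directly,
# so no MultiIndex flattening is needed afterwards.
df_out = (
    df.groupby(["col", "prev_col", "prev_table", "table"])
      .agg(
          total=("prev_tol", "count"),   # rows per group (non-null prev_tol values)
          prev_tol=("prev_tol", "sum"),
          count=("count", "sum"),
      )
      .reset_index()
)
print(df_out)
```

Because `table` is part of the group key here, it appears as a plain column and the `max` trick from the original code is not needed.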