I have a dataframe like this:
col prev_col prev_table prev_tol table count
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 2
C_ShipPostalCode C_PostalCode T_customers 2 T_orders 2
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 2
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1
C_ShipPostalCode C_PostalCode T_customers 1 T_orders 1
C_SupplierID C_UnitPrice T_products 1 T_suppliers 3
C_SupplierID C_UnitPrice T_products 2 T_suppliers 2
I want to transform this dataframe into the following:
col prev_col prev_table table total prev_tol count
C_ShipPostalCode C_PostalCode T_customers T_orders 6 3 10
C_SupplierID C_UnitPrice T_products T_suppliers 2 3 5
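As you can see, I want to group by four columns (col, prev_col, prev_table, table), but I also want to sum up prev_tol and count from the original dataframe.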
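To make the example reproducible, the sample rows above can be parsed straight into a DataFrame. This is a minimal sketch that assumes the columns are separated only by whitespace (no value contains a space), so `read_csv` with `sep=r"\s+"` can split them. The variable name `raw` is just for illustration.

```python
import io

import pandas as pd

# The sample rows from the question, pasted as whitespace-separated text
raw = """col prev_col prev_table prev_tol table count
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 2
C_ShipPostalCode C_PostalCode T_customers 2 T_orders 2
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 2
C_ShipPostalCode C_PostalCode T_customers 0 T_orders 1
C_ShipPostalCode C_PostalCode T_customers 1 T_orders 1
C_SupplierID C_UnitPrice T_products 1 T_suppliers 3
C_SupplierID C_UnitPrice T_products 2 T_suppliers 2
"""

# sep=r"\s+" splits on any run of whitespace; prev_tol and count are parsed as integers
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.dtypes)
```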
Answer (score: 1)
Try this:
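import pandas as pd
# Group by three key columns; 'table' gets both a row count and its max (the single table name per group)
df_out = df.groupby(['col','prev_col','prev_table']).agg({'prev_tol':'sum','table':['count','max'],'count':'sum'}).reset_index()
# Flatten the two-level column index, e.g. ('prev_tol', 'sum') -> 'prev_tol_sum'
df_out.columns = df_out.columns.map('_'.join)
print(df_out)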
Output:
col_ prev_col_ prev_table_ prev_tol_sum count_sum \
0 C_ShipPostalCode C_PostalCode T_customers 3 10
1 C_SupplierID C_UnitPrice T_products 3 5
table_count table_max
0 7 T_orders
1 2 T_suppliers
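If the goal is to group by all four columns, as the question describes, a minimal alternative sketch uses named aggregation (available in pandas 0.25 and later), which also avoids flattening a MultiIndex. It assumes `total` means the number of rows in each group; with the sample data that gives 7 for the first group (matching `table_count` above) rather than the 6 shown in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col": ["C_ShipPostalCode"] * 7 + ["C_SupplierID"] * 2,
    "prev_col": ["C_PostalCode"] * 7 + ["C_UnitPrice"] * 2,
    "prev_table": ["T_customers"] * 7 + ["T_products"] * 2,
    "prev_tol": [0, 2, 0, 0, 0, 0, 1, 1, 2],
    "table": ["T_orders"] * 7 + ["T_suppliers"] * 2,
    "count": [2, 2, 1, 1, 2, 1, 1, 3, 2],
})

keys = ["col", "prev_col", "prev_table", "table"]

# Named aggregation: each keyword becomes the name of an output column
out = (
    df.groupby(keys)
    .agg(
        total=("prev_tol", "count"),  # rows per group (prev_tol has no missing values)
        prev_tol=("prev_tol", "sum"),
        count=("count", "sum"),
    )
    .reset_index()
)
print(out)
```

Because `table` is one of the grouping keys here, there is no need to recover it with `max`; the answer's `max` trick works only because each `(col, prev_col, prev_table)` group contains a single table name.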