Get sums grouped by date and by another column, filtered by 2 conditions

Time: 2017-10-20 10:43:36

Tags: python pandas

I have a dataset like this:

    TransactionId   UserId transaction_date transaction_status amount_USD
0       3996625673  1298122       2015-08-11            CHARGED      10,96
1       5797849338  1125916       2015-08-11           DECLINED       14,7
2       9535361884  8009005       2015-08-11            CHARGED      10,61
3       8410989235  1123856       2015-07-29           DECLINED      10,96

For each transaction_date, I need the sum of amount_USD, with one column per transaction_status:

transaction_date    CHARGED DECLINED
2015-07-29             0     10,96
2015-08-11           21,57   14,7

I tried something like this:

df[df['transaction_status']=='DECLINED']['amount_USD'].groupby('transaction_date').sum()

1 answer:

Answer 0: (score: 3)

First use replace to turn the comma decimals into numbers, then aggregate with groupby and sum, and finally reshape with unstack:

# or pass decimal=',' to read_csv when loading the data
df['amount_USD'] = df['amount_USD'].replace(',','.', regex=True).astype(float)

# parentheses allow the method chain to span several lines
df = (df.groupby(['transaction_date','transaction_status'])['amount_USD']
        .sum()
        .unstack(fill_value=0))
print (df)
transaction_status  CHARGED  DECLINED
transaction_date                     
2015-07-29             0.00     10.96
2015-08-11            21.57     14.70
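
The comment about `decimal=','` can be sketched as follows. This is a minimal example under assumptions: the CSV text, its `;` separator, and the names `raw` and `result` are made up for illustration, since the original file is not shown.

```python
import io
import pandas as pd

# Hypothetical CSV text mirroring the sample rows from the question.
# The ';' separator is an assumption; the real file layout is not shown.
raw = """TransactionId;UserId;transaction_date;transaction_status;amount_USD
3996625673;1298122;2015-08-11;CHARGED;10,96
5797849338;1125916;2015-08-11;DECLINED;14,7
9535361884;8009005;2015-08-11;CHARGED;10,61
8410989235;1123856;2015-07-29;DECLINED;10,96
"""

# decimal=',' tells read_csv to parse "10,96" as the float 10.96,
# so no replace/astype step is needed afterwards.
df = pd.read_csv(io.StringIO(raw), sep=';', decimal=',')

# Same groupby / sum / unstack chain as in the answer.
result = (df.groupby(['transaction_date', 'transaction_status'])['amount_USD']
            .sum()
            .unstack(fill_value=0))
print(result)
```

Parsing the decimals at load time keeps the column numeric from the start, which avoids a separate string-replacement pass.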

An alternative is pivot_table, thanks to Bharath shetty:

df = df.pivot_table(index='transaction_date',
                    columns='transaction_status', 
                    values='amount_USD', 
                    aggfunc='sum', 
                    fill_value=0)
print (df)

transaction_status  CHARGED  DECLINED
transaction_date                     
2015-07-29             0.00     10.96
2015-08-11            21.57     14.70
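
One detail worth illustrating is why `aggfunc='sum'` is passed explicitly: `pivot_table` aggregates with the mean when no `aggfunc` is given. The sketch below rebuilds the sample rows by hand (already converted to floats); the names `by_sum` and `by_mean` are illustrative only.

```python
import pandas as pd

# Sample rows from the question, with amount_USD already as floats.
df = pd.DataFrame({
    'transaction_date': ['2015-08-11', '2015-08-11', '2015-08-11', '2015-07-29'],
    'transaction_status': ['CHARGED', 'DECLINED', 'CHARGED', 'DECLINED'],
    'amount_USD': [10.96, 14.7, 10.61, 10.96],
})

# Explicit sum, as in the answer.
by_sum = df.pivot_table(index='transaction_date',
                        columns='transaction_status',
                        values='amount_USD',
                        aggfunc='sum',
                        fill_value=0)

# No aggfunc given: pivot_table falls back to the mean.
by_mean = df.pivot_table(index='transaction_date',
                         columns='transaction_status',
                         values='amount_USD',
                         fill_value=0)

print(by_sum)
print(by_mean)
```

The two tables differ only where a date and status pair has more than one row, here the two CHARGED transactions on 2015-08-11.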

Finally, to turn the index back into a column, use reset_index and rename_axis:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
  transaction_date  CHARGED  DECLINED
0       2015-07-29     0.00     10.96
1       2015-08-11    21.57     14.70
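
For completeness, the single-status attempt from the question can also be made to work. Selecting `amount_USD` before grouping leaves a Series that no longer carries `transaction_date`, so grouping by that name fails; grouping first and selecting the column afterwards avoids this. A minimal sketch, assuming `amount_USD` has already been converted to float (the name `declined` is illustrative):

```python
import pandas as pd

# Sample rows from the question, with amount_USD already as floats.
df = pd.DataFrame({
    'transaction_date': ['2015-08-11', '2015-08-11', '2015-08-11', '2015-07-29'],
    'transaction_status': ['CHARGED', 'DECLINED', 'CHARGED', 'DECLINED'],
    'amount_USD': [10.96, 14.7, 10.61, 10.96],
})

# Filter rows first, group on the DataFrame (which still has the date
# column), and only then select the column to sum.
declined = (df[df['transaction_status'] == 'DECLINED']
              .groupby('transaction_date')['amount_USD']
              .sum())
print(declined)
```

This gives one status at a time; the unstack and pivot_table approaches above produce all statuses in a single table.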