I have a dataset like this:
   TransactionId   UserId transaction_date transaction_status amount_USD
0     3996625673  1298122       2015-08-11            CHARGED      10,96
1     5797849338  1125916       2015-08-11           DECLINED       14,7
2     9535361884  8009005       2015-08-11            CHARGED      10,61
3     8410989235  1123856       2015-07-29           DECLINED      10,96
For each transaction_date, I need to sum the amount_USD column separately for each transaction_status, to get:
transaction_date  CHARGED  DECLINED
2015-07-29              0     10,96
2015-08-11          21,57      14,7
I tried to do it like this:
df[df['transaction_status']=='DECLINED']['amount_USD'].groupby('transaction_date').sum()
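Why the attempt fails, as far as I can tell: selecting ['amount_USD'] first leaves a Series, and that Series has no transaction_date column to group by (pandas raises a KeyError). The amounts are also strings like '10,96', so sum() would join text rather than add numbers. A minimal sketch of a fixed version, assuming the four sample rows from the question, filters first and then groups the filtered DataFrame:

```python
import pandas as pd

# Sample rows from the question; amounts use a comma as decimal separator.
df = pd.DataFrame({
    'transaction_date': ['2015-08-11', '2015-08-11', '2015-08-11', '2015-07-29'],
    'transaction_status': ['CHARGED', 'DECLINED', 'CHARGED', 'DECLINED'],
    'amount_USD': ['10,96', '14,7', '10,61', '10,96'],
})

# Convert '10,96' -> 10.96 so that sum() adds numbers instead of joining strings.
df['amount_USD'] = df['amount_USD'].str.replace(',', '.', regex=False).astype(float)

# Filter rows first, then group the filtered *DataFrame* by the date column.
declined = (df[df['transaction_status'] == 'DECLINED']
            .groupby('transaction_date')['amount_USD']
            .sum())
print(declined)
```

This only yields one status at a time, and dates with no DECLINED rows simply do not appear, which is why grouping by both columns at once is the more natural fit.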
Answer (score: 3)
First use replace to convert the amounts to numbers, then aggregate with sum using groupby, and finally reshape with unstack:
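import pandas as pd

# convert '10,96' -> 10.96 (or pass decimal=',' to read_csv when loading the file)
df['amount_USD'] = df['amount_USD'].replace(',', '.', regex=True).astype(float)
# sum per (date, status) pair, then pivot the status level into columns
df = (df.groupby(['transaction_date', 'transaction_status'])['amount_USD']
        .sum()
        .unstack(fill_value=0))
print(df)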
transaction_status CHARGED DECLINED
transaction_date
2015-07-29 0.00 10.96
2015-08-11 21.57 14.70
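The answer mentions passing decimal=',' to read_csv as an alternative to the replace step. A self-contained sketch of that route, assuming the data arrives as space-separated text (the real file format is not given in the question, so the io.StringIO input below is a stand-in):

```python
import io
import pandas as pd

# Stand-in for the real file: the question's rows as space-separated text.
raw = """TransactionId UserId transaction_date transaction_status amount_USD
3996625673 1298122 2015-08-11 CHARGED 10,96
5797849338 1125916 2015-08-11 DECLINED 14,7
9535361884 8009005 2015-08-11 CHARGED 10,61
8410989235 1123856 2015-07-29 DECLINED 10,96
"""

# decimal=',' makes read_csv parse '10,96' as the float 10.96 directly,
# so no replace(...).astype(float) step is needed afterwards.
df = pd.read_csv(io.StringIO(raw), sep=' ', decimal=',')

# Same groupby + unstack as in the answer.
out = (df.groupby(['transaction_date', 'transaction_status'])['amount_USD']
         .sum()
         .unstack(fill_value=0))
print(out)
```

Parsing the decimal separator at load time keeps the column numeric from the start, which avoids string handling later.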
An alternative with pivot_table, thanks to Bharath shetty:
df = df.pivot_table(index='transaction_date',
columns='transaction_status',
values='amount_USD',
aggfunc='sum',
fill_value=0)
print (df)
transaction_status CHARGED DECLINED
transaction_date
2015-07-29 0.00 10.96
2015-08-11 21.57 14.70
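As a quick sanity check, a sketch assuming the amounts have already been converted to floats: both approaches should produce the same table, since pivot_table with a single values column groups by the index and column keys and then unstacks.

```python
import numpy as np
import pandas as pd

# Question's rows with amounts already converted to floats.
df = pd.DataFrame({
    'transaction_date': ['2015-08-11', '2015-08-11', '2015-08-11', '2015-07-29'],
    'transaction_status': ['CHARGED', 'DECLINED', 'CHARGED', 'DECLINED'],
    'amount_USD': [10.96, 14.7, 10.61, 10.96],
})

via_groupby = (df.groupby(['transaction_date', 'transaction_status'])['amount_USD']
                 .sum()
                 .unstack(fill_value=0))

via_pivot = df.pivot_table(index='transaction_date',
                           columns='transaction_status',
                           values='amount_USD',
                           aggfunc='sum',
                           fill_value=0)

# Compare labels exactly and values with a float tolerance.
same = (via_pivot.index.equals(via_groupby.index)
        and via_pivot.columns.equals(via_groupby.columns)
        and np.allclose(via_pivot.to_numpy(), via_groupby.to_numpy()))
print(same)
```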
Finally, to turn the index into a regular column, use reset_index and rename_axis:
df = df.reset_index().rename_axis(None, axis=1)
print (df)
transaction_date CHARGED DECLINED
0 2015-07-29 0.00 10.96
1 2015-08-11 21.57 14.70
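To make the effect of each call visible, a minimal sketch assuming float amounts as above: reset_index moves transaction_date out of the index, and rename_axis(None, axis=1) clears the leftover transaction_status name that pandas otherwise prints above the column headers.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_date': ['2015-08-11', '2015-08-11', '2015-08-11', '2015-07-29'],
    'transaction_status': ['CHARGED', 'DECLINED', 'CHARGED', 'DECLINED'],
    'amount_USD': [10.96, 14.7, 10.61, 10.96],
})

table = df.pivot_table(index='transaction_date', columns='transaction_status',
                       values='amount_USD', aggfunc='sum', fill_value=0)

# Before: the dates are the index and the columns axis is named 'transaction_status'.
print(table.index.name, table.columns.name)

# After: the dates become an ordinary column and the columns axis has no name.
flat = table.reset_index().rename_axis(None, axis=1)
print(flat.columns.tolist())
print(flat)
```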