I have a DataFrame like this:
SK_ID_CURR  CREDIT_ACTIVE  CREDIT_DAY_OVERDUE
436084      Sold           0
436084      Active         951
436084      Sold           0
436084      Active         0
436084      Bad debt       0
436084      Active         936
436084      Active         951
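I want to add a new column for each CREDIT_ACTIVE category, holding the corresponding CREDIT_DAY_OVERDUE values (summed per SK_ID_CURR).
The result should look like: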
SK_ID_CURR CREDIT_ACTIVE_OD CREDIT_BAD_DEBT_OD CREDIT_ACTIVE_SOLD_OD
436084 2838 0 0
Answer 0 (score: 4)
df = (df.groupby(['SK_ID_CURR', 'CREDIT_ACTIVE'])['CREDIT_DAY_OVERDUE']
        .sum()
        .unstack(fill_value=0))
Or use pivot_table:
df = df.pivot_table(index='SK_ID_CURR',
                    columns='CREDIT_ACTIVE',
                    values='CREDIT_DAY_OVERDUE',
                    aggfunc='sum',
                    fill_value=0)
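As a quick check, here is a minimal self-contained sketch that rebuilds the sample rows from the question and runs both approaches side by side. The variable names `by_group` and `by_pivot` are illustrative only and are not part of the answer.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'SK_ID_CURR': [436084] * 7,
    'CREDIT_ACTIVE': ['Sold', 'Active', 'Sold', 'Active',
                      'Bad debt', 'Active', 'Active'],
    'CREDIT_DAY_OVERDUE': [0, 951, 0, 0, 0, 936, 951],
})

# Approach 1: sum per (id, category), then move categories into columns
by_group = (df.groupby(['SK_ID_CURR', 'CREDIT_ACTIVE'])['CREDIT_DAY_OVERDUE']
              .sum()
              .unstack(fill_value=0))

# Approach 2: the same aggregation expressed as a pivot table
by_pivot = df.pivot_table(index='SK_ID_CURR',
                          columns='CREDIT_ACTIVE',
                          values='CREDIT_DAY_OVERDUE',
                          aggfunc='sum',
                          fill_value=0)

print(by_group)
# Compare the two results cell by cell
print(by_pivot.to_dict() == by_group.to_dict())
```

Both variants yield one row per SK_ID_CURR and one column per CREDIT_ACTIVE category, so the choice between them is mostly a matter of taste.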
Then rename the columns:
df.columns = ['CREDIT_{}_OD'.format(x.upper()) for x in df.columns]
Finally, turn the index back into a column:
df = df.reset_index()
print(df)
   SK_ID_CURR  CREDIT_ACTIVE_OD  CREDIT_BAD DEBT_OD  CREDIT_SOLD_OD
0      436084              2838                   0               0
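Note that the "Bad debt" category contains a space, so the rename above produces `CREDIT_BAD DEBT_OD` rather than the `CREDIT_BAD_DEBT_OD` asked for in the question. One possible tweak, shown here as a sketch and not as part of the original answer, is to replace spaces with underscores while building the names. The frame `wide` is a hypothetical stand-in for the pivoted result.

```python
import pandas as pd

# Stand-in for the pivoted result, using the question's categories and totals
wide = pd.DataFrame([[2838, 0, 0]],
                    index=pd.Index([436084], name='SK_ID_CURR'),
                    columns=['Active', 'Bad debt', 'Sold'])

# Upper-case each label and swap spaces for underscores before wrapping it
wide.columns = ['CREDIT_{}_OD'.format(c.upper().replace(' ', '_'))
                for c in wide.columns]

# Move SK_ID_CURR from the index back into a regular column
wide = wide.reset_index()
print(wide.columns.tolist())
```

This still names the last column `CREDIT_SOLD_OD`, not `CREDIT_ACTIVE_SOLD_OD` as written in the question's expected output.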
Answer 1 (score: 0)
res = pd.pivot_table(df, index='SK_ID_CURR', columns='CREDIT_ACTIVE',
                     values='CREDIT_DAY_OVERDUE', aggfunc='sum')
print(res)
CREDIT_ACTIVE  Active  BadDebt  Sold
SK_ID_CURR
436084           2838        0     0
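The result above still has SK_ID_CURR as the index and a `CREDIT_ACTIVE` label on the column axis. A possible way to flatten it into the layout the question asks for is sketched below, assuming the sample rows from the question. The `fill_value=0` argument and the name `flat` are additions for illustration and are not in this answer.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    'SK_ID_CURR': [436084] * 7,
    'CREDIT_ACTIVE': ['Sold', 'Active', 'Sold', 'Active',
                      'Bad debt', 'Active', 'Active'],
    'CREDIT_DAY_OVERDUE': [0, 951, 0, 0, 0, 936, 951],
})

# fill_value=0 is an assumption: it fills id/category pairs that have no rows
res = pd.pivot_table(df, index='SK_ID_CURR', columns='CREDIT_ACTIVE',
                     values='CREDIT_DAY_OVERDUE', aggfunc='sum', fill_value=0)

# Wrap every category label, drop the axis label, restore the id column
flat = res.add_prefix('CREDIT_').add_suffix('_OD')
flat.columns.name = None
flat = flat.reset_index()
print(flat)
```

Unlike the first answer, this keeps the original casing and spacing of the category labels, for example `CREDIT_Bad debt_OD`.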