Aggregate values by category

Time: 2018-08-02 09:34:35

Tags: python pandas aggregate categories

I have a DataFrame like this:

SK_ID_CURR  CREDIT_ACTIVE   CREDIT_DAY_OVERDUE
436084      Sold            0
436084      Active          951
436084      Sold            0
436084      Active          0
436084      Bad debt        0
436084      Active          936
436084      Active          951

I want to add a new column for each CREDIT_ACTIVE category, holding the summed CREDIT_DAY_OVERDUE values for that category.

The result should look like:

SK_ID_CURR  CREDIT_ACTIVE_OD  CREDIT_BAD_DEBT_OD  CREDIT_ACTIVE_SOLD_OD
436084      2838              0                   0

2 answers:

Answer 0 (score: 4)

Use groupby and aggregate with sum, then reshape with unstack:

df = (df.groupby(['SK_ID_CURR','CREDIT_ACTIVE'])['CREDIT_DAY_OVERDUE']
        .sum()
        .unstack(fill_value=0))
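
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. It rebuilds the sample rows from the question by hand (the variable name totals is just an illustrative choice, not from the original post) and applies the same groupby / sum / unstack chain:

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'SK_ID_CURR': [436084] * 7,
    'CREDIT_ACTIVE': ['Sold', 'Active', 'Sold', 'Active',
                      'Bad debt', 'Active', 'Active'],
    'CREDIT_DAY_OVERDUE': [0, 951, 0, 0, 0, 936, 951],
})

# Sum the overdue days per (ID, category), then move the category
# level of the index into columns; missing pairs become 0.
totals = (df.groupby(['SK_ID_CURR', 'CREDIT_ACTIVE'])['CREDIT_DAY_OVERDUE']
            .sum()
            .unstack(fill_value=0))

print(totals)
```

The result is indexed by SK_ID_CURR with one column per category, which is why the later steps rename the columns and reset the index.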

Or use pivot_table:

df = df.pivot_table(index='SK_ID_CURR',
                    columns='CREDIT_ACTIVE',
                    values='CREDIT_DAY_OVERDUE',
                    aggfunc='sum',
                    fill_value=0)

Then rename the columns:

df.columns = ['CREDIT_{}_OD'.format(x.upper()) for x in df.columns]

Finally, turn the index into a column:

df = df.reset_index()
print(df)
   SK_ID_CURR  CREDIT_ACTIVE_OD  CREDIT_BAD DEBT_OD  CREDIT_SOLD_OD
0      436084              2838                   0               0
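
Note that the category 'Bad debt' contains a space, so the renamed column comes out as CREDIT_BAD DEBT_OD, while the question asks for CREDIT_BAD_DEBT_OD. A small sketch of one way to handle that, assuming it is acceptable to replace spaces with underscores (the wide frame below is a hand-built stand-in for the reshaped result, not output from the original post):

```python
import pandas as pd

# Stand-in for the reshaped result, indexed by SK_ID_CURR.
wide = pd.DataFrame(
    {'Active': [2838], 'Bad debt': [0], 'Sold': [0]},
    index=pd.Index([436084], name='SK_ID_CURR'),
)

# Upper-case each category and swap spaces for underscores
# before wrapping it in the CREDIT_..._OD pattern.
wide.columns = ['CREDIT_{}_OD'.format(c.upper().replace(' ', '_'))
                for c in wide.columns]

wide = wide.reset_index()
print(wide.columns.tolist())
```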

Answer 1 (score: 0)

Use pd.pivot_table:

res = pd.pivot_table(df, index='SK_ID_CURR', columns='CREDIT_ACTIVE',
                     values='CREDIT_DAY_OVERDUE', aggfunc='sum')

print(res)

CREDIT_ACTIVE  Active  BadDebt  Sold
SK_ID_CURR                          
436084           2838        0     0
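
One thing to watch for with this version: no fill_value is passed, so an ID that lacks some category gets NaN in that column rather than 0. The sketch below illustrates the difference. The second ID (100002) and its single row are made up for the demonstration and do not come from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'SK_ID_CURR': [436084, 436084, 436084, 100002],  # 100002 is hypothetical
    'CREDIT_ACTIVE': ['Sold', 'Active', 'Bad debt', 'Active'],
    'CREDIT_DAY_OVERDUE': [0, 951, 0, 12],
})

# Without fill_value: missing (ID, category) pairs are NaN.
without_fill = pd.pivot_table(df, index='SK_ID_CURR', columns='CREDIT_ACTIVE',
                              values='CREDIT_DAY_OVERDUE', aggfunc='sum')

# With fill_value=0: missing pairs are reported as 0 instead.
with_fill = pd.pivot_table(df, index='SK_ID_CURR', columns='CREDIT_ACTIVE',
                           values='CREDIT_DAY_OVERDUE', aggfunc='sum',
                           fill_value=0)

print(without_fill)
print(with_fill)
```

Whether NaN or 0 is the better choice depends on whether "no records in this category" should count as zero overdue days in later steps.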