How do I get combined values in a pivot?

Time: 2017-10-18 05:01:34

Tags: python pandas

I am currently working with a dataset like this:

    customerID      store_code  mode
    BBID_204100102  2655        a
    BBID_204100102  2906        b
    BBID_204100102  2906        d
    BBID_204100150  4986        c
    BBID_204100150  4986        a
    BBID_204100277  4986        d
    BBID_204100310  4986        d
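
To reproduce the problem, the sample above can be built as a DataFrame roughly as follows. This is a minimal sketch: the variable name `df` is an assumption chosen to match the answers, and the dtypes (strings for `customerID` and `mode`, integers for `store_code`) are inferred from the printed values.

```python
import pandas as pd

# Sample rows copied from the question; dtypes are inferred, not stated by the asker.
df = pd.DataFrame({
    'customerID': ['BBID_204100102', 'BBID_204100102', 'BBID_204100102',
                   'BBID_204100150', 'BBID_204100150', 'BBID_204100277',
                   'BBID_204100310'],
    'store_code': [2655, 2906, 2906, 4986, 4986, 4986, 4986],
    'mode': ['a', 'b', 'd', 'c', 'a', 'd', 'd'],
})
print(df)
```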

I want something like this:

        customerID      store_code  a  b  c  d
    0   BBID_204100102  2655        1  0  0  0
    1   BBID_204100102  2906        0  1  0  0
    2   BBID_204100150  4986        1  0  1  0
    3   BBID_204100277  4986        0  0  0  1
    4   BBID_204100310  4986        0  0  0  1

That is, first pivot on the customer ID and store code, then encode the mode as shown above.

2 answers:

Answer 0: (score: 1)

Use get_dummies after set_index, then take the max over both index levels:

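import pandas as pd

# One-hot encode mode, then merge rows that share (customerID, store_code) with max.
# groupby(level=...).max() replaces .max(level=...), which pandas 2.0 removed.
df = (pd.get_dummies(df.set_index(['customerID','store_code']), prefix='', prefix_sep='', dtype=int)
        .groupby(level=[0, 1]).max()
        .reset_index())
print(df)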
       customerID  store_code  a  b  c  d
0  BBID_204100102        2655  1  0  0  0
1  BBID_204100102        2906  0  1  0  1
2  BBID_204100150        4986  1  0  1  0
3  BBID_204100277        4986  0  0  0  1
4  BBID_204100310        4986  0  0  0  1
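
To see why the max step matters, it can help to look at the intermediate result. The sketch below rebuilds the question's sample data (the construction and the names `dummies` and `out` are illustrative assumptions) and splits the approach into two steps. It uses `groupby(level=...).max()` because the `level` argument of `max` is not available in recent pandas releases.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'customerID': ['BBID_204100102', 'BBID_204100102', 'BBID_204100102',
                   'BBID_204100150', 'BBID_204100150', 'BBID_204100277',
                   'BBID_204100310'],
    'store_code': [2655, 2906, 2906, 4986, 4986, 4986, 4986],
    'mode': ['a', 'b', 'd', 'c', 'a', 'd', 'd'],
})

# Step 1: one-hot encode mode. Each input row yields one indicator row,
# so (BBID_204100102, 2906) still appears twice at this point.
dummies = pd.get_dummies(df.set_index(['customerID', 'store_code']),
                         prefix='', prefix_sep='', dtype=int)
print(dummies)

# Step 2: merge rows that share the same index pair with an element-wise max,
# which acts as a logical OR on the 0/1 indicators.
out = dummies.groupby(level=['customerID', 'store_code']).max().reset_index()
print(out)
```

Using max rather than sum keeps the result a 0/1 indicator even if the same mode were to repeat for one customer and store.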

Answer 1: (score: 1)

Option 1: use pivot_table

In [3781]: df.pivot_table(index=['customerID','store_code'], columns='mode',
                          aggfunc=len, fill_value=0).reset_index()
Out[3781]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1

Option 2: use groupby

In [3793]: (df.groupby(['customerID', 'store_code', 'mode']).size()
              .unstack(fill_value=0).reset_index())
Out[3793]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1

Option 3: use set_index and unstack

In [3771]: (df.assign(v=1).set_index(['customerID', 'store_code', 'mode'])['v']
              .unstack(fill_value=0).reset_index())
Out[3771]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1
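
One caveat about these options: Options 1 and 2 count rows, so they match the 0/1 layout only while each (customerID, store_code, mode) combination occurs at most once, as it does in the sample data. The sketch below uses made-up data (the frame `dup` and its values are hypothetical) to show a count above 1 and one way to clip it back to an indicator.

```python
import pandas as pd

# Hypothetical data: the combination ('c1', 10, 'a') appears twice.
dup = pd.DataFrame({
    'customerID': ['c1', 'c1', 'c1'],
    'store_code': [10, 10, 10],
    'mode': ['a', 'a', 'b'],
})

# Counting approach, as in Option 2: the repeated combination yields 2, not 1.
counts = (dup.groupby(['customerID', 'store_code', 'mode']).size()
             .unstack(fill_value=0))
print(counts)

# Cap the counts at 1 to turn them into 0/1 indicators.
flags = counts.clip(upper=1).reset_index()
print(flags)
```

With duplicates like these, Option 3 would be expected to raise an error, because unstack cannot reshape an index that contains duplicate entries.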