I am currently working with a dataset like this:

customerID      store_code  mode
BBID_204100102  2655        a
BBID_204100102  2906        b
BBID_204100102  2906        d
BBID_204100150  4986        c
BBID_204100150  4986        a
BBID_204100277  4986        d
BBID_204100310  4986        d
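
For reproducibility, the sample above could be loaded into a DataFrame roughly as follows. This is only a sketch: the variable name `df` is an assumption chosen to match the answers, and `store_code` is assumed to be an integer column.

```python
import pandas as pd

# Sample rows copied from the table above; store_code is assumed to be integer-typed.
df = pd.DataFrame({
    "customerID": [
        "BBID_204100102", "BBID_204100102", "BBID_204100102",
        "BBID_204100150", "BBID_204100150",
        "BBID_204100277", "BBID_204100310",
    ],
    "store_code": [2655, 2906, 2906, 4986, 4986, 4986, 4986],
    "mode": ["a", "b", "d", "c", "a", "d", "d"],
})
print(df)
```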
I want something like this:

       customerID  store_code  a  b  c  d
0  BBID_204100102        2655  1  0  0  0
1  BBID_204100102        2906  0  1  0  1
2  BBID_204100150        4986  1  0  1  0
3  BBID_204100277        4986  0  0  0  1
4  BBID_204100310        4986  0  0  0  1
In other words, pivot on customerID and store_code first, then encode mode as 0/1 indicator columns as shown above.
Answer 0 (score: 1)
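Use get_dummies on the frame after moving customerID and store_code into the index with set_index, then take the max over those two index levels:

import pandas as pd

# One-hot encode mode, then collapse repeated (customerID, store_code) pairs with max.
# groupby(level=...).max() is used because max(level=...) was removed in pandas 2.0.
df = (pd.get_dummies(df.set_index(['customerID', 'store_code']), prefix='', prefix_sep='', dtype=int)
        .groupby(level=[0, 1]).max()
        .reset_index())
print(df)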
       customerID  store_code  a  b  c  d
0  BBID_204100102        2655  1  0  0  0
1  BBID_204100102        2906  0  1  0  1
2  BBID_204100150        4986  1  0  1  0
3  BBID_204100277        4986  0  0  0  1
4  BBID_204100310        4986  0  0  0  1
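
The max step matters when the same (customerID, store_code, mode) combination appears more than once: max keeps the result a 0/1 indicator, whereas sum would count occurrences. A minimal sketch, assuming a small hypothetical frame `dup` (not part of the question's data) that contains one repeated row:

```python
import pandas as pd

# Hypothetical data: the ("C1", 10, "a") row appears twice.
dup = pd.DataFrame({
    "customerID": ["C1", "C1", "C1", "C2"],
    "store_code": [10, 10, 10, 20],
    "mode": ["a", "a", "b", "d"],
})

# One indicator column per mode value, still one row per input row.
dummies = pd.get_dummies(
    dup.set_index(["customerID", "store_code"]),
    prefix="", prefix_sep="", dtype=int,
)

# max collapses repeated pairs into a 0/1 indicator per (customerID, store_code).
indicators = dummies.groupby(level=[0, 1]).max().reset_index()
# sum would instead count how often each mode occurred.
counts = dummies.groupby(level=[0, 1]).sum().reset_index()

print(indicators)
print(counts)
```

If counts are what you actually need, sum is the better fit; for the indicator layout in the question, max is the safer choice.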
Answer 1 (score: 1)
Option 1: using pivot_table

In [3781]: df.pivot_table(index=['customerID','store_code'], columns='mode',
                          aggfunc=len, fill_value=0).reset_index()
Out[3781]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1
Option 2: using groupby

In [3793]: (df.groupby(['customerID', 'store_code', 'mode']).size()
              .unstack(fill_value=0).reset_index())
Out[3793]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1
Option 3: using set_index and unstack

In [3771]: (df.assign(v=1).set_index(['customerID', 'store_code', 'mode'])['v']
              .unstack(fill_value=0).reset_index())
Out[3771]:
mode      customerID  store_code  a  b  c  d
0     BBID_204100102        2655  1  0  0  0
1     BBID_204100102        2906  0  1  0  1
2     BBID_204100150        4986  1  0  1  0
3     BBID_204100277        4986  0  0  0  1
4     BBID_204100310        4986  0  0  0  1
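
One caveat worth checking against your own data: these options agree on the sample because no (customerID, store_code, mode) combination repeats. With repeats, counting via size returns values above 1, and unstack on a duplicated index raises ValueError. The sketch below uses a hypothetical frame `dup` with one repeated row; clipping counts to 1 and dropping duplicates first are just two possible workarounds, not something the answer itself prescribes.

```python
import pandas as pd

# Hypothetical data: the ("C1", 10, "a") row appears twice.
dup = pd.DataFrame({
    "customerID": ["C1", "C1", "C1", "C2"],
    "store_code": [10, 10, 10, 20],
    "mode": ["a", "a", "b", "d"],
})
keys = ["customerID", "store_code", "mode"]

# groupby/size yields occurrence counts; clip(upper=1) turns them into indicators.
counts = dup.groupby(keys).size().unstack(fill_value=0)
indicators = counts.clip(upper=1).reset_index()

# set_index/unstack cannot reshape an index that contains duplicate entries.
try:
    dup.assign(v=1).set_index(keys)["v"].unstack(fill_value=0)
    unstack_failed = False
except ValueError:
    unstack_failed = True

# Dropping exact duplicate rows first makes the set_index/unstack route usable again.
deduped = (dup.drop_duplicates().assign(v=1)
              .set_index(keys)["v"]
              .unstack(fill_value=0).reset_index())

print(indicators)
print(unstack_failed)
print(deduped)
```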