Creating dummies when the categories are single characters within multi-character strings

Date: 2018-03-11 18:25:52

Tags: python pandas

Consider my data in a pandas Series:

import pandas as pd

s = pd.Series('1az wb58 jsui ne3'.split())

s

0     1az
1    wb58
2    jsui
3     ne3
dtype: object

I need it to look like this:

   1  3  5  8  a  b  e  i  j  n  s  u  w  z
0  1  0  0  0  1  0  0  0  0  0  0  0  0  1
1  0  0  1  1  0  1  0  0  0  0  0  0  1  0
2  0  0  0  0  0  0  0  1  1  0  1  1  0  0
3  0  1  0  0  0  0  1  0  0  1  0  0  0  0

However, when I try get_dummies, each whole string is treated as one category:

pd.get_dummies(s)

   1az  jsui  ne3  wb58
0    1     0    0     0
1    0     0    0     1
2    0     1    0     0
3    0     0    1     0

What is the most concise way to do this?

2 answers:

Answer 0 (score: 2)

A solution using MultiLabelBinarizer and the DataFrame constructor:

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(s),columns=mlb.classes_)
print (df)
   1  3  5  8  a  b  e  i  j  n  s  u  w  z
0  1  0  0  0  1  0  0  0  0  0  0  0  0  1
1  0  0  1  1  0  1  0  0  0  0  0  0  1  0
2  0  0  0  0  0  0  0  1  1  0  1  1  0  0
3  0  1  0  0  0  0  1  0  0  1  0  0  0  0
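
For readers without scikit-learn installed, the same result can be sketched in plain pandas. This is only an illustration of what the binarizer computes for this input, not how scikit-learn implements it; the helper name `char_indicators` is hypothetical.

```python
import pandas as pd

s = pd.Series('1az wb58 jsui ne3'.split())

def char_indicators(series):
    """Hypothetical helper: one 0/1 column per distinct character."""
    # The sorted union of all characters plays the role of mlb.classes_
    classes = sorted(set().union(*series))
    # A cell is 1 when the class character occurs in that row's string
    rows = [[int(c in text) for c in classes] for text in series]
    return pd.DataFrame(rows, index=series.index, columns=classes)

df = char_indicators(s)
print(df)
```

Like the binarizer, this records presence only, so a character repeated within one string still yields 1.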

Another solution: DataFrame.from_records + get_dummies, followed by a max aggregation over columns that share the same label:

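# max(level=0, axis=1) was removed in pandas 2.0; grouping the transposed frame by label is equivalent.
# dtype=int keeps 0/1 output (pandas 2.x get_dummies returns booleans by default).
df = pd.get_dummies(pd.DataFrame.from_records(s), prefix_sep='', prefix='', dtype=int).T.groupby(level=0).max().T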
print (df)
   1  3  5  8  a  b  e  i  j  n  s  u  w  z
0  1  0  0  0  1  0  0  0  0  0  0  0  0  1
1  0  0  1  1  0  1  0  0  0  0  0  0  1  0
2  0  0  0  0  0  0  0  1  1  0  1  1  0  0
3  0  1  0  0  0  0  1  0  0  1  0  0  0  0
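
The final aggregation matters when the same character occurs at more than one position, which does not happen in the question's data. Here is a minimal sketch of the intermediate steps on a made-up two-row series `t` (an assumption for illustration only), building the character frame with a list comprehension instead of from_records:

```python
import pandas as pd

# Made-up data: 'a' and 'b' each appear at both string positions
t = pd.Series(['ab', 'ba'])

# Step 1: one column per character position
chars = pd.DataFrame([list(text) for text in t], index=t.index)

# Step 2: dummies per position; with an empty prefix the labels repeat
wide = pd.get_dummies(chars, prefix='', prefix_sep='', dtype=int)
print(list(wide.columns))

# Step 3: collapse columns that share a label by taking the maximum
df = wide.T.groupby(level=0).max().T
print(df)
```

Without step 3 the frame would carry two `a` columns and two `b` columns, one pair per position.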

Answer 1 (score: 2)

Maybe apply list:

# sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
pd.get_dummies(s.apply(list).apply(pd.Series).stack()).groupby(level=0).sum()
Out[222]: 
   1  3  5  8  a  b  e  i  j  n  s  u  w  z
0  1  0  0  0  1  0  0  0  0  0  0  0  0  1
1  0  0  1  1  0  1  0  0  0  0  0  0  1  0
2  0  0  0  0  0  0  0  1  1  0  1  1  0  0
3  0  1  0  0  0  0  1  0  0  1  0  0  0  0

s.apply(list).str.join(',').str.get_dummies(',')
Out[224]: 
   1  3  5  8  a  b  e  i  j  n  s  u  w  z
0  1  0  0  0  1  0  0  0  0  0  0  0  0  1
1  0  0  1  1  0  1  0  0  0  0  0  0  1  0
2  0  0  0  0  0  0  0  1  1  0  1  1  0  0
3  0  1  0  0  0  0  1  0  0  1  0  0  0  0
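
A variation on the same idea, not from the original answers: on pandas 0.25 or later, Series.explode can put one character per row, after which the dummies are grouped back by the original index. Treat it as a sketch under that version assumption.

```python
import pandas as pd

s = pd.Series('1az wb58 jsui ne3'.split())

# One row per character; the original row label is repeated in the index
exploded = s.apply(list).explode()

# Dummies per character, then collapse back to one row per original label.
# max() records presence only; sum() would count repeated characters instead.
df = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
print(df)
```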