Consider the data in my pandas Series:
import pandas as pd
s = pd.Series('1az wb58 jsui ne3'.split())
s
0 1az
1 wb58
2 jsui
3 ne3
dtype: object
I need it to look like this (one indicator column per character):
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
However, when I try get_dummies, I get one column per whole string instead:
pd.get_dummies(s)
1az jsui ne3 wb58
0 1 0 0 0
1 0 0 0 1
2 0 1 0 0
3 0 0 1 0
What is the most concise way to do this?
Answer 0 (score: 2)
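A solution using MultiLabelBinarizer and the DataFrame constructor (each string is iterated character by character, so every character becomes a label):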
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_)
print(df)
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
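For intuition only, here is a minimal dependency-free sketch of the encoding shown above: collect the sorted set of all characters, then mark for each string which of them it contains. The helper name multi_hot is hypothetical and is not part of pandas or scikit-learn; it is not how MultiLabelBinarizer is implemented internally.

```python
def multi_hot(strings):
    """Return (columns, rows): one 0/1 row per string, one column per distinct character."""
    # Sorted union of every character seen in any string (plays the role of mlb.classes_)
    columns = sorted(set().union(*strings))
    # For each string, flag which of those characters it contains
    rows = [[int(ch in text) for ch in columns] for text in strings]
    return columns, rows


columns, rows = multi_hot('1az wb58 jsui ne3'.split())
print(columns)
for row in rows:
    print(row)
```

Repeated characters within one string still produce a single 1, since the check is membership rather than a count.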
Another solution uses DataFrame.from_records + get_dummies, but a final max is needed to aggregate the duplicated columns:
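# DataFrame.max(level=...) was removed in pandas 2.0, so group the duplicate column labels instead;
# dtype=int keeps the 0/1 output, since newer pandas returns booleans by default
df = (pd.get_dummies(pd.DataFrame.from_records(s), prefix_sep='', prefix='', dtype=int)
        .T.groupby(level=0).max().T)
print(df)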
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
Answer 1 (score: 2)
Maybe split each string into a list of characters first:
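# .sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
pd.get_dummies(s.apply(list).apply(pd.Series).stack(), dtype=int).groupby(level=0).sum()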
Out[222]:
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
Or join the characters with a separator and let str.get_dummies split on it:
s.apply(list).str.join(',').str.get_dummies(',')
Out[224]:
1 3 5 8 a b e i j n s u w z
0 1 0 0 0 1 0 0 0 0 0 0 0 0 1
1 0 0 1 1 0 1 0 0 0 0 0 0 1 0
2 0 0 0 0 0 0 0 1 1 0 1 1 0 0
3 0 1 0 0 0 0 1 0 0 1 0 0 0 0
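As a hedged variation on the first snippet in this answer, the apply(pd.Series).stack() step can be replaced by Series.explode, which keeps the original row label for each character. This sketch assumes pandas 0.25 or later (for explode); the variable names via_str and via_explode are illustrative only. It also checks that the result agrees with the str.get_dummies version.

```python
import pandas as pd

s = pd.Series('1az wb58 jsui ne3'.split())

# Approach from the answer: join characters with ',' and split them back into dummies
via_str = s.apply(list).str.join(',').str.get_dummies(',')

# Variation: one row per character (original index repeated), then sum the dummies per original row
via_explode = pd.get_dummies(s.apply(list).explode(), dtype=int).groupby(level=0).sum()

print(via_explode)
print((via_str.values == via_explode.values).all())
```

Note that summing counts repeated characters within a string, whereas str.get_dummies only flags presence; use max() instead of sum() if a strict 0/1 indicator is required. For this data the two agree because no string repeats a character.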