I have some data that looks like this:
Owner Label1 Label2 Label3
Bob Dog N/A N/A
John Cat Mouse N/A
Lee Dog Cat N/A
Jane Hamster Rat Ferret
I would like to reshape it into a one-hot encoding, like this:
Owner Dog Cat Mouse Hamster Rat Ferret
Bob 1 0 0 0 0 0
John 0 1 1 0 0 0
Lee 1 1 0 0 0 0
Jane 0 0 0 1 1 1
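I have looked through the documentation and Stack Overflow, but I cannot work out which functions I need for this. get_dummies comes very close, but it creates a separate prefixed column for each category in each column where that category appears (e.g. Label1_Cat and Label2_Cat) instead of a single column per category.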
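A minimal sketch that reproduces the behaviour described above, assuming the N/A cells are missing values (NaN) rather than the literal string "N/A". Calling get_dummies on the Label columns encodes each column separately, so the same animal ends up in several prefixed columns. The names df and naive are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A is assumed to be NaN
df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# Naive approach: every Label column is one-hot encoded on its own
naive = pd.get_dummies(df.set_index("Owner"), dtype=int)
print(naive.columns.tolist())
# ['Label1_Cat', 'Label1_Dog', 'Label1_Hamster', 'Label2_Cat',
#  'Label2_Mouse', 'Label2_Rat', 'Label3_Ferret']
```

Cat is split across Label1_Cat and Label2_Cat, which is exactly what the question wants to avoid.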
Answer 0 (score: 4)
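Use:
# stack() turns each row into one entry per label; group back to one row per Owner
# (groupby(level=0) replaces sum(level=0), which was removed in pandas 2.0)
df.set_index('Owner').stack().str.get_dummies().groupby(level=0, sort=False).sum()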
Out[535]:
Cat Dog Ferret Hamster Mouse Rat
Owner
Bob 0 1 0 0 0 0
John 1 0 0 0 1 0
Lee 1 1 0 0 0 0
Jane 0 0 1 1 0 1
Or:
s=df.melt('Owner')
pd.crosstab(s.Owner,s.value)
Out[540]:
value Cat Dog Ferret Hamster Mouse Rat
Owner
Bob 0 1 0 0 0 0
Jane 0 0 1 1 0 1
John 1 0 0 0 1 0
Lee 1 1 0 0 0 0
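A caveat with the crosstab variant: it counts occurrences, so an owner listing the same animal in two Label columns would get a 2, and it sorts the owners alphabetically. A minimal sketch, assuming N/A is NaN and that strict 0/1 values in the original row order are wanted, clips the counts and reindexes the rows (ct and one_hot are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# Long format: one row per (Owner, LabelN, value); NaN values are dropped by crosstab
s = df.melt("Owner")
ct = pd.crosstab(s["Owner"], s["value"])

# Clip counts to 0/1 and restore the original order of the owners
one_hot = ct.clip(upper=1).reindex(df["Owner"])
print(one_hot)
```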
Answer 1 (score: 3)
You can use get_dummies on the stacked dataset, then group by Owner and sum:
# sort=False keeps the owners in their original order, as in the output below
pd.get_dummies(df.set_index('Owner').stack()).groupby('Owner', sort=False).sum()
Cat Dog Ferret Hamster Mouse Rat
Owner
Bob 0 1 0 0 0 0
John 1 0 0 0 1 0
Lee 1 1 0 0 0 0
Jane 0 0 1 1 0 1
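get_dummies sorts the categories alphabetically, whereas the desired output in the question lists them in order of first appearance (Dog, Cat, Mouse, Hamster, Rat, Ferret). A minimal sketch, assuming N/A is NaN, reorders the columns using the stacked labels themselves; the explicit dropna() keeps it independent of whether the installed pandas version's stack() drops missing values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# One entry per (Owner, LabelN) pair, missing labels removed
stacked = df.set_index("Owner").stack().dropna()
one_hot = pd.get_dummies(stacked, dtype=int).groupby(level=0, sort=False).sum()

# drop_duplicates keeps the first occurrence, i.e. the order of first appearance
one_hot = one_hot[stacked.drop_duplicates().tolist()]
print(one_hot.columns.tolist())  # ['Dog', 'Cat', 'Mouse', 'Hamster', 'Rat', 'Ferret']
```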
Answer 2 (score: 2)
sklearn.preprocessing.MultiLabelBinarizer
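import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
# Iterate over rows: o is the Owner, l the Label values with NaNs filtered out
o, l = zip(*[[o, [*filter(pd.notna, l)]] for o, *l in zip(*map(df.get, df))])
mlb = MultiLabelBinarizer()
d = mlb.fit_transform(l)  # one row per owner, one column per distinct label
pd.DataFrame(d, o, mlb.classes_)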
Cat Dog Ferret Hamster Mouse Rat
Bob 0 1 0 0 0 0
John 1 0 0 0 1 0
Lee 1 1 0 0 0 0
Jane 0 0 1 1 0 1
A more readable version of the same idea:
o = df.Owner
# keep only the non-missing labels of each row
l = [[x for x in row if pd.notna(x)] for row in df.filter(like='Label').values]
mlb = MultiLabelBinarizer()
d = mlb.fit_transform(l)
pd.DataFrame(d, o, mlb.classes_)
Cat Dog Ferret Hamster Mouse Rat
Owner
Bob 0 1 0 0 0 0
John 1 0 0 0 1 0
Lee 1 1 0 0 0 0
Jane 0 0 1 1 0 1
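MultiLabelBinarizer sorts classes_ alphabetically unless its classes parameter is given, in which case that order is used for the columns. A minimal sketch, assuming N/A is NaN, that reproduces the column order from the question (order of first appearance); the names labels, order and out are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# Non-missing labels of each row
labels = [[x for x in row if pd.notna(x)] for row in df.filter(like="Label").values]

# dict.fromkeys keeps insertion order, giving the order of first appearance
order = list(dict.fromkeys(x for row in labels for x in row))

mlb = MultiLabelBinarizer(classes=order)
out = pd.DataFrame(mlb.fit_transform(labels), index=df["Owner"], columns=mlb.classes_)
print(out)
```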
Answer 3 (score: 0)
Use the pandas.get_dummies function, which converts categorical variables into dummy/indicator variables in one step.
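Used on its own, get_dummies still produces per-column prefixes. One way to stay close to a single get_dummies call, sketched here under the assumption that N/A is NaN, is to drop the prefixes with prefix="" and prefix_sep="" and then merge the resulting duplicate column names. The transpose is used because grouping along axis=1 is deprecated in recent pandas; wide and one_hot are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Owner": ["Bob", "John", "Lee", "Jane"],
    "Label1": ["Dog", "Cat", "Dog", "Hamster"],
    "Label2": [np.nan, "Mouse", "Cat", "Rat"],
    "Label3": [np.nan, np.nan, np.nan, "Ferret"],
})

# Encode all Label columns without "Label1_"-style prefixes,
# so the same animal gets the same column name in every Label column
wide = pd.get_dummies(df.set_index("Owner"), prefix="", prefix_sep="", dtype=int)

# Collapse duplicate column names (e.g. "Cat" from Label1 and Label2),
# keeping 1 if the animal appears in any Label column
one_hot = wide.T.groupby(level=0).max().T
print(one_hot)
```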