Suppose I have this data:
import pandas as pd
data = {'f1':['A','B','B','A'],'f2':['X','Y','Z','Z']}
df = pd.DataFrame(data)
feat_1 = pd.get_dummies(df['f1'])
feat_2 = pd.get_dummies(df['f2'])
What is a short way to do this kind of multiplication between feat_1 and feat_2 in pandas?
feat_1
   A  B
0  1  0
1  0  1
2  0  1
3  1  0
feat_2
   X  Y  Z
0  1  0  0
1  0  1  0
2  0  0  1
3  0  0  1
Desired result: feat_1 * feat_2
   AX  AY  AZ  BX  BY  BZ
0   1   0   0   0   0   0
1   0   0   0   0   1   0
2   0   0   0   0   0   1
3   0   0   1   0   0   0
Answer 0 (score: 2)
Use str.get_dummies on the joined columns, followed by reindex:
col=pd.MultiIndex.from_product([df.f1.unique(),df.f2.unique()]).map(''.join)
df.apply(''.join,1).str.get_dummies().reindex(columns=col,fill_value=0)
Out[605]:
   AX  AY  AZ  BX  BY  BZ
0   1   0   0   0   0   0
1   0   0   0   0   1   0
2   0   0   0   0   0   1
3   0   0   1   0   0   0
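The same steps can be written out as a self-contained sketch. The names `combined`, `col` and `result` are illustrative and not part of the answer above. Note that `Series.str.get_dummies` splits on `|` by default, so this assumes no value contains that character.

```python
import pandas as pd

df = pd.DataFrame({'f1': ['A', 'B', 'B', 'A'], 'f2': ['X', 'Y', 'Z', 'Z']})

# Every f1/f2 combination, in order of first appearance in each column
col = pd.MultiIndex.from_product(
    [df['f1'].unique(), df['f2'].unique()]
).map(''.join)

# Join each row into a single label such as 'AX'
combined = df['f1'] + df['f2']

# One indicator column per label that actually occurs; reindex then adds
# the combinations that never occur as all-zero columns
result = combined.str.get_dummies().reindex(columns=col, fill_value=0)
print(result)
```

Without the reindex step, the frame would contain only the combinations present in the data (here AX, AZ, BY and BZ).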
Answer 1 (score: 0)
Using categoricals:
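from itertools import product
# Build the category labels eagerly so the list can be inspected and reused
cats = list(map(''.join, product(df['f1'].unique(), df['f2'].unique())))
cdt = pd.api.types.CategoricalDtype(cats)
# Concatenate the two columns explicitly rather than summing strings across the row
result = pd.get_dummies((df['f1'] + df['f2']).astype(cdt))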
which yields the desired result:
   AX  AY  AZ  BX  BY  BZ
0   1   0   0   0   0   0
1   0   0   0   0   1   0
2   0   0   0   0   0   1
3   0   0   1   0   0   0
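A runnable sketch of the categorical approach, under a couple of assumptions: the labels are built with an explicit list comprehension, and `dtype=int` is passed because recent pandas versions return boolean dummies by default. The names `combined` and `result` are illustrative.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'f1': ['A', 'B', 'B', 'A'], 'f2': ['X', 'Y', 'Z', 'Z']})

# Declare every possible combination as a category, observed or not
cats = [a + b for a, b in product(df['f1'].unique(), df['f2'].unique())]
cdt = pd.api.types.CategoricalDtype(categories=cats)

# Label each row, then cast to the categorical dtype
combined = (df['f1'] + df['f2']).astype(cdt)

# get_dummies emits one column per category, including unobserved ones
result = pd.get_dummies(combined, dtype=int)
print(result)
```

Because the unobserved categories are part of the dtype, no separate reindex step is needed here.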
Answer 2 (score: 0)
You can also use pd.concat with a list comprehension:
pd.concat([feat_1[col].mul(feat_2[col2]).rename(col +col2) for col in feat_1.columns for col2 in feat_2.columns], axis=1)
   AX  AY  AZ  BX  BY  BZ
0   1   0   0   0   0   0
1   0   0   0   0   1   0
2   0   0   0   0   0   1
3   0   0   1   0   0   0
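The column-by-column products can also be computed in one step as a row-wise outer product with NumPy broadcasting. This is a different technique from the list comprehension above, sketched here as an alternative; the names `a`, `b`, `values` and `names` are illustrative, and `dtype=int` is an assumption to get 0/1 values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'f1': ['A', 'B', 'B', 'A'], 'f2': ['X', 'Y', 'Z', 'Z']})
feat_1 = pd.get_dummies(df['f1'], dtype=int)
feat_2 = pd.get_dummies(df['f2'], dtype=int)

a = feat_1.to_numpy()  # shape (n_rows, 2)
b = feat_2.to_numpy()  # shape (n_rows, 3)

# (n, 2, 1) * (n, 1, 3) broadcasts to (n, 2, 3); flatten to (n, 6)
values = (a[:, :, None] * b[:, None, :]).reshape(len(df), -1)

# Column names in the same order as the flattened last two axes
names = [c1 + c2 for c1 in feat_1.columns for c2 in feat_2.columns]
result = pd.DataFrame(values, index=df.index, columns=names)
print(result)
```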
%timeit pd.concat([feat_1[col].mul(feat_2[col2]).rename(col +col2) for col in feat_1.columns for col2 in feat_2.columns], axis=1)
1.54 ms ± 24.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit col=pd.MultiIndex.from_product([df.f1.unique(),df.f2.unique()]).map(''.join); df.apply(''.join,1).str.get_dummies().reindex(columns=col,fill_value=0)
3.57 ms ± 76.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
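These timings were taken on the four-row example, so they may not reflect behaviour on larger data. A sketch for rerunning the comparison on a bigger frame follows; the row count, the random data, and the function names `with_concat` and `with_str_get_dummies` are assumptions for illustration, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 10_000  # assumed size, adjust as needed
df = pd.DataFrame({
    'f1': rng.choice(['A', 'B'], size=n_rows),
    'f2': rng.choice(['X', 'Y', 'Z'], size=n_rows),
})
feat_1 = pd.get_dummies(df['f1'], dtype=int)
feat_2 = pd.get_dummies(df['f2'], dtype=int)


def with_concat():
    # Multiply every pair of dummy columns and stitch the results together
    return pd.concat(
        [feat_1[c1].mul(feat_2[c2]).rename(c1 + c2)
         for c1 in feat_1.columns for c2 in feat_2.columns],
        axis=1,
    )


def with_str_get_dummies():
    # Sorted labels so the column order matches the concat version
    col = pd.MultiIndex.from_product(
        [sorted(df['f1'].unique()), sorted(df['f2'].unique())]
    ).map(''.join)
    joined = df['f1'] + df['f2']
    return joined.str.get_dummies().reindex(columns=col, fill_value=0)


for fn in (with_concat, with_str_get_dummies):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```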