I have a DataFrame:
import pandas as pd

name = ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff']
actionfigures = ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman']
cars = ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw']
pets = ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet']
test = pd.DataFrame({'name': name, 'actfig': actionfigures, 'car': cars, 'pet': pets})
         actfig          car   name     pet
0      superman  lamborghini   fred     cat
1        batman      ferrari   fred     dog
2         flash      bugatti   fred    bird
3  greenlantern      ferrari  james     cat
4         flash     corvette  james     dog
5        batman      bugatti   rick     dog
6         joker          bmw   rick    fish
7      superman          bmw   jeff  marmet
Forgive me if my terminology is off, but I want to pivot the data so that I get a count of each unique value in the ['actfig', 'car', 'pet'] columns for each name:
       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
name
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
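I thought
test.pivot_table(index='name', columns=['actfig', 'car', 'pet'], aggfunc='size')
would do this, but it gives me some odd multi-level columns.
I figured I could call get_dummies on each column in turn
and then group by name and sum, but it feels like pandas probably has a better way.
How can I do this?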
Answer 0 (score: 3)
melt and pivot
test.melt('name').assign(new=1).pivot(index='name', columns='value', values='new').fillna(0)
Out[239]:
value  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  \
name
fred      1.0   1.0  0.0      1.0  1.0       0.0  1.0      1.0   0.0    1.0
james     0.0   0.0  0.0      0.0  1.0       1.0  1.0      1.0   0.0    1.0
jeff      0.0   0.0  1.0      0.0  0.0       0.0  0.0      0.0   0.0    0.0
rick      1.0   0.0  1.0      1.0  0.0       0.0  1.0      0.0   1.0    0.0

value  greenlantern  joker  lamborghini  marmet  superman
name
fred            0.0    0.0          1.0     0.0       1.0
james           1.0    0.0          0.0     0.0       0.0
jeff            0.0    0.0          0.0     1.0       1.0
rick            0.0    1.0          0.0     0.0       0.0
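To see what each link in the melt/pivot chain contributes, here is a minimal sketch that unpacks it into separate steps. The intermediate names (long_df, wide, result) and the final astype(int) cast are my own additions rather than part of the one-liner; the cast only turns the 1.0/0.0 floats into integers.

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# Step 1: melt to long form, one row per (name, source column, value).
long_df = test.melt(id_vars='name')        # columns: name, variable, value

# Step 2: add a constant marker column that pivot can spread out.
long_df = long_df.assign(new=1)

# Step 3: pivot to wide form; (name, value) pairs that never occur become NaN.
wide = long_df.pivot(index='name', columns='value', values='new')

# Step 4: replace the NaN gaps with 0 and cast to integers (my addition).
result = wide.fillna(0).astype(int)
print(result)
```

Note that pivot raises a ValueError when the same (name, value) pair occurs more than once; that does not happen in this sample data.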
Or get_dummies
pd.get_dummies(test.set_index('name')).groupby(level=0).sum()
Out[248]:
       actfig_batman  actfig_flash  actfig_greenlantern  actfig_joker  \
name
fred               1             1                    0             0
james              0             1                    1             0
jeff               0             0                    0             0
rick               1             0                    0             1

       actfig_superman  car_bmw  car_bugatti  car_corvette  car_ferrari  \
name
fred                 1        0            1             0            1
james                0        0            0             1            1
jeff                 1        1            0             0            0
rick                 0        1            1             0            0

       car_lamborghini  pet_bird  pet_cat  pet_dog  pet_fish  pet_marmet
name
fred                 1         1        1        1         0           0
james                0         0        1        1         0           0
jeff                 0         0        0        0         0           1
rick                 0         0        0        1         1           0
Edit: as suggested by PiR
pd.get_dummies(test.set_index('name'), prefix_sep='|').groupby(level=0).sum().rename(columns=lambda c: c.rsplit('|', 1)[1])
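A plausible reason for switching the separator to '|' is that the default '_' can also appear inside the values themselves, which makes the prefix harder to strip reliably. The sketch below illustrates this with a hypothetical two-row frame; the value 'hot_wheels' is invented for the example and does not occur in the question's data.

```python
import pandas as pd

# Hypothetical frame: 'hot_wheels' is an invented value containing an underscore.
df = pd.DataFrame({'name': ['fred', 'rick'],
                   'car': ['bmw', 'hot_wheels']}).set_index('name')

# Default separator: column names look like '<column>_<value>'.
default_cols = list(pd.get_dummies(df).columns)

# Splitting on the last '_' cuts the value itself in two.
naive = [c.rsplit('_', 1)[1] for c in default_cols]

# With '|' as separator, everything after the last '|' is the untouched value.
piped = pd.get_dummies(df, prefix_sep='|')
clean_cols = [c.rsplit('|', 1)[1] for c in piped.columns]

print(default_cols, naive, clean_cols)
```

This still assumes that '|' never appears in the values; any separator that is absent from the data would serve the same purpose.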
Answer 1 (score: 3)
Option 1
Per-column pd.get_dummies
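# dtype=int keeps the indicators numeric so that dot() returns counts
a = pd.get_dummies(test.actfig, dtype=int)
c = pd.get_dummies(test.car, dtype=int)
p = pd.get_dummies(test.pet, dtype=int)
# transpose so that rows are names and columns are the original row labels
n = pd.get_dummies(test.name, dtype=int).T
pd.concat([n.dot(d) for d in [a, c, p]], axis=1)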
       batman  flash  greenlantern  joker  superman  bmw  bugatti  corvette  ferrari  lamborghini  bird  cat  dog  fish  marmet
fred        1      1             0      0         1    0        1         0        1            1     1    1    1     0       0
james       0      1             1      0         0    0        0         1        1            0     0    1    1     0       0
jeff        0      0             0      0         1    1        0         0        0            0     0    0    0     0       1
rick        1      0             0      1         0    1        1         0        0            0     0    0    1     1       0
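Why the dot product yields counts: n has one row per name and one column per original row, while each indicator frame has one row per original row and one column per value. Entry [person, value] of the product therefore adds up the rows where both indicators are 1. A minimal sketch for the actfig column alone follows; the pd.crosstab call is only a cross-check that I added, and the variable names counts and check are mine.

```python
import pandas as pd

name = ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff']
actfig = ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman']

# Indicator matrices: n is (names x rows), a is (rows x action figures).
n = pd.get_dummies(pd.Series(name), dtype=int).T
a = pd.get_dummies(pd.Series(actfig), dtype=int)

# Matrix product: counts[person, figure] = sum over rows of n[person, row] * a[row, figure].
counts = n.dot(a)

# Cross-check: crosstab tallies the same (name, figure) co-occurrences directly.
check = pd.crosstab(pd.Series(name), pd.Series(actfig))

print(counts)
```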
Option 2
stack + pd.crosstab
test.set_index('name').stack().pipe(
    lambda x: pd.crosstab(x.index.get_level_values(0), x.values))
col_0  batman  bird  bmw  bugatti  cat  corvette  dog  ferrari  fish  flash  greenlantern  joker  lamborghini  marmet  superman
row_0
fred        1     1    0        1    1         0    1        1     0      1             0      0            1       0         1
james       0     0    0        0    1         1    1        1     0      1             1      0            0       0         0
jeff        0     0    1        0    0         0    0        0     0      0             0      0            0       1         1
rick        1     0    1        1    0         0    1        0     1      0             0      1            0       0         0
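The row_0 / col_0 labels in the output above are the defaults that crosstab falls back to when its inputs are not named Series. As a small variation (my assumption about what a reader might want, not part of the answer), the rownames and colnames parameters of pd.crosstab can supply the labels, and the pipe can be unrolled into named steps:

```python
import pandas as pd

test = pd.DataFrame({
    'name': ['fred', 'fred', 'fred', 'james', 'james', 'rick', 'rick', 'jeff'],
    'actfig': ['superman', 'batman', 'flash', 'greenlantern', 'flash', 'batman', 'joker', 'superman'],
    'car': ['lamborghini', 'ferrari', 'bugatti', 'ferrari', 'corvette', 'bugatti', 'bmw', 'bmw'],
    'pet': ['cat', 'dog', 'bird', 'cat', 'dog', 'dog', 'fish', 'marmet'],
})

# stack() gives one long Series indexed by (name, source column).
stacked = test.set_index('name').stack()

# Level 0 of that index is the name belonging to each stacked value.
names = stacked.index.get_level_values(0)

# Tabulate names against values, labelling both axes explicitly.
result = pd.crosstab(names, stacked.values, rownames=['name'], colnames=['value'])
print(result)
```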