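I have nine columns: 'instlevel1', 'instlevel2', 'instlevel3', 'instlevel4', 'instlevel5', 'instlevel6', 'instlevel7', 'instlevel8', 'instlevel9'.
The values in these columns are filled as follows: if instlevel1 is 1, all the other columns are 0; if instlevel2 is 1, all the other columns (including instlevel1) are 0; and so on.
I want to "pivot" these into a single column. I get the expected result, but I would like to know whether there is a more efficient way to do it, because the code is very repetitive. Here is the code I wrote: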
nivelEducacion = test[['instlevel1','instlevel2','instlevel3', 'instlevel4', 'instlevel5','instlevel6','instlevel7','instlevel8','instlevel9']].idxmax(axis=1)
test['nivelEducacion'] = nivelEducacion
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel1'], '1')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel2'], '2')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel3'], '3')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel4'], '4')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel5'], '5')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel6'], '6')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel7'], '7')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel8'], '8')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel9'], '9')
test['nivelEducacion'] = test.nivelEducacion.astype('category')
test = test.drop(['instlevel1', 'instlevel2','instlevel3','instlevel4','instlevel5','instlevel6','instlevel7','instlevel8','instlevel9'], axis=1)
Answer 0 (score: 0)
You can use the melt function in pandas. It may not be the best solution, but it does the job:
import pandas as pd
s = pd.Series(list('aaabbbccddefgh')).astype('category') # generate fake dataset
df = pd.get_dummies(s) # fake df like the one you have (one-hot encoded)
df2 = pd.melt(df, value_vars=["a", "b", "c", "d", "e", "f", "g", "h"])
df2 = df2[df2.value == 1] # keep only the category that is set in each row
df2 = df2.drop(columns="value") # drop the helper column; "variable" now holds the category
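Applied to the columns from the question, the same melt idea could look roughly like the sketch below. The toy DataFrame and the names cols, levels, melted and nivel are made up for illustration; only the instlevel column names come from the question. Note that melt discards the original index by default, so the sketch carries it along as an id column to keep the rows aligned.

```python
import pandas as pd

# Hypothetical toy data: four rows, exactly one instlevel column set to 1 per row.
cols = [f"instlevel{i}" for i in range(1, 10)]
levels = [3, 1, 9, 3]
test = pd.DataFrame(
    [[int(i == lvl) for i in range(1, 10)] for lvl in levels],
    columns=cols,
)

# Keep the original row labels as an id column so they survive the melt.
melted = pd.melt(test.reset_index(), id_vars="index", value_vars=cols)

# Keep only the column that is set in each row and restore the row order.
melted = melted[melted["value"] == 1].set_index("index").sort_index()

# Strip the common prefix so only the level number remains.
nivel = melted["variable"].str.replace("instlevel", "", regex=False).astype("category")
```

Because nivel is indexed by the original row labels, it can be assigned back with test['nivelEducacion'] = nivel.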
Another solution I found, based on stack:
x = df.stack() # in your case, select only the one-hot columns before stacking
df2 = pd.Series(pd.Categorical(x[x!=0].index.get_level_values(1))).to_frame()
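If the goal is mainly to get rid of the nine replace calls, it may be enough to keep the idxmax approach from the question and strip the common prefix in one step. This is only a sketch on made-up data (the "other" column and the values in levels are hypothetical), and it assumes every row has exactly one column set to 1; for a row of all zeros, idxmax would return the first column.

```python
import pandas as pd

# Hypothetical toy data shaped like the question: one-hot instlevel columns
# plus one unrelated column that must be kept.
cols = [f"instlevel{i}" for i in range(1, 10)]
levels = [2, 5, 5, 8]
test = pd.DataFrame(
    [[int(i == lvl) for i in range(1, 10)] for lvl in levels],
    columns=cols,
)
test["other"] = ["a", "b", "c", "d"]

# idxmax(axis=1) gives the name of the column holding the 1 in each row;
# a single str.replace removes the shared "instlevel" prefix.
test["nivelEducacion"] = (
    test[cols]
    .idxmax(axis=1)
    .str.replace("instlevel", "", regex=False)
    .astype("category")
)

# Drop the one-hot columns now that they are encoded in one column.
test = test.drop(columns=cols)
```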
I hope this helps,
Nicolas