Merging different column values - Pandas

Time: 2018-07-26 15:33:05

Tags: python pandas sklearn-pandas

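I have nine columns: 'instlevel1', 'instlevel2', 'instlevel3', 'instlevel4', 'instlevel5', 'instlevel6', 'instlevel7', 'instlevel8', 'instlevel9'.

The values in these columns are filled as follows: if instlevel1 is 1, all the other columns are 0; if instlevel2 is 1, all the other columns (including instlevel1) are 0.

I want to "pivot" these into a single column. I got the expected result, but I would like to know whether there is a more efficient way to do it, because the current code is very repetitive. Here is the code I wrote.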

nivelEducacion = test[['instlevel1','instlevel2','instlevel3', 'instlevel4', 'instlevel5','instlevel6','instlevel7','instlevel8','instlevel9']].idxmax(axis=1)

test['nivelEducacion'] = nivelEducacion
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel1'], '1')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel2'], '2')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel3'], '3')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel4'], '4')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel5'], '5')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel6'], '6')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel7'], '7')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel8'], '8')
test['nivelEducacion'] = test['nivelEducacion'].replace(['instlevel9'], '9')
test['nivelEducacion'] = test.nivelEducacion.astype('category')
test = test.drop(['instlevel1', 'instlevel2','instlevel3','instlevel4','instlevel5','instlevel6','instlevel7','instlevel8','instlevel9'], axis=1)

1 Answer:

Answer 0: (score: 0)

You can use the melt function in pandas. It may not be the best solution, but it does the job:

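import pandas as pd

s = pd.Series(list('aaabbbccddefgh')).astype('category')  # generate a fake dataset
df = pd.get_dummies(s)  # fake one-hot encoded df like the one you have

# melt stacks the columns one after another, so rows come out grouped by category
df2 = pd.melt(df, value_vars=["a", "b", "c", "d", "e", "f", "g", "h"])
df2 = df2[df2.value == 1]  # keep only the category that is set for each row
df2 = df2.drop("value", axis=1)  # reassign instead of inplace on a filtered slice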
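Since melt returns the rows grouped by category rather than in their original order, it can help to keep the original row labels while melting. The following is only a sketch on made-up data; it relies on the `ignore_index` parameter of `pd.melt` (available in pandas 1.1 and later) and assumes each row has exactly one column set to 1:

```python
import pandas as pd

# Made-up labels, one-hot encoded the same way as in the example above
s = pd.Series(list("abcab"))
df = pd.get_dummies(s)

# ignore_index=False keeps the original row labels as the index of the melted frame
melted = pd.melt(df, ignore_index=False)

# Keep the single "hot" entry per row, then restore the original row order
kept = melted[melted["value"] == 1]
labels = kept["variable"].sort_index()

print(labels.tolist())
```

With the index preserved, `labels` lines up with the rows of `df`, so it can be assigned back as a new column.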

Another solution I found is this one:

x = df.stack()  # in that case you have to restrict only to your columns
df2 = pd.Series(pd.Categorical(x[x!=0].index.get_level_values(1))).to_frame()
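Applied to the columns from the question, the stack approach might look roughly like the sketch below. The `test` frame here is a made-up stand-in (the `age` column and the level values are invented for illustration), and the code assumes every row has exactly one instlevel column equal to 1:

```python
import pandas as pd

# Hypothetical stand-in for the asker's `test` frame
cols = [f"instlevel{i}" for i in range(1, 10)]
levels = [3, 1, 9, 3]  # made-up education level per row
test = pd.DataFrame(
    [[int(j == lvl) for j in range(1, 10)] for lvl in levels],
    columns=cols,
)
test["age"] = [30, 41, 25, 58]  # made-up unrelated column

# Stack only the one-hot columns and keep the non-zero entries;
# level 1 of the resulting MultiIndex holds the column names
x = test[cols].stack()
labels = x[x != 0].index.get_level_values(1)

# Strip the shared prefix so 'instlevel3' becomes '3', then store as a categorical
test["nivelEducacion"] = pd.Categorical(labels.str.replace("instlevel", "", regex=False))
test = test.drop(columns=cols)

print(test)
```

Stripping the prefix with a single `str.replace` stands in for the nine separate `replace` calls in the question. If some rows could have no column set, or more than one, the positional assignment above would no longer line up and would need a different approach.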

I hope this helps,

Nicolas