How to pivot a set of columns in a DataFrame into indicator values without pivoting all columns

Time: 2017-10-26 14:29:57

Tags: pandas dataframe pivot melt

I have an advanced pivot question. Consider the following DataFrame:

import numpy as np
import pandas

dfa = pandas.DataFrame([["g1","15","Annie","Bard"], ["g2","18","Lux","Annie"], ["g3","15","Olaf","Twitch"]], columns=["gameId", "duration", "Champ1", "Champ2"])

This gives the following DataFrame:

  gameId duration  Champ1  Champ2
0     g1       15   Annie    Bard
1     g2       18     Lux   Annie
2     g3       15    Olaf  Twitch

By applying the logic from the Stack Overflow question "how to pivot complex dataframe", I get:

pandas.melt(dfa, id_vars=['gameId']) \
    .set_index('gameId')['value'] \
    .str.get_dummies() \
    .groupby(level=0) \
    .agg(np.sum)

(Image: the resulting indicator table, in which the duration values are pivoted into columns as well.)

However, I don't want to pivot the duration column, so I changed the code and added 'value_vars':

pandas.melt(dfa, id_vars=['gameId'], value_vars = ['Champ1','Champ2']) \
    .set_index('gameId')['value'] \
    .str.get_dummies() \
    .groupby(level=0) \
    .agg(np.sum)

(Image: the resulting indicator table of champion columns, without the duration column.)

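Now I have lost the duration column. It cannot serve as an index on its own because its values are not unique, but I still don't want to pivot it. I have tried adding it to 'id_vars' and to 'set_index()', but neither seems to work.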

Any thoughts?

Thanks in advance!

1 answer:

Answer 0 (score: 1)

You can add the duration column to id_vars, then to set_index, and finally group by both levels of the resulting MultiIndex:

import pandas as pd

a = pd.melt(dfa, id_vars=['gameId', 'duration']) \
    .set_index(['gameId', 'duration'])['value'] \
    .str.get_dummies() \
    .groupby(level=[0,1]) \
    .sum()
print(a)
                 Annie  Bard  Lux  Olaf  Twitch
gameId duration                                
g1     15            1     1    0     0       0
g2     18            1     0    1     0       0
g3     15            0     0    0     1       1
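
If duration should end up as an ordinary column rather than an index level, calling reset_index() on the result is one way to get there. The following is a minimal sketch that rebuilds the dfa from the question; the name flat is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# Same sample data as in the question.
dfa = pd.DataFrame(
    [["g1", "15", "Annie", "Bard"],
     ["g2", "18", "Lux", "Annie"],
     ["g3", "15", "Olaf", "Twitch"]],
    columns=["gameId", "duration", "Champ1", "Champ2"],
)

# Keep gameId and duration as identifiers, pivot only the champion columns.
a = (
    pd.melt(dfa, id_vars=["gameId", "duration"])
    .set_index(["gameId", "duration"])["value"]
    .str.get_dummies()
    .groupby(level=[0, 1])
    .sum()
)

# Move both index levels back into regular columns.
flat = a.reset_index()
print(flat)
```

After this step gameId and duration sit next to the champion indicator columns in a flat table.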

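This is the same as the shorter form below, which relies on sum(level=...). That argument was deprecated in pandas 1.3 and removed in pandas 2.0, so this form only runs on older versions:

a = pd.melt(dfa, id_vars=['gameId', 'duration']) \
    .set_index(['gameId', 'duration'])['value'] \
    .str.get_dummies() \
    .sum(level=[0,1])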
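
Another option is to compute the indicator columns from the champion columns alone and join them back onto gameId and duration, so that duration never has to enter the index. This is only a sketch, not part of the original answer, and it assumes pandas 1.1 or later for the ignore_index argument of melt. The names champ_cols, indicators and result are illustrative.

```python
import pandas as pd

# Same sample data as in the question.
dfa = pd.DataFrame(
    [["g1", "15", "Annie", "Bard"],
     ["g2", "18", "Lux", "Annie"],
     ["g3", "15", "Olaf", "Twitch"]],
    columns=["gameId", "duration", "Champ1", "Champ2"],
)

champ_cols = ["Champ1", "Champ2"]  # the only columns to pivot

# ignore_index=False keeps the original row labels, so each melted value
# still points at the row of dfa it came from.
indicators = (
    dfa.melt(value_vars=champ_cols, ignore_index=False)["value"]
    .str.get_dummies()
    .groupby(level=0)
    .sum()
)

# Join on the row labels; gameId and duration are carried over untouched.
result = dfa[["gameId", "duration"]].join(indicators)
print(result)
```

Because the join happens on the original row labels, this variant does not require gameId or duration to be unique.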