Pandas: pivot on repeated values

Time: 2017-05-03 16:01:17

Tags: python pandas dataframe pivot

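I want to pivot a DataFrame on the repeated values in one column so that the associated values appear in new columns, as in the example below. I can't work out from the pandas documentation how to get from this...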

name   car    model
rob    mazda  626
rob    bmw    328
james  audi   a4
james  VW     golf
tom    audi   a6
tom    ford   focus

to this...

name   car_1  model_1  car_2  model_2
rob    mazda  626      bmw    328
james  audi   a4       VW     golf
tom    audi   a6       ford   focus

2 answers:

Answer 0 (score: 3)

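# Select the columns with a list; recent pandas versions reject the tuple form ['car','model']
x = df.groupby('name')[['car','model']] \
      .apply(lambda x: pd.DataFrame(x.values.tolist(),
             columns=['car','model'])) \
      .unstack()
# Flatten the (column, position) MultiIndex into labels such as 'car_0'
x.columns = ['{0[0]}_{0[1]}'.format(tup) for tup in x.columns]

Result: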

In [152]: x
Out[152]:
       car_0 car_1 model_0 model_1
name
james   audi    VW      a4    golf
rob    mazda   bmw     626     328
tom     audi  ford      a6   focus

How to sort the columns:

In [157]: x.loc[:, x.columns.str[::-1].sort_values().str[::-1]]
Out[157]:
      model_0  car_0 model_1 car_1
name
james      a4   audi    golf    VW
rob       626  mazda     328   bmw
tom        a6   audi   focus  ford
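
The sort above works by reversing each label so that the numeric suffix comes first, sorting, and reversing back. A minimal sketch of just that step, on a standalone Index holding the same four labels (the variable names here are only for illustration):

```python
import pandas as pd

cols = pd.Index(["car_0", "car_1", "model_0", "model_1"])

# Reverse each label: "car_0" becomes "0_rac", so the suffix leads the string
reversed_labels = cols.str[::-1]

# Sort the reversed labels, then reverse them back to the original spelling
ordered = reversed_labels.sort_values().str[::-1]

print(list(ordered))  # ['model_0', 'car_0', 'model_1', 'car_1']
```

Note that this is a plain string sort on reversed text, so within each suffix the columns come out ordered by their reversed names (hence model before car), and suffixes of two or more digits would not sort numerically.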

Answer 1 (score: 1)

We can use groupby with cumcount

Set the index
i = df.groupby('name').cumcount() + 1
df.set_index(['name', i]).unstack()

         car       model       
           1     2     1      2
name                           
james   audi    VW    a4   golf
rob    mazda   bmw   626    328
tom     audi  ford    a6  focus
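
To see why this works, it may help to look at the counter on its own: cumcount numbers the rows inside each group, and that number becomes the second index level that unstack moves into the columns. A small sketch, assuming the sample data from the question with model stored as strings:

```python
import pandas as pd

# Sample data from the question (model kept as strings, which is an assumption)
df = pd.DataFrame({
    "name": ["rob", "rob", "james", "james", "tom", "tom"],
    "car": ["mazda", "bmw", "audi", "VW", "audi", "ford"],
    "model": ["626", "328", "a4", "golf", "a6", "focus"],
})

# Position of each row within its name group, shifted to start at 1
counter = df.groupby("name").cumcount() + 1
print(counter.tolist())  # [1, 2, 1, 2, 1, 2]

# The counter becomes the inner index level, which unstack turns into columns
wide = df.set_index(["name", counter]).unstack()

# Columns are (original column, counter) pairs
print(wide[("car", 2)])
```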

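Or we can flatten the pd.MultiIndex

i = df.groupby('name').cumcount() + 1
# Sort the columns by the counter level; recent pandas needs axis and level passed as keywords
d1 = df.set_index(['name', i]).unstack().sort_index(axis=1, level=1)
# Join each (column, counter) pair into a single label such as 'car_1'
d1.columns = d1.columns.to_series().map('{0[0]}_{0[1]}'.format)
d1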


       car_1 model_1 car_2 model_2
name                              
james   audi      a4    VW    golf
rob    mazda     626   bmw     328
tom     audi      a6  ford   focus
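
The result above is sorted by name and keeps name as the index, whereas the layout asked for in the question lists rob, james, tom with name as an ordinary column. One possible way to get there is sketched below; the reindex and reset_index steps are an addition to the answer, and the sample data again assumes model is stored as strings:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["rob", "rob", "james", "james", "tom", "tom"],
    "car": ["mazda", "bmw", "audi", "VW", "audi", "ford"],
    "model": ["626", "328", "a4", "golf", "a6", "focus"],
})

# Number the rows within each name, pivot, and order columns by that number
occurrence = df.groupby("name").cumcount() + 1
wide = df.set_index(["name", occurrence]).unstack().sort_index(axis=1, level=1)

# Flatten (column, number) pairs into labels such as 'car_1'
wide.columns = [f"{col}_{n}" for col, n in wide.columns]

# Restore the order in which names first appear, then make name a column again
first_seen = pd.Index(df["name"].unique(), name="name")
wide = wide.reindex(first_seen).reset_index()

print(wide)
```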