转换DataFrame(pivot)

时间:2015-10-20 13:19:05

标签: python pandas dataframe

我确实在将DataFrame转换为新结构时遇到了问题。 将数据透视表转换为DataFrame后,我的数据如下所示:

          model      model1     model2
 time     color     
2001-01   blue       200,000    120,000
          red        100,000    100,000
          yellow     250,000    80,000
          white      100,000    100,000
2002-01   blue       140,000    150,000
          red        200,000    100,000
          yellow     400,000    200,000
          white      200,000    100,000
...

现在,我想把它变成: 时间作为索引,每个模型的颜色都是一个不同的列。

          model1_blue  model1_red  model1_yellow  model1_white  model2_blue ...
 time     
2001-01   200,000      100,000     250,000        100,000       120,000
2002-01   140,000      200,000     400,000        200,000       150,000
...

现在:这是如何工作的:)?谢谢!

2 个答案:

答案 0 :(得分:1)

假设timemodel color正在形成层次结构索引(如果不是,您可以使用pd.MultiIndex.from_arrays轻松创建此索引),最简单的解决方案是“取消堆叠” index:

import pandas as pd
df = pd.DataFrame([
        [200, 120], 
        [201, 123], 
        [202, 124], 
        [203, 125] ,
        [204, 126] ,
        [205, 126] ,
        [205, 127], 
        [205, 127],         
    ], columns=["model1", "model2"])

df.index = pd.MultiIndex.from_product([["2001-01", "2001-02"], ["blue", "red", "yellow", "white"]])
df

enter image description here

df.unstack()

enter image description here

答案 1 :(得分:1)

假设model是一个索引,如果不是,你只需使用

将其转换为索引即可
df.set_index('model' , inplace=True)

将您的模型索引转换为列

df_unstacked = df.unstack('model')


Out[28]:
        model1                     model2
model   blue    red     white   yellow  blue    red     white   yellow
time                                
2001-01 200,000 100,000 100,000 250,000 120,000 100,000 100,000 80,000
2002-01 140,000 200,000 200,000 400,000 150,000 100,000 100,000 200,000

检索两个级别的列名称

first_level_names = df_unstacked.columns.levels[0]
second_level_names = df_unstacked.columns.levels[1]

创建新列名称

new_columns = [ first+ '_' + second for first in first_level_names  for second in second_level_names ]

为数据框分配新的列名

df_unstacked.columns = new_columns

Out[33]:
        model1_blue model1_red  model1_white    model1_yellow   model2_blue model2_red  model2_white    model2_yellow
time                                
2001-01 200,000     100,000      100,000          250,000        120,000      100,000   100,000         80,000
2002-01 140,000     200,000      200,000          400,000        150,000      100,000   100,000         200,000