Question

我想将一个pandas数据帧从长到长重塑。挑战在于列具有多索引列标题。数据框如下所示：

category   price1           price2          
year       2011 2012 2013   2011 2012 2013
1            33   22   48    135  144  149
2            22   26   37    136  127  129
3            39   30   47    123  148  148
4            45   42   21    140  126  121
5            20   37   35    141  142  147
6            29   20   34    122  121  132
7            20   35   45    128  123  130
8            39   34   49    125  120  131
9            24   20   36    122  146  130
10           24   37   43    142  133  138
11           23   22   40    124  135  131
12           27   22   40    147  149  132

下面是一个产生相同数据帧的片段。您还将看到我通过连接其他两个数据帧来构建此数据帧。

以下是片段：

import pandas as pd
import numpy as np

# Make dataframe df1 with 12 observations over 3 years
# with multiindexed column headers
np.random.seed(123)
df1 = pd.DataFrame(np.random.randint(20, 50, size = (12,3)), columns=[2011,2012,2013])
df1.index = np.arange(1,len(df1)+1)
colNames1 = df1.columns
header1 = pd.MultiIndex.from_product([['price1'], colNames1], names=['category','year'])
df1.columns = header1

# Make dataframe df2 with 12 observations over 3 years
# with multiindexed column headers
df2 = pd.DataFrame(np.random.randint(120, 150, size = (12,3)), columns=[2011,2012,2013])
df2.index = np.arange(1,len(df2)+1)
colNames1 = df2.columns
header1 = pd.MultiIndex.from_product([['price2'], colNames1], names=['category','year'])
df2.columns = header1

df3 = pd.concat([df1, df2], axis = 1)

这是所需的输出：

        price1  price2
1   2011    33  135
2   2011    22  136
3   2011    39  123
4   2011    45  140
5   2011    20  141
6   2011    29  122
7   2011    20  128
8   2011    39  125
9   2011    24  122
10  2011    24  142
11  2011    23  124
12  2011    27  147
1   2012    22  144
2   2012    26  127
3   2012    30  148
4   2012    42  126
5   2012    37  142
6   2012    20  121
7   2012    35  123
8   2012    34  120
9   2012    20  146
10  2012    37  133
11  2012    22  135
12  2012    22  149
1   2013    48  149
2   2013    37  129
3   2013    47  148
4   2013    21  121
5   2013    35  147
6   2013    34  132
7   2013    45  130
8   2013    49  131
9   2013    36  130
10  2013    43  138
11  2013    40  131
12  2013    40  132

我根据Reshape和pandas.wide_to_long的建议尝试了不同的解决方案，但我正在努力使用多索引的列名称。那么为什么不删除它呢？主要是因为这是我的现实世界问题的样子，也是因为我拒绝相信它无法完成。

感谢您的任何建议！

Answer 1

使用stack为最后一级和sort_index，为列添加rename_axis和reset_index：

df3 = (df3.stack()
         .sort_index(level=[1,0])
        .rename_axis(['months','year'])
        .reset_index()
        .rename_axis(None, 1))
print (df3.head(15))
    months  year  price1  price2
0        1  2011      33     135
1        2  2011      22     136
2        3  2011      39     123
3        4  2011      45     140
4        5  2011      20     141
5        6  2011      29     122
6        7  2011      20     128
7        8  2011      39     125
8        9  2011      24     122
9       10  2011      24     142
10      11  2011      23     124
11      12  2011      27     147
12       1  2012      22     144
13       2  2012      26     127
14       3  2012      30     148

如果需要MutliIndex：

df3 = df3.stack().sort_index(level=[1,0])
print (df3.head(15))
category  price1  price2
   year                 
1  2011       33     135
2  2011       22     136
3  2011       39     123
4  2011       45     140
5  2011       20     141
6  2011       29     122
7  2011       20     128
8  2011       39     125
9  2011       24     122
10 2011       24     142
11 2011       23     124
12 2011       27     147
1  2012       22     144
2  2012       26     127
3  2012       30     148

使用从多到长的多索引列标题重新整形数据帧

1 个答案: