Repeating columns as rows in Python?

Asked: 2018-07-21 02:13:07

Tags: python pandas dataframe

   Fruit      January Shipments   January Sales   February Shipments   February Sales  
 ------------ ------------------- --------------- -------------------- ---------------- 
  Apple                      30              11                   18               31   
  Banana                     12              49                   39               14   
  Pear                       25              50                   44               21   
  Kiwi                       41              25                   10               25   
  Strawberry                 11              33                   35               50   

I am trying to get the following result:

 Fruit       Month     Shipments   Sales  
 ------------ ---------- ----------- ------- 
  Apple        January          30      11   
  Banana       January          12      49   
  Pear         January          25      50   
  Kiwi         January          41      25   
  Strawberry   January          11      33   
  Apple        February         18      31   
  Banana       February         39      14   
  Pear         February         44      21   
  Kiwi         February         10      25   
  Strawberry   February         35      50   

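I have tried pandas.pivot and pandas.pivot_table, with no luck. Right now I am creating two dataframes (Fruit/Month/Shipments) and (Fruit/Month/Sales) and concatenating the two in a loop, but I was hoping there is a simpler way.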

2 answers:

Answer 0: (score: 2)

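One way is to turn the columns into a MultiIndex and then use stack. Assume your dataframe is called df. First set the Fruit column as the index, then define the multi-level columns: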

df = df.set_index('Fruit')
# manual way to create the multiindex columns
#df.columns = pd.MultiIndex.from_product([['January','February'],
#                                         ['Shipments','Sales']], names=['Month',None])
# more general way to create the multiindex columns thanks to @Scott Boston
df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month',None]

Your data now looks like this:

Month        January        February      
           Shipments Sales Shipments Sales
Fruit                                     
Apple             30    11        18    31
Banana            12    49        39    14
Pear              25    50        44    21
Kiwi              41    25        10    25
Strawberry        11    33        35    50
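For anyone who wants to reproduce this step, here is a minimal self-contained sketch. It assumes the question's table has been typed into a DataFrame by hand (the construction below is my own, not from the question) and that every column name is exactly "Month Measure" separated by a single space:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Pear', 'Kiwi', 'Strawberry'],
    'January Shipments': [30, 12, 25, 41, 11],
    'January Sales': [11, 49, 50, 25, 33],
    'February Shipments': [18, 39, 44, 10, 35],
    'February Sales': [31, 14, 21, 25, 50],
})

df = df.set_index('Fruit')
# Split each name on whitespace: 'January Shipments' -> ('January', 'Shipments')
df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month', None]

# Each column is now addressed by a (month, measure) tuple
print(df.columns.tolist())
```

Note that the whitespace split only works because neither the month nor the measure contains a space itself; names like "New Year Sales" would need a different split rule.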

Now you can use stack on level 0, followed by reset_index:

df_output = df.stack(0).reset_index()

which gives

        Fruit     Month  Sales  Shipments
0       Apple  February     31         18
1       Apple   January     11         30
2      Banana  February     14         39
3      Banana   January     49         12
4        Pear  February     21         44
5        Pear   January     50         25
6        Kiwi  February     25         10
7        Kiwi   January     25         41
8  Strawberry  February     50         35
9  Strawberry   January     33         11

Finally, if you want a specific order for the values in the Month column, you can use pd.Categorical:

df_output['Month'] = pd.Categorical(df_output['Month'].tolist(), ordered=True,
                                    categories=['January','February'])

which makes January sort before February. Now, doing

df_output = df_output.sort_values(['Month'])

gives the result:

        Fruit     Month  Sales  Shipments
1       Apple   January     11         30
3      Banana   January     49         12
5        Pear   January     50         25
7        Kiwi   January     25         41
9  Strawberry   January     33         11
0       Apple  February     31         18
2      Banana  February     14         39
4        Pear  February     21         44
6        Kiwi  February     25         10
8  Strawberry  February     50         35

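I see this is not exactly the expected output (the row order within the Fruit column and the order of the columns differ), but both can easily be changed if needed.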
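As a rough sketch of how both could be changed, the whole pipeline can be run end to end with two additions of my own: a stable sort, so the original fruit order is kept within each month, and an explicit column selection. The DataFrame construction is an assumption based on the question's table. Depending on the pandas version, stack may return rows and columns in a different order than shown above, which is why the sketch does not rely on that order:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Pear', 'Kiwi', 'Strawberry'],
    'January Shipments': [30, 12, 25, 41, 11],
    'January Sales': [11, 49, 50, 25, 33],
    'February Shipments': [18, 39, 44, 10, 35],
    'February Sales': [31, 14, 21, 25, 50],
}).set_index('Fruit')

df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month', None]

# Move the Month level from the columns into the rows
df_output = df.stack(0).reset_index()

# Ordered categorical so that January sorts before February
df_output['Month'] = pd.Categorical(df_output['Month'],
                                    categories=['January', 'February'],
                                    ordered=True)

df_output = (
    df_output
    .sort_values('Month', kind='stable')          # stable: keeps fruit order within a month
    [['Fruit', 'Month', 'Shipments', 'Sales']]    # column order from the question
    .reset_index(drop=True)
)
print(df_output)
```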

Answer 1: (score: 1)

How about using pd.wide_to_long, as suggested by @user3483203:

df1 = df.set_index('Fruit')

#First we have to do some column renaming: split the headers into a multiindex, then join the levels in swapped order.
df1.columns = df1.columns.str.split(expand=True)
df1.columns = df1.columns.map('{0[1]}_{0[0]}'.format)

#Reset index and use pd.wide_to_long:
df1 = df1.reset_index()
df_out = pd.wide_to_long(df1, ['Shipments','Sales'], 'Fruit', 'Month', '_', r'\w+')\
           .reset_index()

print(df_out)

Output:

        Fruit     Month  Shipments  Sales
0       Apple   January       30.0   11.0
1      Banana   January       12.0   49.0
2        Pear   January       25.0   50.0
3        Kiwi   January       41.0   25.0
4  Strawberry   January       11.0   33.0
5       Apple  February       18.0   31.0
6      Banana  February       39.0   14.0
7        Pear  February       44.0   21.0
8        Kiwi  February       10.0   25.0
9  Strawberry  February       35.0   50.0
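To make the renaming step more concrete, here is a self-contained sketch of the same approach. The DataFrame construction is my own assumption based on the question's table, and the positional arguments are spelled out as keywords (stubnames, i, j, sep, suffix). pd.wide_to_long expects names of the form stub + sep + suffix, which is why "January Shipments" has to become "Shipments_January" first. The numeric dtype of the result may differ between pandas versions:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Pear', 'Kiwi', 'Strawberry'],
    'January Shipments': [30, 12, 25, 41, 11],
    'January Sales': [11, 49, 50, 25, 33],
    'February Shipments': [18, 39, 44, 10, 35],
    'February Sales': [31, 14, 21, 25, 50],
})

df1 = df.set_index('Fruit')
# ('January', 'Shipments') -> 'Shipments_January'
df1.columns = df1.columns.str.split(expand=True)
df1.columns = df1.columns.map('{0[1]}_{0[0]}'.format)
renamed = df1.columns.tolist()
print(renamed)

df_out = pd.wide_to_long(
    df1.reset_index(),
    stubnames=['Shipments', 'Sales'],  # prefixes that become the value columns
    i='Fruit',                         # identifier column
    j='Month',                         # name of the new column built from the suffixes
    sep='_',
    suffix=r'\w+',                     # the suffixes are words, not the default digits
).reset_index()
print(df_out)
```

The suffix argument matters here: the default pattern only matches digits, so without it the month names would not be recognised as suffixes.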