Fruit        January Shipments   January Sales   February Shipments   February Sales
------------ ------------------- --------------- -------------------- ----------------
Apple        30                  11              18                   31
Banana       12                  49              39                   14
Pear         25                  50              44                   21
Kiwi         41                  25              10                   25
Strawberry   11                  33              35                   50
I am trying to get the following result:
Fruit Month Shipments Sales
------------ ---------- ----------- -------
Apple January 30 11
Banana January 12 49
Pear January 25 50
Kiwi January 41 25
Strawberry January 11 33
Apple February 18 31
Banana February 39 14
Pear February 44 21
Kiwi February 10 25
Strawberry February 35 50
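I tried pandas.pivot and pandas.pivot_table, but with no luck. I am currently building two DataFrames (Fruit/Month/Shipments) and (Fruit/Month/Sales) and concatenating them in a loop, but I am hoping there is a simpler way.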
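Answer 0 (score: 2)
One approach is to turn the columns into a MultiIndex and then use stack. Assume your DataFrame is called df. First set the 'Fruit' column as the index, then define the multi-level columns: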
df = df.set_index('Fruit')
# manual way to create the multiindex columns
#df.columns = pd.MultiIndex.from_product([['January','February'],
# ['Shipments','Sales']], names=['Month',None])
# more general way to create the multiindex columns thanks to @Scott Boston
df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month',None]
Your data now looks like this:
Month January February
Shipments Sales Shipments Sales
Fruit
Apple 30 11 18 31
Banana 12 49 39 14
Pear 25 50 44 21
Kiwi 41 25 10 25
Strawberry 11 33 35 50
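To reproduce the steps so far, here is a minimal sketch that rebuilds the sample frame and applies the column split. The DataFrame constructor call is an assumption about how the data is loaded; the values are copied from the question's table.

import pandas as pd

# Sample data copied from the question (how it is loaded is an assumption).
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Pear', 'Kiwi', 'Strawberry'],
    'January Shipments': [30, 12, 25, 41, 11],
    'January Sales': [11, 49, 50, 25, 33],
    'February Shipments': [18, 39, 44, 10, 35],
    'February Sales': [31, 14, 21, 25, 50],
})

df = df.set_index('Fruit')
# Split each header on whitespace:
# 'January Shipments' becomes the tuple ('January', 'Shipments').
df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month', None]

# A single cell is now addressed by a (month, measure) pair.
print(df.loc['Apple', ('January', 'Shipments')])

Note that the whitespace split only works because every header has exactly two words. With headers such as 'January Net Sales' you would need the manual pd.MultiIndex.from_product variant instead.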
Now you can use stack on level 0, followed by reset_index:
df_output = df.stack(0).reset_index()
which gives
Fruit Month Sales Shipments
0 Apple February 31 18
1 Apple January 11 30
2 Banana February 14 39
3 Banana January 49 12
4 Pear February 21 44
5 Pear January 50 25
6 Kiwi February 25 10
7 Kiwi January 25 41
8 Strawberry February 50 35
9 Strawberry January 33 11
Finally, if you want the values in the 'Month' column to follow a specific order, you can use pd.Categorical:
df_output['Month'] = pd.Categorical(df_output['Month'].tolist(), ordered=True,
categories=['January','February'])
This makes January sort before February. Now running
df_output = df_output.sort_values(['Month'])
gives the result:
Fruit Month Sales Shipments
1 Apple January 11 30
3 Banana January 49 12
5 Pear January 50 25
7 Kiwi January 25 41
9 Strawberry January 33 11
0 Apple February 31 18
2 Banana February 14 39
4 Pear February 21 44
6 Kiwi February 25 10
8 Strawberry February 50 35
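I see this is not exactly the same as the expected output (the row order within the 'Fruit' column and the order of the columns differ), but both are easy to change if needed.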
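As a rough sketch of how both differences could be removed, the version below also makes 'Fruit' an ordered categorical and selects the columns explicitly. The name fruit_order and the DataFrame constructor are my own illustrative additions, not part of the question, and turning 'Fruit' into a categorical changes its dtype.

import pandas as pd

# Sample data copied from the question (how it is loaded is an assumption).
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Pear', 'Kiwi', 'Strawberry'],
    'January Shipments': [30, 12, 25, 41, 11],
    'January Sales': [11, 49, 50, 25, 33],
    'February Shipments': [18, 39, 44, 10, 35],
    'February Sales': [31, 14, 21, 25, 50],
}).set_index('Fruit')

fruit_order = list(df.index)  # hypothetical helper: original row order

df.columns = df.columns.str.split(expand=True)
df.columns.names = ['Month', None]

# Recent pandas versions may emit a FutureWarning about stack here.
df_output = df.stack(0).reset_index()

# Ordered categoricals make sort_values follow the listed order
# instead of alphabetical order.
df_output['Month'] = pd.Categorical(
    df_output['Month'], categories=['January', 'February'], ordered=True)
df_output['Fruit'] = pd.Categorical(
    df_output['Fruit'], categories=fruit_order, ordered=True)

df_output = (
    df_output.sort_values(['Month', 'Fruit'])
    .reset_index(drop=True)[['Fruit', 'Month', 'Shipments', 'Sales']]
)
print(df_output)

Sorting on ['Month', 'Fruit'] rather than on 'Month' alone makes the result independent of whichever row order stack happens to return.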
Answer 1 (score: 1)
How about using pd.wide_to_long, as suggested by @user3483203?
df1 = df.set_index('Fruit')
# First rename the columns: split them into MultiIndex headers, then swap the levels
# so that 'January Shipments' becomes 'Shipments_January'.
df1.columns = df1.columns.str.split(expand=True)
df1.columns = df1.columns.map('{0[1]}_{0[0]}'.format)
#Reset index and use pd.wide_to_long:
df1 = df1.reset_index()
df_out = pd.wide_to_long(df1, ['Shipments','Sales'], 'Fruit', 'Month','_',r'\w+')\
.reset_index()
print(df_out)
Output:
Fruit Month Shipments Sales
0 Apple January 30.0 11.0
1 Banana January 12.0 49.0
2 Pear January 25.0 50.0
3 Kiwi January 41.0 25.0
4 Strawberry January 11.0 33.0
5 Apple February 18.0 31.0
6 Banana February 39.0 14.0
7 Pear February 44.0 21.0
8 Kiwi February 10.0 25.0
9 Strawberry February 35.0 50.0
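For completeness, here is a self-contained sketch of the same approach with keyword arguments spelled out, so it is clearer which argument is the separator and which is the suffix pattern. The DataFrame constructor is an assumption about how the data is loaded, and the final astype(int) is an optional extra for the case where the reshaped values come back as floats, as in the output above; it assumes there are no missing values.

import pandas as pd

# Sample data copied from the question (how it is loaded is an assumption).
df = pd.DataFrame({
    'Fruit': ['Apple', 'Banana', 'Pear', 'Kiwi', 'Strawberry'],
    'January Shipments': [30, 12, 25, 41, 11],
    'January Sales': [11, 49, 50, 25, 33],
    'February Shipments': [18, 39, 44, 10, 35],
    'February Sales': [31, 14, 21, 25, 50],
})

df1 = df.set_index('Fruit')
# 'January Shipments' -> ('January', 'Shipments') -> 'Shipments_January'
df1.columns = df1.columns.str.split(expand=True)
df1.columns = df1.columns.map('{0[1]}_{0[0]}'.format)
df1 = df1.reset_index()

df_out = pd.wide_to_long(
    df1,
    stubnames=['Shipments', 'Sales'],  # prefixes that become the value columns
    i='Fruit',                         # identifier column
    j='Month',                         # name for the column built from the suffixes
    sep='_',
    suffix=r'\w+',                     # suffixes are month names, not digits
).reset_index()

# Optional: force integer columns (assumes no missing values).
df_out[['Shipments', 'Sales']] = df_out[['Shipments', 'Sales']].astype(int)
print(df_out)

The suffix argument matters here because the default pattern only matches numeric suffixes, so the month names would not be picked up without it.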