I want to convert the DataFrame df below into another one:
import pandas as pd
data = {
'dates':['01/01/2018','02/01/2018','03/01/2018','04/01/2018','05/01/2018'],
'A X':[1,1,2,1,1],
'A Y':[1,1,3,1,1],
'A Z':[1,1,4,1,1],
'B X':[2,2,3,2,2],
'B Y':[2,2,4,2,2],
'C X':[3,3,4,3,3]
}
df = pd.DataFrame(data, columns=['dates','A X','A Y','A Z','B X','B Y','C X'])
Desired DataFrame:
dates       fields  A  B    C
01/01/2018  X       1  2    3
02/01/2018  X       1  2    3
03/01/2018  X       2  3    4
04/01/2018  X       1  2    3
05/01/2018  X       1  2    3
01/01/2018  Y       1  2    nan
02/01/2018  Y       1  2    nan
03/01/2018  Y       3  4    nan
04/01/2018  Y       1  2    nan
05/01/2018  Y       1  2    nan
01/01/2018  Z       1  nan  nan
02/01/2018  Z       1  nan  nan
03/01/2018  Z       4  nan  nan
04/01/2018  Z       1  nan  nan
05/01/2018  Z       1  nan  nan
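The dates are set as the new index values, and a new column named "fields" is inserted, containing the strings "X", "Y", "Z" extracted from the column headers of df. How can I do this? (pandas v0.22)
Answer 0 (score: 2)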
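Use:
set_index to move dates into the index
str.split on the remaining column names (the ones containing a space), which gives a MultiIndex in the columns
stack to reshape the last column level into rows
reset_index to turn the index levels back into columns
rename to give the new column the name fields
sort_values on the fields column to sort the DataFrame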
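df = df.set_index('dates')
# split 'A X' into ('A', 'X') so the columns become a two-level MultiIndex
df.columns = df.columns.str.split(expand=True)
# move the X/Y/Z level into the rows, then name that column and sort by it
df = df.stack().reset_index().rename(columns={'level_1':'fields'}).sort_values('fields')
print(df)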
         dates  fields  A    B    C
0   01/01/2018       X  1  2.0  3.0
3   02/01/2018       X  1  2.0  3.0
6   03/01/2018       X  2  3.0  4.0
9   04/01/2018       X  1  2.0  3.0
12  05/01/2018       X  1  2.0  3.0
1   01/01/2018       Y  1  2.0  NaN
4   02/01/2018       Y  1  2.0  NaN
7   03/01/2018       Y  3  4.0  NaN
10  04/01/2018       Y  1  2.0  NaN
13  05/01/2018       Y  1  2.0  NaN
2   01/01/2018       Z  1  NaN  NaN
5   02/01/2018       Z  1  NaN  NaN
8   03/01/2018       Z  4  NaN  NaN
11  04/01/2018       Z  1  NaN  NaN
14  05/01/2018       Z  1  NaN  NaN
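To make the individual steps easier to inspect, the same pipeline can be sketched with named intermediates. This is only an illustration under some assumptions: the names wide, stacked and result are made up here, rename_axis is used instead of renaming 'level_1', and a stable sort (kind='mergesort') is requested so that the date order inside each field is preserved. Recent pandas versions may print a FutureWarning about stack().

```python
import pandas as pd

data = {
    'dates': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
    'A X': [1, 1, 2, 1, 1],
    'A Y': [1, 1, 3, 1, 1],
    'A Z': [1, 1, 4, 1, 1],
    'B X': [2, 2, 3, 2, 2],
    'B Y': [2, 2, 4, 2, 2],
    'C X': [3, 3, 4, 3, 3],
}
df = pd.DataFrame(data)

# Step 1: dates leave the columns, so only 'A X'-style labels remain.
wide = df.set_index('dates')

# Step 2: 'A X' -> ('A', 'X'); expand=True yields a two-level MultiIndex.
wide.columns = wide.columns.str.split(expand=True)

# Step 3: stack() moves the last column level (X/Y/Z) into the row index.
# Combinations that do not exist, such as ('C', 'Y'), show up as NaN.
stacked = wide.stack()

# Step 4: name both index levels, turn them into columns, and sort by field.
result = (stacked.rename_axis(['dates', 'fields'])
                 .reset_index()
                 .sort_values('fields', kind='mergesort')  # stable: keeps date order
                 .reset_index(drop=True))
print(result)
```

Printing wide.columns after step 2 is a quick way to confirm that the split really produced two levels before stacking.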
Thanks to @Paul H for the improved version:
df = (df.set_index('dates')
.rename(columns=lambda c: tuple(c.split()))
.stack()
.rename_axis(('dates','fields'))
.sort_index(level='fields')
.reset_index()
)
print(df)
         dates  fields  A    B    C
0   01/01/2018       X  1  2.0  3.0
1   02/01/2018       X  1  2.0  3.0
2   03/01/2018       X  2  3.0  4.0
3   04/01/2018       X  1  2.0  3.0
4   05/01/2018       X  1  2.0  3.0
5   01/01/2018       Y  1  2.0  NaN
6   02/01/2018       Y  1  2.0  NaN
7   03/01/2018       Y  3  4.0  NaN
8   04/01/2018       Y  1  2.0  NaN
9   05/01/2018       Y  1  2.0  NaN
10  01/01/2018       Z  1  NaN  NaN
11  02/01/2018       Z  1  NaN  NaN
12  03/01/2018       Z  4  NaN  NaN
13  04/01/2018       Z  1  NaN  NaN
14  05/01/2018       Z  1  NaN  NaN
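The chained version depends on rename turning tuple labels into a MultiIndex. As far as I can tell that behaviour is not the same in every pandas version (some releases may keep a flat Index of tuples instead), so treat it as something to verify on your installation. A sketch that avoids the question by building the MultiIndex explicitly with pd.MultiIndex.from_tuples could look like this; the names wide and out are made up for the example, and it is a variation on the answer, not the code the answer itself uses.

```python
import pandas as pd

data = {
    'dates': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
    'A X': [1, 1, 2, 1, 1],
    'A Y': [1, 1, 3, 1, 1],
    'A Z': [1, 1, 4, 1, 1],
    'B X': [2, 2, 3, 2, 2],
    'B Y': [2, 2, 4, 2, 2],
    'C X': [3, 3, 4, 3, 3],
}
df = pd.DataFrame(data)

wide = df.set_index('dates')

# Build the two-level column index by hand: 'A X' -> ('A', 'X').
wide.columns = pd.MultiIndex.from_tuples([tuple(c.split()) for c in wide.columns])

out = (wide.stack()                            # last column level (X/Y/Z) goes to the rows
           .rename_axis(['dates', 'fields'])   # name both index levels
           .sort_index(level='fields')         # sort by field, then by date string
           .reset_index())
print(out)
```

Note that sort_index compares the dates as plain strings here, which happens to give the right order for this data; parsing them with pd.to_datetime first would be safer for dates that span several months.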