Create a Pandas DataFrame with a new date index and a new column built from substrings of the column headers?

Posted: 2018-01-30 08:57:15

Tags: python pandas dataframe

I want to transform the following DataFrame df into a different shape:

import pandas as pd
data = {
    'dates':['01/01/2018','02/01/2018','03/01/2018','04/01/2018','05/01/2018'],
    'A X':[1,1,2,1,1],
    'A Y':[1,1,3,1,1],
    'A Z':[1,1,4,1,1],
    'B X':[2,2,3,2,2],
    'B Y':[2,2,4,2,2],
    'C X':[3,3,4,3,3]
       }
df = pd.DataFrame(data, columns=['dates','A X','A Y','A Z','B X','B Y','C X'])

Desired DataFrame:

dates   fields  A   B   C
01/01/2018  X   1   2   3
02/01/2018  X   1   2   3
03/01/2018  X   2   3   4
04/01/2018  X   1   2   3
05/01/2018  X   1   2   3
01/01/2018  Y   1   2   nan
02/01/2018  Y   1   2   nan
03/01/2018  Y   3   4   nan
04/01/2018  Y   1   2   nan
05/01/2018  Y   1   2   nan
01/01/2018  Z   1   nan nan
02/01/2018  Z   1   nan nan
03/01/2018  Z   4   nan nan
04/01/2018  Z   1   nan nan
05/01/2018  Z   1   nan nan

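The dates become the new index values, and a new column named "fields" is inserted that contains the strings "X", "Y" and "Z" extracted from the column headers of df. How can I do this? (pandas v0.22)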

1 answer:

Answer 0 (score: 2)

Use:

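df = df.set_index('dates')
# split 'A X' into ('A', 'X'), giving a two-level column MultiIndex
df.columns = df.columns.str.split(expand=True)
# stack the X/Y/Z level into the rows; mergesort is stable, so dates keep their order within each field
df = df.stack().reset_index().rename(columns={'level_1':'fields'}).sort_values('fields', kind='mergesort')
print (df)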

         dates fields  A    B    C
0   01/01/2018      X  1  2.0  3.0
3   02/01/2018      X  1  2.0  3.0
6   03/01/2018      X  2  3.0  4.0
9   04/01/2018      X  1  2.0  3.0
12  05/01/2018      X  1  2.0  3.0
1   01/01/2018      Y  1  2.0  NaN
4   02/01/2018      Y  1  2.0  NaN
7   03/01/2018      Y  3  4.0  NaN
10  04/01/2018      Y  1  2.0  NaN
13  05/01/2018      Y  1  2.0  NaN
2   01/01/2018      Z  1  NaN  NaN
5   02/01/2018      Z  1  NaN  NaN
8   03/01/2018      Z  4  NaN  NaN
11  04/01/2018      Z  1  NaN  NaN
14  05/01/2018      Z  1  NaN  NaN
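To see what the two middle steps of this answer produce, here is a small sketch that rebuilds the question's frame (named wide here, a name chosen only for illustration) and prints the columns after str.split(expand=True) and the shape of the frame after stack(). It relies on the documented behavior that Index.str.split(expand=True) returns a MultiIndex.

```python
import pandas as pd

# The question's wide frame ('wide' is an illustrative name)
data = {
    'dates': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
    'A X': [1, 1, 2, 1, 1],
    'A Y': [1, 1, 3, 1, 1],
    'A Z': [1, 1, 4, 1, 1],
    'B X': [2, 2, 3, 2, 2],
    'B Y': [2, 2, 4, 2, 2],
    'C X': [3, 3, 4, 3, 3],
}
wide = pd.DataFrame(data, columns=['dates', 'A X', 'A Y', 'A Z', 'B X', 'B Y', 'C X'])
wide = wide.set_index('dates')

# str.split(expand=True) on an Index returns a MultiIndex: 'A X' -> ('A', 'X')
wide.columns = wide.columns.str.split(expand=True)
print(wide.columns.tolist())

# stack() moves the innermost column level (X/Y/Z) into the row index
stacked = wide.stack()
print(stacked.index.names)       # new level is unnamed, hence 'level_1' after reset_index()
print(stacked.columns.tolist())  # the outer level (A/B/C) stays as columns
```

Only the innermost column level is stacked, so A, B and C remain columns; combinations that never existed in the headers, such as C with Y, come out as NaN, which is why B and C become floats while A stays integer.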

Thanks to @Paul H for the improved version:

df = (df.set_index('dates')
       .rename(columns=lambda c: tuple(c.split()))
       .stack()
       .rename_axis(('dates','fields'))
       .sort_index(level='fields')
       .reset_index()
       )
print (df)


         dates fields  A    B    C
0   01/01/2018      X  1  2.0  3.0
1   02/01/2018      X  1  2.0  3.0
2   03/01/2018      X  2  3.0  4.0
3   04/01/2018      X  1  2.0  3.0
4   05/01/2018      X  1  2.0  3.0
5   01/01/2018      Y  1  2.0  NaN
6   02/01/2018      Y  1  2.0  NaN
7   03/01/2018      Y  3  4.0  NaN
8   04/01/2018      Y  1  2.0  NaN
9   05/01/2018      Y  1  2.0  NaN
10  01/01/2018      Z  1  NaN  NaN
11  02/01/2018      Z  1  NaN  NaN
12  03/01/2018      Z  4  NaN  NaN
13  04/01/2018      Z  1  NaN  NaN
14  05/01/2018      Z  1  NaN  NaN
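The chained version relies on rename() turning tuple labels into a MultiIndex, which is what happened on pandas 0.22; newer pandas releases may keep the tuples as a flat Index instead, in which case stack() would no longer split the headers. A minimal sketch that avoids that assumption by building the MultiIndex explicitly with pd.MultiIndex.from_tuples (this is not part of the original answer), and that keeps dates and fields as the index, as the question's wording asks, until an optional reset_index():

```python
import pandas as pd

# The question's wide frame ('wide', 'tmp', 'long', 'flat' are illustrative names)
data = {
    'dates': ['01/01/2018', '02/01/2018', '03/01/2018', '04/01/2018', '05/01/2018'],
    'A X': [1, 1, 2, 1, 1],
    'A Y': [1, 1, 3, 1, 1],
    'A Z': [1, 1, 4, 1, 1],
    'B X': [2, 2, 3, 2, 2],
    'B Y': [2, 2, 4, 2, 2],
    'C X': [3, 3, 4, 3, 3],
}
wide = pd.DataFrame(data, columns=['dates', 'A X', 'A Y', 'A Z', 'B X', 'B Y', 'C X'])

tmp = wide.set_index('dates')
# Build the two-level column index explicitly: 'A X' -> ('A', 'X')
tmp.columns = pd.MultiIndex.from_tuples([tuple(c.split()) for c in tmp.columns])

long = (tmp.stack()                           # X/Y/Z level moves into the rows
           .rename_axis(['dates', 'fields'])  # name both row levels
           .sort_index(level='fields'))       # group by field, then by date

print(long)                # dates and fields as a two-level index
flat = long.reset_index()  # optional: turn them back into ordinary columns
print(flat)
```

Stopping at long leaves the result indexed by (dates, fields); flat has the same layout as the outputs printed in the answer.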