How can I reshape a dataframe in Python by converting rows into repeated columns?

Asked: 2017-05-04 15:50:16

Tags: python pandas numpy dataframe

Given this dataframe, DF:

 Stock      Date      Time     Price     Open      
   AAA   2002-02-23  10:13     2.440     0.01    
   AAA   2002-02-27  17:17     2.460     0.02    

it should become Transformed:

   Stock   Date      Time_0    Price_0   Open_0  Time_1  Price_1  Open_1     
   AAA   2002-02-23  10:13     2.440     0.01    17:17    2.460    0.02
   AAA   2002-02-27  17:17     2.460     0.02    NA       NA       NA

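I want to apply the above operation to a much larger dataset. Is there an efficient way to do this? (The image shows a more detailed representation.)

Edit: Solution. The question "How to create a lagged data structure using pandas dataframe" answers this.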

1 answer:

Answer 0: (score: 0)

Data setup:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Stock': {0: 'AAA', 1: 'AAA', 2: 'AAA'},
 'Date': {0: '2002-02-23', 1: '2002-02-27', 2: '2002-02-27'},
 'Time': {0: '10:13', 1: '17:17', 2: '17:17'},
 'Price': {0: 2.44, 1: 2.46, 2: 3.2},
 'Open': {0: 0.01, 1: 0.02, 2: 0.02} 
 })
#Reorder columns
df = df[['Stock','Date','Time','Price','Open']]
df
Out[1221]: 
  Stock        Date   Time  Price  Open
0   AAA  2002-02-23  10:13   2.44  0.01
1   AAA  2002-02-27  17:17   2.46  0.02
2   AAA  2002-02-27  17:17   3.20  0.02

Solution:

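#get the 'Time', 'Price', 'Open' fields from the next row and create a new dataframe
#(df.ix was removed in pandas 1.0, so df.loc is used for the label-based lookup)
cols = ['Time', 'Price', 'Open']
df_1 = df.apply(lambda x: df.loc[x.name+1, cols] if (x.name+1) < len(df) else pd.Series(np.nan, index=cols), axis=1)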

#join the original df and the new df
df.join(df_1,lsuffix='_0',rsuffix='_1')
Out[1223]: 
  Stock        Date Time_0  Price_0  Open_0 Time_1  Price_1  Open_1
0   AAA  2002-02-23  10:13     2.44    0.01  17:17     2.46    0.02
1   AAA  2002-02-27  17:17     2.46    0.02  17:17     3.20    0.02
2   AAA  2002-02-27  17:17     3.20    0.02    NaN      NaN     NaN

With the OP's original data, the output would be:

Out[1270]: 
  Stock        Date Time_0  Price_0  Open_0 Time_1  Price_1  Open_1
0   AAA  2002-02-23  10:13     2.44    0.01  17:17     2.46    0.02
1   AAA  2002-02-27  17:17     2.46    0.02    NaN      NaN     NaN
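
Since the question asks about efficiency on a larger dataset, the row-wise `apply` above may become slow. A vectorized variant of the same idea, in the spirit of the lagged-structure question linked in the edit, can be sketched with `shift(-1)`. This is only a minimal sketch on the OP's two-row data; the names `cols`, `next_rows`, and `result` are my own and not part of the original answer.

```python
import pandas as pd

# The OP's original two-row data
df = pd.DataFrame({
    'Stock': ['AAA', 'AAA'],
    'Date': ['2002-02-23', '2002-02-27'],
    'Time': ['10:13', '17:17'],
    'Price': [2.44, 2.46],
    'Open': [0.01, 0.02],
})

# Columns to repeat from the next row (illustrative name)
cols = ['Time', 'Price', 'Open']

# shift(-1) moves every row up by one, so row i holds the values of row i+1;
# the last row has no successor and is filled with missing values.
next_rows = df[cols].shift(-1).add_suffix('_1')

# Suffix the original value columns with _0 and place the shifted copy beside them.
result = df.rename(columns={c: c + '_0' for c in cols}).join(next_rows)
print(result)
```

If the real data holds several stocks, `df.groupby('Stock')[cols].shift(-1)` in place of `df[cols].shift(-1)` should keep values from leaking across stocks, assuming the rows are already sorted within each stock.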