How do I add a column to a MultiIndex DataFrame?

Time: 2016-12-15 09:05:59

Tags: python python-2.7 python-3.x pandas dataframe

I have a DataFrame with MultiIndex columns, as shown below:

       ACA FP Equity            UCG IM Equity            
          LAST PRICE     VOLUME    LAST PRICE      VOLUME
date                                                         
2010-01-04        12.825  5879617.0       15.0292  10844639.0
2010-01-05        13.020  6928587.0       14.8092  16456228.0
2010-01-06        13.250  5290631.0       14.6834  10446450.0
2010-01-07        13.255  5328586.0       15.0292  31900341.0
2010-01-08        13.470  7160295.0       15.1707  40750768.0

What is the syntax for adding a third column under each asset in the DataFrame? For example:

df['ACA FP Equity']['PriceVolume'] = df['ACA FP Equity']['LAST PRICE']*3

But I want to do this for every equity at once, rather than adding the column for each equity by hand.

Thanks in advance.

2 Answers:

Answer 0 (score: 1)

If you need every LAST PRICE column multiplied by 3, select them with slicers and rename the second-level column label:

idx = pd.IndexSlice
df1 = df.loc[:, idx[:, 'LAST PRICE']].rename(columns={'LAST PRICE':'PriceVolume'}) * 3
print (df1)
           ACA FP Equity UCG IM Equity
             PriceVolume   PriceVolume
2010-01-04        38.475       45.0876
2010-01-05        39.060       44.4276
2010-01-06        39.750       44.0502
2010-01-07        39.765       45.0876
2010-01-08        40.410       45.5121

Then concat the result with the original DataFrame:

print (pd.concat([df,df1], axis=1))
           ACA FP Equity            UCG IM Equity             ACA FP Equity  \
              LAST PRICE     VOLUME    LAST PRICE      VOLUME   PriceVolume   
2010-01-04        12.825  5879617.0       15.0292  10844639.0        38.475   
2010-01-05        13.020  6928587.0       14.8092  16456228.0        39.060   
2010-01-06        13.250  5290631.0       14.6834  10446450.0        39.750   
2010-01-07        13.255  5328586.0       15.0292  31900341.0        39.765   
2010-01-08        13.470  7160295.0       15.1707  40750768.0        40.410   

           UCG IM Equity  
             PriceVolume  
2010-01-04       45.0876  
2010-01-05       44.4276  
2010-01-06       44.0502  
2010-01-07       45.0876  
2010-01-08       45.5121  
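
For anyone who wants to run the slicer-plus-concat approach end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the values printed in the question. The trailing `sort_index(axis=1)` is an addition that is not part of the answer above; it is assumed here only as one way to place each asset's columns next to each other.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
columns = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
index = pd.DatetimeIndex(
    ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08"],
    name="date",
)
data = [
    [12.825, 5879617.0, 15.0292, 10844639.0],
    [13.020, 6928587.0, 14.8092, 16456228.0],
    [13.250, 5290631.0, 14.6834, 10446450.0],
    [13.255, 5328586.0, 15.0292, 31900341.0],
    [13.470, 7160295.0, 15.1707, 40750768.0],
]
df = pd.DataFrame(data, index=index, columns=columns)

idx = pd.IndexSlice
# Select every LAST PRICE column, relabel it as PriceVolume, and scale by 3.
df1 = df.loc[:, idx[:, "LAST PRICE"]].rename(columns={"LAST PRICE": "PriceVolume"}) * 3

# Append the new columns, then sort the column index so that each asset's
# columns sit together (this sorting step is an assumption, see above).
result = pd.concat([df, df1], axis=1).sort_index(axis=1)
print(result)
```

Without the sort, the new columns stay at the right-hand edge, as in the output printed above.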

Another solution, without concat, is to build tuples from the columns of selected_df and then assign the output:

idx = pd.IndexSlice
selected_df = df.loc[:, idx[:, 'LAST PRICE']]

new_cols = [(x, 'PriceVolume') for x in selected_df.columns.levels[0]]
print (new_cols)
[('ACA FP Equity', 'PriceVolume'), ('UCG IM Equity', 'PriceVolume')]

df[new_cols] = selected_df * 3
print(df)
           ACA FP Equity            UCG IM Equity             ACA FP Equity  \
              LAST PRICE     VOLUME    LAST PRICE      VOLUME   PriceVolume   
2010-01-04        12.825  5879617.0       15.0292  10844639.0        38.475   
2010-01-05        13.020  6928587.0       14.8092  16456228.0        39.060   
2010-01-06        13.250  5290631.0       14.6834  10446450.0        39.750   
2010-01-07        13.255  5328586.0       15.0292  31900341.0        39.765   
2010-01-08        13.470  7160295.0       15.1707  40750768.0        40.410   

           UCG IM Equity  
             PriceVolume  
2010-01-04       45.0876  
2010-01-05       44.4276  
2010-01-06       44.0502  
2010-01-07       45.0876  
2010-01-08       45.5121  
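
The tuple-assignment idea can also be written as an explicit loop over the assets, which may be easier to follow. This is only a sketch on a two-row copy of the question's data. It reads the asset names with `get_level_values(0).unique()` instead of `columns.levels[0]`, because `levels` can still list labels that are no longer used after slicing.

```python
import pandas as pd

# Two-row copy of the frame from the question, enough to show the pattern.
columns = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [
        [12.825, 5879617.0, 15.0292, 10844639.0],
        [13.020, 6928587.0, 14.8092, 16456228.0],
    ],
    index=pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="date"),
    columns=columns,
)

# Asset names actually present in the first column level.
assets = df.columns.get_level_values(0).unique()

for asset in assets:
    # A full tuple key addresses exactly one column of the MultiIndex,
    # so each assignment creates the new column on df itself.
    df[(asset, "PriceVolume")] = df[(asset, "LAST PRICE")] * 3

# Sorting the columns is optional; it groups each asset's columns together.
print(df.sort_index(axis=1))
```

The loop scales to any number of equities, since nothing in it names a specific asset.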

Answer 1 (score: 1)

The most elegant way I can think of is:

df['ACA FP Equity']['LAST PRICE'].apply(lambda x: x * 3)

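The apply call runs a given function on each value of the specified column of the DataFrame; in this case the function is a lambda expression that multiplies each input by 3. Running apply returns a pandas Series, which can then be added to the DataFrame as a column.

Here is a simple example of how to use it on the DataFrame above: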

# Assign through a single tuple key; chained indexing (df[a][b] = ...) may write to a temporary copy.
df[('ACA FP Equity', 'PriceVolume')] = df[('ACA FP Equity', 'LAST PRICE')].apply(lambda x: x * 3)
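
To try the apply approach in isolation, here is a small sketch on a two-row copy of the question's data. The variable names `tripled` and `vectorized` are illustrative only. The comparison with plain multiplication is an addition that the answer does not make; for simple arithmetic the two forms should give the same values.

```python
import pandas as pd

# Two-row copy of the frame from the question.
columns = pd.MultiIndex.from_product(
    [["ACA FP Equity", "UCG IM Equity"], ["LAST PRICE", "VOLUME"]]
)
df = pd.DataFrame(
    [
        [12.825, 5879617.0, 15.0292, 10844639.0],
        [13.020, 6928587.0, 14.8092, 16456228.0],
    ],
    index=pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="date"),
    columns=columns,
)

# apply calls the lambda once per value and returns a new Series.
tripled = df[("ACA FP Equity", "LAST PRICE")].apply(lambda x: x * 3)

# Assign through a single tuple key so the column is created on df itself.
df[("ACA FP Equity", "PriceVolume")] = tripled

# Vectorized equivalent of the same arithmetic, kept for comparison.
vectorized = df[("ACA FP Equity", "LAST PRICE")] * 3
print(df[("ACA FP Equity", "PriceVolume")])
```

This handles one asset only; to cover every equity, the same assignment would have to be repeated per asset.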