Adding a conditional column to a multi-index DataFrame in pandas

Time: 2017-05-12 14:28:50

Tags: pandas dataframe conditional multi-index

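I have the following DataFrame with multi-index columns, and I am struggling to add a conditional column to it. My current code raises this error: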

ValueError: Wrong number of items passed 4, placement implies 1 

The DataFrame looks like this:

       ed12 comdty              xau curncy             
           PX_LAST MOV_AVG_200D    PX_LAST MOV_AVG_200D
date                                                       
1997-10-06       93.75      93.2863     332.55       339.45
1997-10-07       93.78      93.2881     331.45       339.27
1997-10-08       93.65      93.2892     333.25       339.09
1997-10-09       93.64      93.2904     327.75       338.90
1997-10-10       93.59      93.2913     329.65       338.74

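I am trying to add a third column named "BREADTH" under each top-level column (ed12 comdty and xau curncy). Its value should depend on whether that ticker's PX_LAST is >= its MOV_AVG_200D.

This is the code I am using: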

for ticker in data.columns.levels[0]:
    data[(ticker,'BREADTH')] = data.where(data[(ticker,'PX_LAST')]>=data[(ticker,'MOV_AVG_200D')],1,0)

Thanks!

2 answers:

Answer 0 (score: 0)

The simplest approach is to convert the boolean mask to int with astype:

import pandas as pd

for ticker in data.columns.levels[0]:
    mask = data[(ticker,'PX_LAST')]>=data[(ticker,'MOV_AVG_200D')]
    data[(ticker,'BREADTH')] = mask.astype(int)

data = data.sort_index(axis=1,ascending=[True, False])
print (data)
           ed12 comdty                      xau curncy                     
               PX_LAST MOV_AVG_200D BREADTH    PX_LAST MOV_AVG_200D BREADTH
date                                                                       
1997-10-06       93.75      93.2863       1     332.55       339.45       0
1997-10-07       93.78      93.2881       1     331.45       339.27       0
1997-10-08       93.65      93.2892       1     333.25       339.09       0
1997-10-09       93.64      93.2904       1     327.75       338.90       0
1997-10-10       93.59      93.2913       1     329.65       338.74       0
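
As a side note, the error in the question most likely comes from DataFrame.where keeping the shape of the frame it is called on: data.where(...) returns all four columns, which cannot be assigned to a single new column. If the intent was a "1 if true, else 0" selection, numpy.where is probably what was meant. The following is a minimal sketch under that assumption; the sample frame is rebuilt by hand from the values in the question, and the names above and tickers are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
columns = pd.MultiIndex.from_product(
    [["ed12 comdty", "xau curncy"], ["PX_LAST", "MOV_AVG_200D"]]
)
data = pd.DataFrame(
    [
        [93.75, 93.2863, 332.55, 339.45],
        [93.78, 93.2881, 331.45, 339.27],
        [93.65, 93.2892, 333.25, 339.09],
        [93.64, 93.2904, 327.75, 338.90],
        [93.59, 93.2913, 329.65, 338.74],
    ],
    index=pd.DatetimeIndex(
        ["1997-10-06", "1997-10-07", "1997-10-08", "1997-10-09", "1997-10-10"],
        name="date",
    ),
    columns=columns,
)

# Snapshot the tickers first, since the loop adds columns to `data`.
tickers = list(data.columns.get_level_values(0).unique())

for ticker in tickers:
    above = data[(ticker, "PX_LAST")] >= data[(ticker, "MOV_AVG_200D")]
    # np.where returns a 1-D array with one value per row,
    # so it fits into a single new column.
    data[(ticker, "BREADTH")] = np.where(above, 1, 0)
```

For a plain 0/1 flag this gives the same values as mask.astype(int); np.where becomes more useful when the two outcomes are something other than 0 and 1.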

Alternatively, reshape with stack, add the column from the boolean mask converted with astype, then reshape back with unstack + swaplevel + sort_index:

data = data.stack(level=0)
data['BREADTH'] = (data['PX_LAST'] >= data['MOV_AVG_200D']).astype(int)
data = data.unstack().swaplevel(0,1,axis=1).sort_index(axis=1, ascending=[True, False])
print (data)
           ed12 comdty                      xau curncy                     
               PX_LAST MOV_AVG_200D BREADTH    PX_LAST MOV_AVG_200D BREADTH
date                                                                       
1997-10-06       93.75      93.2863       1     332.55       339.45       0
1997-10-07       93.78      93.2881       1     331.45       339.27       0
1997-10-08       93.65      93.2892       1     333.25       339.09       0
1997-10-09       93.64      93.2904       1     327.75       338.90       0
1997-10-10       93.59      93.2913       1     329.65       338.74       0
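
A further variation, sketched here only as an illustration, compares whole cross-sections of the column index without a loop or a row reshape. It assumes every ticker has both a PX_LAST and a MOV_AVG_200D column; the sample frame is rebuilt by hand from the question, and px_last, mov_avg and breadth are illustrative names.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
columns = pd.MultiIndex.from_product(
    [["ed12 comdty", "xau curncy"], ["PX_LAST", "MOV_AVG_200D"]]
)
data = pd.DataFrame(
    [
        [93.75, 93.2863, 332.55, 339.45],
        [93.78, 93.2881, 331.45, 339.27],
        [93.65, 93.2892, 333.25, 339.09],
        [93.64, 93.2904, 327.75, 338.90],
        [93.59, 93.2913, 329.65, 338.74],
    ],
    index=pd.DatetimeIndex(
        ["1997-10-06", "1997-10-07", "1997-10-08", "1997-10-09", "1997-10-10"],
        name="date",
    ),
    columns=columns,
)

# Select one field across all tickers; each result has one column per ticker.
px_last = data.xs("PX_LAST", axis=1, level=1)
mov_avg = data.xs("MOV_AVG_200D", axis=1, level=1)

# Both frames carry the same ticker columns in the same order,
# so they can be compared element-wise in one step.
breadth = (px_last >= mov_avg).astype(int)

# Add a second column level so the result matches the layout of `data`.
breadth.columns = pd.MultiIndex.from_product([breadth.columns, ["BREADTH"]])

# Append the new columns and restore the per-ticker column order.
data = pd.concat([data, breadth], axis=1).sort_index(
    axis=1, ascending=[True, False]
)
```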

Answer 1 (score: 0)

If you only have these 4 columns, you can simply insert the 2 BREADTH columns like this:

# Use >= to match the condition stated in the question.
df.insert(2,('ed12 comdty','BREADTH'),(df.iloc[:,0] >= df.iloc[:,1]).astype(int))
df.insert(len(df.columns),('xau curncy','BREADTH'),(df.iloc[:,-2] >= df.iloc[:,-1]).astype(int))

df
Out[1495]: 
           ed12 comdty                      xau curncy                     
               PX_LAST MOV_AVG_200D BREADTH    PX_LAST MOV_AVG_200D BREADTH
Date                                                                       
1997-10-06       93.75      93.2863       1     332.55       339.45       0
1997-10-07       93.78      93.2881       1     331.45       339.27       0
1997-10-08       93.65      93.2892       1     333.25       339.09       0
1997-10-09       93.64      93.2904       1     327.75       338.90       0
1997-10-10       93.59      93.2913       1     329.65       338.74       0
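
If there are more than two tickers, the hard-coded positions above no longer work. One possible generalization, shown here as a sketch rather than a tested recipe for every layout, looks up each insert position by label with Index.get_loc. It assumes each (ticker, 'MOV_AVG_200D') column label is unique; the sample frame is rebuilt by hand from the question, and the names tickers, pos and flag are illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
columns = pd.MultiIndex.from_product(
    [["ed12 comdty", "xau curncy"], ["PX_LAST", "MOV_AVG_200D"]]
)
df = pd.DataFrame(
    [
        [93.75, 93.2863, 332.55, 339.45],
        [93.78, 93.2881, 331.45, 339.27],
        [93.65, 93.2892, 333.25, 339.09],
        [93.64, 93.2904, 327.75, 338.90],
        [93.59, 93.2913, 329.65, 338.74],
    ],
    index=pd.DatetimeIndex(
        ["1997-10-06", "1997-10-07", "1997-10-08", "1997-10-09", "1997-10-10"],
        name="Date",
    ),
    columns=columns,
)

# Snapshot the tickers first, since insert() changes df.columns in the loop.
tickers = list(df.columns.get_level_values(0).unique())

for ticker in tickers:
    # Position directly after this ticker's MOV_AVG_200D column.
    pos = df.columns.get_loc((ticker, "MOV_AVG_200D")) + 1
    flag = (df[(ticker, "PX_LAST")] >= df[(ticker, "MOV_AVG_200D")]).astype(int)
    df.insert(pos, (ticker, "BREADTH"), flag)
```

Because insert places each column where it belongs, no sort_index call is needed afterwards.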