在多索引熊猫数据框的表达式中使用其他变量创建新变量

时间:2019-05-11 08:45:08

标签: python-3.x pandas multi-index assign

我有以下多索引熊猫数据框:

toy.to_json()
'{"["ISRG","Price"]":{"2004-12-31":10.35,"2005-01-28":10.35,"2005-03-31":14.15,"2005-04-01":14.15,"2005-04-29":14.15,"2005-06-30":15.51,"2005-07-01":15.51,"2005-07-29":15.51,"2005-09-30":20.77,"2005-10-28":20.77},"["ISRG","Price_high"]":{"2004-12-31":13.34,"2005-01-28":13.34,"2005-03-31":16.27,"2005-04-01":16.27,"2005-04-29":16.27,"2005-06-30":17.35,"2005-07-01":17.35,"2005-07-29":17.35,"2005-09-30":25.96,"2005-10-28":25.96},"["ISRG","Price_low"]":{"2004-12-31":7.36,"2005-01-28":7.36,"2005-03-31":12.03,"2005-04-01":12.03,"2005-04-29":12.03,"2005-06-30":13.67,"2005-07-01":13.67,"2005-07-29":13.67,"2005-09-30":15.58,"2005-10-28":15.58},"["EW","Price"]":{"2004-12-31":9.36,"2005-01-28":9.36,"2005-03-31":10.47,"2005-04-01":10.47,"2005-04-29":10.47,"2005-06-30":11.07,"2005-07-01":11.07,"2005-07-29":11.07,"2005-09-30":10.86,"2005-10-28":10.86},"["EW","Price_high"]":{"2004-12-31":10.56,"2005-01-28":10.56,"2005-03-31":11.07,"2005-04-01":11.07,"2005-04-29":11.07,"2005-06-30":11.69,"2005-07-01":11.69,"2005-07-29":11.69,"2005-09-30":11.56,"2005-10-28":11.56},"["EW","Price_low"]":{"2004-12-31":8.15,"2005-01-28":8.15,"2005-03-31":9.87,"2005-04-01":9.87,"2005-04-29":9.87,"2005-06-30":10.46,"2005-07-01":10.46,"2005-07-29":10.46,"2005-09-30":10.16,"2005-10-28":10.16},"["volatility",""]":{"2004-12-31":null,"2005-01-28":null,"2005-03-31":null,"2005-04-01":null,"2005-04-29":null,"2005-06-30":null,"2005-07-01":null,"2005-07-29":null,"2005-09-30":null,"2005-10-28":null}}'

enter image description here

我想用一行代码在第二级(即在“ ISGR”和“ EW”下)创建一个称为“波动率”的新列,该列将由以下表达式定义:

(100 * (Price_high - Price_low)/Price).round()

我有两个问题: a)我无法创建新列 b)我无法分配

这是我用来创建列的代码,但失败了:

idx = pd.IndexSlice

100 *( toy.loc[:, idx[:, 'Price_high']]  - toy.loc[:, idx[:, 'Price_low']].div(toy.loc[:, idx[:, 'Price']])).round()

此代码行返回NaN:

enter image description here

1 个答案:

答案 0 :(得分:0)

对于输出MultiIndex DataFrame,必须与所选DataFrame中的MultiIndex相同,因此请使用rename

idx = pd.IndexSlice

Price_high =  toy.loc[:, idx[:, 'Price_high']].rename(columns={'Price_high':'new'})
Price_low = toy.loc[:, idx[:, 'Price_low']].rename(columns={'Price_low':'new'})
Price = toy.loc[:, idx[:, 'Price']].rename(columns={'Price':'new'})
df4 = (100 * (Price_high - Price_low)/Price).round()
print (df4)
            ISRG    EW
             new   new
2004-12-31  58.0  26.0
2005-01-28  58.0  26.0
2005-03-31  30.0  11.0
2005-04-01  30.0  11.0
2005-04-29  30.0  11.0
2005-06-30  24.0  11.0
2005-07-01  24.0  11.0
2005-07-29  24.0  11.0
2005-09-30  50.0  13.0
2005-10-28  50.0  13.0

另一种方法是使用DataFrame.xs来避免进入第二级,因此不使用MultiIndex DataFrames

Price_high =  toy.xs('Price_high', axis=1, level=1)
Price_low = toy.xs('Price_low', axis=1, level=1)
Price = toy.xs('Price', axis=1, level=1)
df4 = (100 * (Price_high - Price_low)/Price).round()
print (df4)
            ISRG    EW
2004-12-31  58.0  26.0
2005-01-28  58.0  26.0
2005-03-31  30.0  11.0
2005-04-01  30.0  11.0
2005-04-29  30.0  11.0
2005-06-30  24.0  11.0
2005-07-01  24.0  11.0
2005-07-29  24.0  11.0
2005-09-30  50.0  13.0
2005-10-28  50.0  13.0

然后根据需要MultiIndex添加MultiIndex.from_product

df4.columns = pd.MultiIndex.from_product([df4.columns, ['new']])
print (df4)
            ISRG    EW
             new   new
2004-12-31  58.0  26.0
2005-01-28  58.0  26.0
2005-03-31  30.0  11.0
2005-04-01  30.0  11.0
2005-04-29  30.0  11.0
2005-06-30  24.0  11.0
2005-07-01  24.0  11.0
2005-07-29  24.0  11.0
2005-09-30  50.0  13.0
2005-10-28  50.0  13.0