Question

Suppose I have a DataFrame like this:

ticker        MS                AAPL          
field      price    volume     price    volume
0      -0.861210 -0.319607 -0.855145  0.635594
1      -1.986693 -0.526885 -1.765813  1.696533
2      -0.154544 -1.152361 -1.391477 -2.016119
3       0.621641 -0.109499  0.143788 -0.050672

It was generated by the following code; please ignore the numbers, they are only an example.

import numpy as np
import pandas as pd
columns = pd.MultiIndex.from_tuples([('MS', 'price'), ('MS', 'volume'), ('AAPL', 'price'), ('AAPL', 'volume')], names=['ticker', 'field'])
data = np.random.randn(4, 4)
df = pd.DataFrame(data, columns=columns)

Now I want to compute pct_change(), or any user-defined function, on each price column and store the result in a new column at the 'field' level.

I know how to do this elegantly if the data were a Panel (deprecated as of version 0.20). Assuming the Panel's three axes are date, ticker and field:

p[:, :, 'ret'] = p[:, :, 'price'].pct_change()

That's all it takes. But I haven't found a similarly elegant way to do this with a MultiIndex DataFrame.
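For comparison, the most literal translation of the Panel one-liner is a short loop that assigns one (ticker, 'ret') column per ticker. This is only a sketch: the 'ret' label comes from the Panel example, and a seeded generator replaces np.random.randn so the run is reproducible.

```python
import numpy as np
import pandas as pd

# Same column layout as the question; seeded so the example is reproducible.
columns = pd.MultiIndex.from_tuples(
    [('MS', 'price'), ('MS', 'volume'), ('AAPL', 'price'), ('AAPL', 'volume')],
    names=['ticker', 'field'])
df = pd.DataFrame(np.random.default_rng(0).standard_normal((4, 4)),
                  columns=columns)

# Panel-style p[:, :, 'ret'] = p[:, :, 'price'].pct_change(), one ticker at a time:
# each new (ticker, 'ret') column is computed from that ticker's price column.
for ticker in df.columns.get_level_values('ticker').unique():
    df[(ticker, 'ret')] = df[(ticker, 'price')].pct_change()

# Regroup the columns so each ticker's fields sit next to each other.
df = df.sort_index(axis=1)
print(df)
```

The loop is explicit rather than vectorized, but with only a handful of tickers it stays readable, and pct_change can be swapped for any Series-to-Series function.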

Answer 1

You can use

df.loc[:, pd.IndexSlice[:, 'price']].apply(pd.Series.pct_change).rename(columns={'price': 'ret'})

Output:

ticker        MS      AAPL
field        ret       ret
0            NaN       NaN
1      -1.420166 -0.279805
2       3.011155  0.062529
3      -1.609004  0.759954

https://www.example.com
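The expression above returns only the new 'ret' columns. One way to attach them to the original frame, sketched here under the assumption that a plain pd.concat plus a column sort is acceptable, is:

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [('MS', 'price'), ('MS', 'volume'), ('AAPL', 'price'), ('AAPL', 'volume')],
    names=['ticker', 'field'])
df = pd.DataFrame(np.random.default_rng(0).standard_normal((4, 4)),
                  columns=columns)

idx = pd.IndexSlice
# Answer 1's expression: every ticker's price column, pct_change'd,
# with the 'field' label renamed from 'price' to 'ret' (only on that level).
ret = (df.loc[:, idx[:, 'price']]
         .apply(pd.Series.pct_change)
         .rename(columns={'price': 'ret'}, level='field'))

# Attach the new columns to the original frame and regroup them by ticker.
out = pd.concat([df, ret], axis=1).sort_index(axis=1)
print(out)
```

Passing level='field' to rename limits the relabeling to that level, so a ticker that happened to be called 'price' would not be touched.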

Answer 2

def cstm(s):
    # any user-defined Series -> Series function
    return s.pct_change()

new = pd.concat(
    [df.xs('price', axis=1, level=1).apply(cstm)],
    axis=1, keys=['new']
).swaplevel(0, 1, axis=1)

df.join(new).sort_index(axis=1)

ticker      AAPL                            MS                    
field        new     price    volume       new     price    volume
0            NaN -0.855145  0.635594       NaN -0.861210 -0.319607
1       1.064928 -1.765813  1.696533  1.306863 -1.986693 -0.526885
2      -0.211991 -1.391477 -2.016119 -0.922211 -0.154544 -1.152361
3      -1.103335  0.143788 -0.050672 -5.022430  0.621641 -0.109499

Or

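def cstm(s):
    return s.pct_change()

# transform keeps the (row, ticker) index, so the result aligns with d
df.stack(level=0).assign(
    new=lambda d: d.groupby(level='ticker')['price'].transform(cstm)
).unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)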
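The xs-based idea from the first snippet can be wrapped into a reusable function. The sketch below uses a hypothetical helper, add_field (not a pandas API), that applies any Series-to-Series function to every (ticker, src) column and stores the result under (ticker, dst). It swaps join for pd.concat and builds the new column labels explicitly so the 'ticker' and 'field' level names carry over.

```python
import numpy as np
import pandas as pd


def add_field(df, src, dst, func):
    """Hypothetical helper: apply func to every (ticker, src) column and
    store each result as a new (ticker, dst) column."""
    # Cross-section on the 'field' level: the columns become just the tickers.
    block = df.xs(src, axis=1, level='field').apply(func)
    # Rebuild (ticker, dst) labels in the same order as block's columns.
    block.columns = pd.MultiIndex.from_tuples(
        [(ticker, dst) for ticker in block.columns], names=df.columns.names)
    # Place the new columns next to the originals, grouped by ticker.
    return pd.concat([df, block], axis=1).sort_index(axis=1)


columns = pd.MultiIndex.from_tuples(
    [('MS', 'price'), ('MS', 'volume'), ('AAPL', 'price'), ('AAPL', 'volume')],
    names=['ticker', 'field'])
df = pd.DataFrame(np.random.default_rng(0).standard_normal((4, 4)),
                  columns=columns)

# Any Series -> Series function works, e.g. pct_change or a 2-row difference.
out = add_field(df, 'price', 'ret', lambda s: s.pct_change())
out = add_field(out, 'volume', 'dvol', lambda s: s.diff(2))
print(out)
```

The field names 'ret' and 'dvol' are illustrative; the helper only assumes that the column levels are named 'ticker' and 'field' as in the question.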

How can I apply a function to a MultiIndex pandas DataFrame as elegantly as on a Panel?
