My DataFrame res has the following structure:
Field                      A        B
Security date
EFA      2001-08-17      NaN  29.4944
         2001-08-20   0.1983  29.5529
         2001-08-21  -0.2374  29.4827
         2001-08-22   1.2297  29.8453
         2001-08-23  -0.4702  29.7049
         2001-08-24   1.3622  30.1096
         2001-08-27  -0.1787  30.0558
         2001-08-28  -1.1440  29.7119
         2001-08-29  -0.4566  29.5763
         2001-08-30  -1.4235  29.1553
         2001-08-31   0.2407  29.2254
         2001-09-04  -2.2809  28.5588
         2001-09-05  -0.6143  28.3834
         2001-09-06  -2.2662  27.7402
         2001-09-07  -0.5902  27.5765
         2001-09-10  -1.1450  27.2607
         2001-09-17  -4.3758  26.0678
         2001-09-18  -0.8075  25.8573
         2001-09-19  -0.2714  25.7872
         2001-09-20  -4.3537  24.6644
         2001-09-21  -2.7975  23.9745
         2001-09-24   4.6341  25.0855
         2001-09-25   1.1655  25.3778
         2001-09-26   0.5069  25.5065
         2001-09-27   1.5773  25.9088
         2001-09-28   1.9500  26.4140
         2001-10-01  -0.5402  26.2713
         2001-10-02   0.3530  26.3641
         2001-10-03   1.0218  26.6334
         2001-10-04   1.0642  26.9169
and the following index:
MultiIndex(levels=[[u'EFA', u'IVV', u'SPY'], [2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00, 2001-01-11 00:00:00, 2001-01-12 00:00:00, 2001-01-16 00:00:00, 2001-01-17 00:00:00, 2001-01-18 00:00:00, 2001-01-19 00:00:00, 2001-01-22 00:00:00, 2001-01-23 00:00:00, 2001-01-24 00:00:00, 2001-01-25 00:00:00, 2001-01-26 00:00:00, 2001-01-29 00:00:00, 2001-01-30 00:00:00, 2001-01-31 00:00:00, 2001-02-01 00:00:00, 2001-02-02 00:00:00, 2001-02-05 00:00:00, 2001-02-06 00:00:00, 2001-02-07 00:00:00, 2001-02-08 00:00:00, 2001-02-09 00:00:00, 2001-02-12 00:00:00, 2001-02-13 00:00:00, 2001-02-14 00:00:00, 2001-02-15 00:00:00, 2001-02-16 00:00:00, 2001-02-20 00:00:00, 2001-02-21 00:00:00, 2001-02-22 00:00:00,...]], names=[u'Security', u'date'])
I want to keep only the securities where the mean of A is < 0, so I tried the following:
f = res.unstack(level=0)['A'].mean()<0
which gives me:
Security
EFA False
IVV False
SPY False
dtype: bool
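Great!
Now, when I try to go back and filter res with this, I keep getting errors no matter what I try.
It seems like slice might be the right route, but I'm not sure how to apply it properly.
Any input here would be very helpful!
Unfortunately, I'm somewhat tied to this response object.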
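Answer 0 (score: 0)
I'm not sure exactly what you're looking for, but here are two options. The first is probably the simplest, IMO (using groupby / transform), but the second is probably closer to what (I think) you're asking for.
Method 1: Create a variable holding the per-security mean of A, using transform so that the result lines up with your DataFrame's index: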
>>> res['mean_A'] = res.groupby(level=0)['A'].transform('mean')
>>> res
A B mean_A
security date
efa 2001-08-20 0.1983 29.5529 -0.07536
2001-08-21 -0.2374 29.4827 -0.07536
2001-08-22 -1.2297 29.8453 -0.07536
2001-08-23 -0.4702 29.7049 -0.07536
2001-08-24 1.3622 30.1096 -0.07536
ivv 2001-08-20 0.1983 29.5529 0.41652
2001-08-21 -0.2374 29.4827 0.41652
2001-08-22 1.2297 29.8453 0.41652
2001-08-23 -0.4702 29.7049 0.41652
2001-08-24 1.3622 30.1096 0.41652
After that, standard boolean indexing is easy:
>>> res[ res['mean_A'] < 0 ]
A B mean_A
security date
efa 2001-08-20 0.1983 29.5529 -0.07536
2001-08-21 -0.2374 29.4827 -0.07536
2001-08-22 -1.2297 29.8453 -0.07536
2001-08-23 -0.4702 29.7049 -0.07536
2001-08-24 1.3622 30.1096 -0.07536
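If the extra mean_A column is not wanted, the transformed means can also be used as a mask directly. Here is a minimal sketch, assuming a small hypothetical frame rebuilt from the five sample rows above; the names mean_a and negative are made up for illustration:

```python
import pandas as pd

# Hypothetical miniature of `res`, rebuilt from the sample rows shown above.
dates = pd.to_datetime(
    ["2001-08-20", "2001-08-21", "2001-08-22", "2001-08-23", "2001-08-24"]
)
index = pd.MultiIndex.from_product(
    [["efa", "ivv"], dates], names=["security", "date"]
)
res = pd.DataFrame(
    {
        "A": [0.1983, -0.2374, -1.2297, -0.4702, 1.3622,
              0.1983, -0.2374, 1.2297, -0.4702, 1.3622],
        "B": [29.5529, 29.4827, 29.8453, 29.7049, 30.1096] * 2,
    },
    index=index,
)

# Per-security mean of A, repeated on every row of that security.
mean_a = res.groupby(level="security")["A"].transform("mean")

# Boolean indexing with the aligned Series; `res` itself is left unchanged.
negative = res[mean_a < 0]
print(negative)
```

Because transform returns a Series with the same index as res, the comparison produces a row-by-row mask without any reshaping.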
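Method 2: Alternatively, if you are starting from 'f' and need to do it that way, you can approach it like this (note that I'm using groupby rather than unstack, because that feels more natural to me, but it doesn't matter):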
>>> f = (res.groupby(level=0)['A'].mean() < 0)
>>> res[ res.reset_index()['security'].map(f).values ]
A B
security date
efa 2001-08-20 0.1983 29.5529
2001-08-21 -0.2374 29.4827
2001-08-22 -1.2297 29.8453
2001-08-23 -0.4702 29.7049
2001-08-24 1.3622 30.1096
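For what it's worth, f can also be applied through the index itself, without reset_index. The sketch below is an assumption-laden illustration: it uses a hypothetical three-row-per-security frame and the capitalised level name Security from the question (my examples above use lowercase security), and the names wanted, filtered and via_filter are invented here:

```python
import pandas as pd

# Hypothetical miniature of `res`, using the level names from the question.
dates = pd.to_datetime(["2001-08-20", "2001-08-21", "2001-08-22"])
index = pd.MultiIndex.from_product(
    [["EFA", "IVV"], dates], names=["Security", "date"]
)
res = pd.DataFrame(
    {
        "A": [0.1983, -0.2374, -1.2297, 0.1983, -0.2374, 1.2297],
        "B": [29.5529, 29.4827, 29.8453] * 2,
    },
    index=index,
)

# The boolean Series from the question: one True/False per security.
f = res.unstack(level=0)["A"].mean() < 0

# Labels of the securities whose mean of A is negative.
wanted = f[f].index

# Keep the rows whose first index level is one of those labels.
filtered = res[res.index.get_level_values("Security").isin(wanted)]

# Same result in one step, without building f first.
via_filter = res.groupby(level="Security").filter(lambda g: g["A"].mean() < 0)
print(filtered)
```

The isin version reuses the existing f, which may matter if you are tied to it; the groupby/filter version is shorter when you are free to start from res.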