Filtering a DataFrame after groupby in pandas

Date: 2015-06-16 01:33:04

Tags: python pandas group-by

I have the following DataFrame:

In [4]:

df
Out[4]:
   Symbol       Date  Strike C/P  Bid  Ask
0      GS  6/15/2015     200   c    5   72
1      GS  6/15/2015     200   p    5   72
2      GS  6/15/2015     210   c   15    0
3      GS  6/15/2015     210   p   15   54
4      GS  7/15/2015     200   c   20   50
5      GS  7/15/2015     200   p   20    0
6      GS  7/15/2015     210   c    4   90
7      GS  7/15/2015     210   p    4   90
8     IBM  6/15/2015     150   c   12   27
9     IBM  6/15/2015     150   p   12    0
10    IBM  6/15/2015     160   c    1   58
11    IBM  6/15/2015     160   p    1    3
12    IBM  7/15/2015     120   c   13   39
13    IBM  7/15/2015     120   p   13   39
14    IBM  7/15/2015     130   c    4   45
15    IBM  7/15/2015     130   p    4   45

I would like to filter out both the c and the p of a given strike if either of them has an Ask value of 0:

Symbol  Date     Strike Call/Put    Bid    Ask  yminx
  GS    6/15/2015   200     c          5    72  90
  GS    6/15/2015   200     p          5    72  90
  GS    7/15/2015   210     c          4    90  90
  GS    7/15/2015   210     p          4    90  90
  IBM   6/15/2015   160     c          1    58  58
  IBM   6/15/2015   160     p          1    3   58
  IBM   7/15/2015   120     c         13    39  58
  IBM   7/15/2015   120     p         13    39  58
  IBM   7/15/2015   130     c          4    45  58
  IBM   7/15/2015   130     p          4    45  58

I can filter on Ask being 0 and drop that row by doing the following:

df = df[df.Ask != 0]

But I can't figure out how to drop the other row that has the same Symbol/Date/Strike combination but a nonzero Ask.
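To make the problem concrete, here is a minimal sketch on a four-row subset of the data above (the names `row_filtered` and `sizes` are only illustrative). It shows that the row-wise filter leaves a group with a single surviving row:

```python
import pandas as pd

# Four rows taken from the GS 6/15/2015 part of the table above.
df = pd.DataFrame({
    "Symbol": ["GS"] * 4,
    "Date": ["6/15/2015"] * 4,
    "Strike": [200, 200, 210, 210],
    "C/P": ["c", "p", "c", "p"],
    "Ask": [72, 72, 0, 54],
})

# The row-wise filter only removes the row whose own Ask is 0 ...
row_filtered = df[df.Ask != 0]

# ... so the 210 strike is left with one orphaned row (the put).
sizes = row_filtered.groupby(["Symbol", "Date", "Strike"]).size()
print(sizes)
```

The 200 strike keeps both rows, while the 210 strike keeps only its put, which is the row that still needs to be removed.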

Any help would be appreciated.

2 Answers:

Answer 0: (score: 2)

>>> mask = df.groupby(['Symbol', 'Date', 'Strike'])['Ask'].transform('all') 
>>> df[~mask]
  Symbol       Date  Strike C/P  Bid  Ask
2     GS  6/15/2015     210   c   15    0
3     GS  6/15/2015     210   p   15   54
4     GS  7/15/2015     200   c   20   50
5     GS  7/15/2015     200   p   20    0
8    IBM  6/15/2015     150   c   12   27
9    IBM  6/15/2015     150   p   12    0

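The mask is True for every row whose Symbol/Date/Strike group contains no zero Ask, so df[~mask] shows the rows to drop. To drop them, use df[mask].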
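For reference, a self-contained sketch of the same idea, rebuilt from the question's data. It is a variation on the answer rather than the answer's exact code: it assumes you prefer to compare `Ask` to 0 explicitly first, so the mask is boolean regardless of the column's dtype. The names `keys`, `nonzero` and `kept` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["GS"] * 8 + ["IBM"] * 8,
    "Date": (["6/15/2015"] * 4 + ["7/15/2015"] * 4) * 2,
    "Strike": [200, 200, 210, 210, 200, 200, 210, 210,
               150, 150, 160, 160, 120, 120, 130, 130],
    "C/P": ["c", "p"] * 8,
    "Bid": [5, 5, 15, 15, 20, 20, 4, 4, 12, 12, 1, 1, 13, 13, 4, 4],
    "Ask": [72, 72, 0, 54, 50, 0, 90, 90, 27, 0, 58, 3, 39, 39, 45, 45],
})

keys = ["Symbol", "Date", "Strike"]

# Boolean Series: True where this row's Ask is nonzero.
nonzero = df["Ask"].ne(0)

# Broadcast "all rows in the group are nonzero" back to every row of the group.
mask = nonzero.groupby([df[k] for k in keys]).transform("all")

kept = df[mask]
print(kept)
```

Because `transform` returns a Series aligned with the original index, `mask` can be used directly for boolean indexing, and the surviving rows keep their original order.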

Answer 1: (score: 1)

To filter out certain rows, we need to use the 'filter' function rather than 'apply'.

by = df.groupby(['Symbol', 'Date', 'Strike'])

# Each function below is a predicate passed to GroupBy.filter(): it receives
# one group (a DataFrame) and returns a single boolean. filter() keeps every
# row of the groups for which the predicate returns True.
def equal_to_45(group):
    # True if either the call or the put has Ask == 45
    return any(group.Ask.values == 45)

def keep_geq_45(group):
    # True if both the call and the put have Ask >= 45, so any group
    # containing an Ask below 45 is dropped as a whole
    return all(group.Ask.values >= 45)

# this time, use filter function instead of apply
by.filter(equal_to_45)

Out[242]: 
   Symbol        Date  Strike C/P  Bid  Ask
14    IBM  2015-07-15     130   c    4   45
15    IBM  2015-07-15     130   p    4   45

by.filter(keep_geq_45)

Out[243]: 
   Symbol        Date  Strike C/P  Bid  Ask
0      GS  2015-06-15     200   c    5   72
1      GS  2015-06-15     200   p    5   72
6      GS  2015-07-15     210   c    4   90
7      GS  2015-07-15     210   p    4   90
14    IBM  2015-07-15     130   c    4   45
15    IBM  2015-07-15     130   p    4   45
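The examples above use a threshold of 45 for illustration. Applied to the condition in the question (drop a Symbol/Date/Strike group if any Ask is 0), the same `filter` approach might look like the sketch below. The data is rebuilt from the question with dates left as strings, and the names `keys` and `kept` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Symbol": ["GS"] * 8 + ["IBM"] * 8,
    "Date": (["6/15/2015"] * 4 + ["7/15/2015"] * 4) * 2,
    "Strike": [200, 200, 210, 210, 200, 200, 210, 210,
               150, 150, 160, 160, 120, 120, 130, 130],
    "C/P": ["c", "p"] * 8,
    "Bid": [5, 5, 15, 15, 20, 20, 4, 4, 12, 12, 1, 1, 13, 13, 4, 4],
    "Ask": [72, 72, 0, 54, 50, 0, 90, 90, 27, 0, 58, 3, 39, 39, 45, 45],
})

keys = ["Symbol", "Date", "Strike"]

# Keep a group only if every Ask in it is nonzero.
kept = df.groupby(keys).filter(lambda g: g["Ask"].ne(0).all())
print(kept)
```

`filter` calls the predicate once per group in Python, so on large frames a `transform`-based boolean mask is usually the faster of the two approaches.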