Group calculation between rows based on a condition

Time: 2014-11-25 01:26:16

Tags: python pandas

I have a DataFrame like the following:

> df.head()

    channel   sym   quant      value    when
0   online    FTR   items   0.000515  before
1   video     FTR   items   0.000329   after
2   online    PAC   items   1.839338  before
3   video     PAC   items   2.355360   after
4   online    EPM   items   0.000947  before
5   test      EPM   items   0.000774   after
6   online    CLC   deals   0.000681  before
7   test      CLC   deals   0.000808   after
8   video     CLC   deals   0.000808   after
9   online    CPC   deals   1.620517  before

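I would like to know the difference in value between the before and after channel for each unique combination of sym and quant. How can I do this in pandas?

I tried: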

def my_func(x):
  # .loc replaces the removed .ix indexer; the selection logic is unchanged
  after_value  = x.loc[x['when']=='after','value']
  before_value = x.loc[x['when']=='before','value']
  return  after_value - before_value

df.groupby(['channel', 'sym', 'quant']).apply(my_func) 

But I got:

channel  sym  quant    
online   CLC  deals  6    NaN
                     8    NaN
         CPC  deals  10   NaN
         EPM  items  4    NaN
         FTR  items  0    NaN
         PAC  items  2    NaN
test     CLC  deals  7    NaN
         EPM  items  5    NaN
video    CLC  deals  9    NaN
         FTR  items  1    NaN
         PAC  items  3    NaN

This does not give me the result I want.

1 answer:

Answer 0: (score: 1)

Did you mean to group by ['sym', 'quant']? If so, you could flip the sign of value wherever when equals 'before':

In [199]: df['value'] *= np.where(df['when'] == 'before', -1, 1)

In [200]: df
Out[200]: 
  channel  sym  quant     value    when
0  online  FTR  items -0.000515  before
1   video  FTR  items  0.000329   after
2  online  PAC  items -1.839338  before
3   video  PAC  items  2.355360   after
4  online  EPM  items -0.000947  before
5    test  EPM  items  0.000774   after
6  online  CLC  deals -0.000681  before
7    test  CLC  deals  0.000808   after
8   video  CLC  deals  0.000808   after
9  online  CPC  deals -1.620517  before

Then you can find the differences by summing:

In [202]: df.groupby(['sym', 'quant'])['value'].agg('sum')
Out[202]: 
sym  quant
CLC  deals    0.000935
CPC  deals   -1.620517
EPM  items   -0.000173
FTR  items   -0.000186
PAC  items    0.516022
Name: value, dtype: float64
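
For readers who want to run this outside an IPython session, here is a minimal self-contained sketch of the same sign-flip idea. It rebuilds the sample rows from the question by hand and, as one deliberate change from the session above, keeps the signed values in a separate Series (the names signed and diff are illustrative) so that df['value'] is not overwritten.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "channel": ["online", "video", "online", "video", "online",
                "test", "online", "test", "video", "online"],
    "sym": ["FTR", "FTR", "PAC", "PAC", "EPM",
            "EPM", "CLC", "CLC", "CLC", "CPC"],
    "quant": ["items"] * 6 + ["deals"] * 4,
    "value": [0.000515, 0.000329, 1.839338, 2.355360, 0.000947,
              0.000774, 0.000681, 0.000808, 0.000808, 1.620517],
    "when": ["before", "after", "before", "after", "before",
             "after", "before", "after", "after", "before"],
})

# Negate the 'before' values in a new Series; df itself stays untouched.
signed = df["value"] * np.where(df["when"] == "before", -1, 1)

# Summing the signed values per (sym, quant) gives after minus before.
diff = signed.groupby([df["sym"], df["quant"]]).sum()
print(diff)
```

Leaving the original column intact means the cell can be re-run without flipping the signs a second time.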

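Note that this relies on each group (rows sharing the same sym and quant) containing exactly one before row and one after row; otherwise the sum may not be what you want. In the sample data, CLC/deals has two after rows, so its sum of 0.000935 is both after values minus the single before value. Likewise, if a group has an after row but no before row, the sum equals the after value, as if the before value were 0. In fact, this is what happens if you group by channel, sym and quant, because each group then contains only one row: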

In [201]: df.groupby(['channel', 'sym', 'quant'])['value'].agg('sum')
Out[201]: 
channel  sym  quant
online   CLC  deals   -0.000681
         CPC  deals   -1.620517
         EPM  items   -0.000947
         FTR  items   -0.000515
         PAC  items   -1.839338
test     CLC  deals    0.000808
         EPM  items    0.000774
video    CLC  deals    0.000808
         FTR  items    0.000329
         PAC  items    2.355360
Name: value, dtype: float64
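
If silently treating a missing before or after row as 0 is a concern, one possible alternative (not part of the answer above, and only a sketch) is to pivot when into columns so that a missing side shows up as NaN. The names wide and counts are illustrative, and the sample rows are again rebuilt by hand from the question.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "channel": ["online", "video", "online", "video", "online",
                "test", "online", "test", "video", "online"],
    "sym": ["FTR", "FTR", "PAC", "PAC", "EPM",
            "EPM", "CLC", "CLC", "CLC", "CPC"],
    "quant": ["items"] * 6 + ["deals"] * 4,
    "value": [0.000515, 0.000329, 1.839338, 2.355360, 0.000947,
              0.000774, 0.000681, 0.000808, 0.000808, 1.620517],
    "when": ["before", "after", "before", "after", "before",
             "after", "before", "after", "after", "before"],
})

# One row per (sym, quant) and one column per 'when' value. Values are
# summed within each cell; a combination with no rows stays NaN.
wide = df.pivot_table(index=["sym", "quant"], columns="when",
                      values="value", aggfunc="sum")
wide["diff"] = wide["after"] - wide["before"]

# Row counts per cell, to spot groups with zero or several before/after rows.
counts = (df.groupby(["sym", "quant", "when"]).size()
            .unstack("when", fill_value=0))

print(wide)
print(counts)
```

With this layout, CPC/deals (which has no after row) gets a NaN difference instead of -1.620517, and the counts table shows that CLC/deals has two after rows.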