I have a DataFrame like this:
> df.head()
channel sym quant value when
0 online FTR items 0.000515 before
1 video FTR items 0.000329 after
2 online PAC items 1.839338 before
3 video PAC items 2.355360 after
4 online EPM items 0.000947 before
5 test EPM items 0.000774 after
6 online CLC deals 0.000681 before
7 test CLC deals 0.000808 after
8 video CLC deals 0.000808 after
9 online CPC deals 1.620517 before
I would like to know the difference in value between before and after for each unique combination of channel, sym and quant. How can I do this in pandas?
I tried:
def my_func(x):
    after_value = x.ix[x['when']=='after','value']
    before_value = x.ix[x['when']=='before','value']
    return after_value - before_value

df.groupby(['channel', 'sym', 'quant']).apply(my_func)
But I get:
channel sym quant
online CLC deals 6 NaN
8 NaN
CPC deals 10 NaN
EPM items 4 NaN
FTR items 0 NaN
PAC items 2 NaN
test CLC deals 7 NaN
EPM items 5 NaN
video CLC deals 9 NaN
FTR items 1 NaN
PAC items 3 NaN
This does not give me the result I want.
Answer 0 (score: 1)
Did you mean to group by ['sym', 'quant']? If so, you can flip the sign of value in the rows where when equals before:
In [199]: df['value'] *= np.where(df['when'] == 'before', -1, 1)
In [200]: df
Out[200]:
channel sym quant value when
0 online FTR items -0.000515 before
1 video FTR items 0.000329 after
2 online PAC items -1.839338 before
3 video PAC items 2.355360 after
4 online EPM items -0.000947 before
5 test EPM items 0.000774 after
6 online CLC deals -0.000681 before
7 test CLC deals 0.000808 after
8 video CLC deals 0.000808 after
9 online CPC deals -1.620517 before
Then you can find the differences by summing:
In [202]: df.groupby(['sym', 'quant'])['value'].agg('sum')
Out[202]:
sym quant
CLC deals 0.000935
CPC deals -1.620517
EPM items -0.000173
FTR items -0.000186
PAC items 0.516022
Name: value, dtype: float64
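For anyone who wants to reproduce this outside an IPython session, here is a minimal self-contained sketch of the same sign-flip-and-sum idea. The data is copied from the question. The column name signed_value is a hypothetical name introduced here so that the original value column is not overwritten; everything else follows the steps above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "channel": ["online", "video", "online", "video", "online",
                "test", "online", "test", "video", "online"],
    "sym": ["FTR", "FTR", "PAC", "PAC", "EPM",
            "EPM", "CLC", "CLC", "CLC", "CPC"],
    "quant": ["items"] * 6 + ["deals"] * 4,
    "value": [0.000515, 0.000329, 1.839338, 2.355360, 0.000947,
              0.000774, 0.000681, 0.000808, 0.000808, 1.620517],
    "when": ["before", "after", "before", "after", "before",
             "after", "before", "after", "after", "before"],
})

# -1 for "before" rows, +1 for every other row.
sign = np.where(df["when"] == "before", -1, 1)

# Store the signed values in a new column (hypothetical name) so the
# original "value" column keeps its positive values.
signed = df.assign(signed_value=df["value"] * sign)

# Summing within each (sym, quant) group adds the "after" values and
# subtracts the "before" values.
diff = signed.groupby(["sym", "quant"])["signed_value"].sum()
print(diff)
```

Working on a copy is a design choice rather than a requirement: the in-place `*=` shown earlier gives the same sums, but running it twice would flip the signs back.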
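Note that this relies on each group (rows sharing the same sym and quant) containing exactly one before row and one after row; otherwise the sum may not be what you want. For example, if a group has an after row but no before row, the sum equals the after value, as if the before value were 0. In the output above, CPC has only a before row, so its result is just the negated before value, and CLC has two after rows, so both are added. The same thing happens if you group by channel, sym and quant, because each group then contains only one row: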
In [201]: df.groupby(['channel', 'sym', 'quant'])['value'].agg('sum')
Out[201]:
channel sym quant
online CLC deals -0.000681
CPC deals -1.620517
EPM items -0.000947
FTR items -0.000515
PAC items -1.839338
test CLC deals 0.000808
EPM items 0.000774
video CLC deals 0.000808
FTR items 0.000329
PAC items 2.355360
Name: value, dtype: float64
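If you would rather have incomplete groups show up explicitly than be summed silently, one possible alternative (not the approach used above) is to reshape the data so that before and after sit side by side. The sketch below assumes the unmodified data from the question and uses pivot_table; the choice of aggfunc="sum" for duplicate rows is an assumption, and you may prefer a different rule.

```python
import pandas as pd

# Sample data copied from the question (values not yet sign-flipped).
df = pd.DataFrame({
    "channel": ["online", "video", "online", "video", "online",
                "test", "online", "test", "video", "online"],
    "sym": ["FTR", "FTR", "PAC", "PAC", "EPM",
            "EPM", "CLC", "CLC", "CLC", "CPC"],
    "quant": ["items"] * 6 + ["deals"] * 4,
    "value": [0.000515, 0.000329, 1.839338, 2.355360, 0.000947,
              0.000774, 0.000681, 0.000808, 0.000808, 1.620517],
    "when": ["before", "after", "before", "after", "before",
             "after", "before", "after", "after", "before"],
})

# One row per (sym, quant) and one column per "when" value.
# aggfunc="sum" is an assumption: it decides how duplicate rows
# (CLC has two "after" rows) are combined.
wide = df.pivot_table(index=["sym", "quant"], columns="when",
                      values="value", aggfunc="sum")

# A group with a missing side has NaN in that column, so its
# difference is NaN instead of a misleading number.
wide["diff"] = wide["after"] - wide["before"]

# Row counts per side make missing or duplicated entries easy to spot.
counts = df.groupby(["sym", "quant", "when"]).size().unstack(fill_value=0)

print(wide)
print(counts)
```

With this layout, a group such as CPC that has no after row gets NaN as its difference, and the counts table shows which groups have zero or more than one row on either side.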