Conditional cumulative sum based on a column comparison in a pandas DataFrame

Time: 2014-07-08 21:54:24

Tags: python pandas finance

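I'm fairly new to pandas, and I'm sure there is a simple solution, but I can't work it out myself. I have a DataFrame of trades that looks like this: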

OrderId Size    Price           Side    TimeSecO    TimeUSecO   TimeSecOT   TimeUSecOT AmountBuy AmountSell
10      100     41.44000000     BUY     1403200077  47720       1403200100  640070
11      100     41.43000000     BUY     1403200077  47979       1403200112  43383
12      100     41.45000000     SELL    1403200077  48311       1403200090  61100
14      100     41.45000000     BUY     1403200092  253793      1403200092  374767
17      100     41.44000000     SELL    1403200103  24382       1403200125  929563
20      100     41.43000000     SELL    1403200116  208057      1403200116  226762
31      100     41.46000000     SELL    1403200214  874124      1403200259  751002
37      100     41.46000000     BUY     1403200278  494827      1403200300  729545
42      100     41.45000000     BUY     1403200335  601039      1403200361  925384
42      100     41.45000000     BUY     1403200335  601039      1403200361  925415
45      500     15.54000000     SELL    1403200365  997248      1403200741  26216
49      100     41.45000000     SELL    1403200375  419253      1403200402  959968
53      100     42.61000000     SELL    1403200377  403525      1403200377  403680
54      100     42.61000000     BUY     1403200377  501636      1403200377  501770

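For each OrderId, I want to compute a running cumulative sum of Size and store it in two new columns split by the Side column, CumAmountBuy and CumAmountSell. A row should only be counted where TimeSecO > TimeSecOT, that is, where its TimeSecOT is earlier than the TimeSecO of the order being evaluated.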

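For example, in the DataFrame above, the correct cumulative sums for OrderId 10, OrderId 11, and OrderId 12 would be CumAmountBuy = 0 and CumAmountSell = 0, because no record in the DataFrame satisfies 1403200077 > TimeSecOT.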

For OrderId 14, CumAmountBuy = 0 and CumAmountSell = 100, because by that time OrderId 12 has already taken place, its Side is SELL, and it satisfies the requirement TimeSecO > TimeSecOT (1403200092 > 1403200090).

1 answer:

Answer 0: (score: 1)

I can think of a dirty trick, but I don't think it will be efficient once the DataFrame gets large.

In [41]: import numpy as np

In [42]: df['flag'] = df.TimeSecO.map(lambda sec: (sec > df.TimeSecOT).values)

In [43]: df['CumAmountBuy'] = df.flag.map(lambda f: np.dot(f,df['Size']*(df['Side']=='BUY')))

In [44]: df['CumAmountSell'] = df.flag.map(lambda f: np.dot(f,df['Size']*(df['Side']=='SELL')))

In [45]: df[['CumAmountBuy','CumAmountSell']]
Out[45]: 
         CumAmountBuy  CumAmountSell
OrderId                             
10                  0              0
11                  0              0
12                  0              0
14                  0            100
17                200            100
20                300            100
31                300            300
37                300            400
42                400            400
42                400            400
45                600            400
49                600            400
53                600            400
54                600            400
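The trick above compares every row against every other row, so its cost grows quadratically with the number of rows. As a possible alternative (not from the original answer), the same rule can be sketched with a sort, a running sum, and `np.searchsorted`. The helper name `cum_amount_before` is hypothetical, and the sketch assumes the DataFrame is indexed by OrderId as in the output above and that only the Size, Side, TimeSecO, and TimeSecOT columns matter.

```python
import numpy as np
import pandas as pd

# Sample data from the question, reduced to the columns the rule needs.
df = pd.DataFrame(
    {
        "Size": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 500, 100, 100, 100],
        "Side": ["BUY", "BUY", "SELL", "BUY", "SELL", "SELL", "SELL",
                 "BUY", "BUY", "BUY", "SELL", "SELL", "SELL", "BUY"],
        "TimeSecO": [1403200077, 1403200077, 1403200077, 1403200092, 1403200103,
                     1403200116, 1403200214, 1403200278, 1403200335, 1403200335,
                     1403200365, 1403200375, 1403200377, 1403200377],
        "TimeSecOT": [1403200100, 1403200112, 1403200090, 1403200092, 1403200125,
                      1403200116, 1403200259, 1403200300, 1403200361, 1403200361,
                      1403200741, 1403200402, 1403200377, 1403200377],
    },
    index=pd.Index([10, 11, 12, 14, 17, 20, 31, 37, 42, 42, 45, 49, 53, 54],
                   name="OrderId"),
)


def cum_amount_before(frame, side):
    """Hypothetical helper: for each row, sum Size over the rows of `side`
    whose TimeSecOT is strictly earlier than that row's TimeSecO."""
    ot = frame["TimeSecOT"].to_numpy()
    # Size counts only for the requested side; other rows contribute 0.
    sizes = np.where(frame["Side"] == side, frame["Size"], 0)
    # Sort by TimeSecOT and build a running total with a leading 0.
    order = np.argsort(ot, kind="stable")
    running = np.concatenate(([0], np.cumsum(sizes[order])))
    # side="left" gives the number of TimeSecOT values strictly below TimeSecO.
    counts = np.searchsorted(ot[order], frame["TimeSecO"].to_numpy(), side="left")
    return running[counts]


df["CumAmountBuy"] = cum_amount_before(df, "BUY")
df["CumAmountSell"] = cum_amount_before(df, "SELL")
print(df[["CumAmountBuy", "CumAmountSell"]])
```

On the sample data this should give the same two columns as the output shown above, without storing a boolean array in every cell of a `flag` column.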