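I'm fairly new to pandas, and I'm sure there is a simple solution, but I can't figure it out on my own. I have a DataFrame of trades that looks like this: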
OrderId  Size  Price        Side  TimeSecO    TimeUSecO  TimeSecOT   TimeUSecOT  AmountBuy  AmountSell
10       100   41.44000000  BUY   1403200077  47720      1403200100  640070
11       100   41.43000000  BUY   1403200077  47979      1403200112  43383
12       100   41.45000000  SELL  1403200077  48311      1403200090  61100
14       100   41.45000000  BUY   1403200092  253793     1403200092  374767
17       100   41.44000000  SELL  1403200103  24382      1403200125  929563
20       100   41.43000000  SELL  1403200116  208057     1403200116  226762
31       100   41.46000000  SELL  1403200214  874124     1403200259  751002
37       100   41.46000000  BUY   1403200278  494827     1403200300  729545
42       100   41.45000000  BUY   1403200335  601039     1403200361  925384
42       100   41.45000000  BUY   1403200335  601039     1403200361  925415
45       500   15.54000000  SELL  1403200365  997248     1403200741  26216
49       100   41.45000000  SELL  1403200375  419253     1403200402  959968
53       100   42.61000000  SELL  1403200377  403525     1403200377  403680
54       100   42.61000000  BUY   1403200377  501636     1403200377  501770
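For each OrderId I want to compute a running total, split by the Side column into two new columns, CumAmountBuy and CumAmountSell, that includes only the orders satisfying TimeSecO > TimeSecOT, i.e. orders whose TimeSecOT is earlier than the current row's TimeSecO.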
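For example, for the DataFrame above, the correct cumulative sums for OrderId 10, OrderId 11 and OrderId 12 are CumAmountBuy = 0 and CumAmountSell = 0, because no row in the DataFrame has a TimeSecOT smaller than 1403200077.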
For OrderId 14, CumAmountBuy = 0 and CumAmountSell = 100 (the Size of OrderId 12), because by then OrderId 12 has already happened, it has Side = SELL, and it satisfies TimeSecO > TimeSecOT (1403200092 > 1403200090).
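To pin the rule down, here is a minimal sketch that evaluates it for a single open time. It assumes the totals are sums of the Size column (which matches the example numbers) and reproduces only the columns the rule needs; cum_amounts_for is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Only the columns the rule uses, copied from the table above
df = pd.DataFrame({
    'OrderId': [10, 11, 12, 14, 17, 20, 31, 37, 42, 42, 45, 49, 53, 54],
    'Size': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 500, 100, 100, 100],
    'Side': ['BUY', 'BUY', 'SELL', 'BUY', 'SELL', 'SELL', 'SELL',
             'BUY', 'BUY', 'BUY', 'SELL', 'SELL', 'SELL', 'BUY'],
    'TimeSecO': [1403200077, 1403200077, 1403200077, 1403200092, 1403200103,
                 1403200116, 1403200214, 1403200278, 1403200335, 1403200335,
                 1403200365, 1403200375, 1403200377, 1403200377],
    'TimeSecOT': [1403200100, 1403200112, 1403200090, 1403200092, 1403200125,
                  1403200116, 1403200259, 1403200300, 1403200361, 1403200361,
                  1403200741, 1403200402, 1403200377, 1403200377],
})

def cum_amounts_for(t_open):
    """Hypothetical helper: sum Size per Side over rows with TimeSecOT < t_open."""
    done = df[df['TimeSecOT'] < t_open]
    buy = done.loc[done['Side'] == 'BUY', 'Size'].sum()
    sell = done.loc[done['Side'] == 'SELL', 'Size'].sum()
    return int(buy), int(sell)

print(cum_amounts_for(1403200092))  # OrderId 14 -> (0, 100)
```

Only OrderId 12 (closed at 1403200090, SELL) qualifies for OrderId 14, so the sell total is 100 and the buy total is 0.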
Answer (score: 1)
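I can think of a dirty trick, but I suspect it won't be efficient once the DataFrame gets large, because it builds an n-element mask for every one of the n rows.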
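import numpy as np

# The output below shows OrderId as the index, so df was presumably indexed with df.set_index('OrderId').
# flag: for every row, a boolean array marking the rows whose TimeSecOT < this row's TimeSecO
df['flag'] = df.TimeSecO.map(lambda sec: (sec > df.TimeSecOT).values)
# Dot each mask with Size (zeroed for rows on the other side) to total the matching sizes
df['CumAmountBuy'] = df.flag.map(lambda f: np.dot(f, df['Size'] * (df['Side'] == 'BUY')))
df['CumAmountSell'] = df.flag.map(lambda f: np.dot(f, df['Size'] * (df['Side'] == 'SELL')))
print(df[['CumAmountBuy', 'CumAmountSell']])

Output: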
         CumAmountBuy  CumAmountSell
OrderId
10                  0              0
11                  0              0
12                  0              0
14                  0            100
17                200            100
20                300            100
31                300            300
37                300            400
42                400            400
42                400            400
45                600            400
49                600            400
53                600            400
54                600            400
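If the O(n²) mask becomes a bottleneck, one alternative (a different technique from the answer above, offered as a sketch rather than a benchmarked solution) is to sort each side's TimeSecOT values once, take a running sum of their Size, and use np.searchsorted to find, for every TimeSecO, how many orders on that side closed strictly earlier. That should run in O(n log n). add_cum_amounts is a hypothetical helper name, and the sketch assumes OrderId is an ordinary column rather than the index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'OrderId': [10, 11, 12, 14, 17, 20, 31, 37, 42, 42, 45, 49, 53, 54],
    'Size': [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 500, 100, 100, 100],
    'Side': ['BUY', 'BUY', 'SELL', 'BUY', 'SELL', 'SELL', 'SELL',
             'BUY', 'BUY', 'BUY', 'SELL', 'SELL', 'SELL', 'BUY'],
    'TimeSecO': [1403200077, 1403200077, 1403200077, 1403200092, 1403200103,
                 1403200116, 1403200214, 1403200278, 1403200335, 1403200335,
                 1403200365, 1403200375, 1403200377, 1403200377],
    'TimeSecOT': [1403200100, 1403200112, 1403200090, 1403200092, 1403200125,
                  1403200116, 1403200259, 1403200300, 1403200361, 1403200361,
                  1403200741, 1403200402, 1403200377, 1403200377],
})

def add_cum_amounts(df):
    """Hypothetical helper: add CumAmountBuy/CumAmountSell via sort + cumsum + searchsorted."""
    out = df.copy()
    open_times = df['TimeSecO'].to_numpy()
    for side, col in (('BUY', 'CumAmountBuy'), ('SELL', 'CumAmountSell')):
        rows = df[df['Side'] == side]
        order = np.argsort(rows['TimeSecOT'].to_numpy(), kind='stable')
        close_times = rows['TimeSecOT'].to_numpy()[order]
        # running[k] = total Size of the k earliest-closing orders on this side
        running = np.concatenate(([0], np.cumsum(rows['Size'].to_numpy()[order])))
        # side='left' counts close times strictly less than each open time
        k = np.searchsorted(close_times, open_times, side='left')
        out[col] = running[k]
    return out

result = add_cum_amounts(df)
print(result[['OrderId', 'CumAmountBuy', 'CumAmountSell']])
```

With side='left', searchsorted returns the number of sorted close times strictly below each open time, which implements the strict TimeSecO > TimeSecOT condition; side='right' would also count orders that close in the same second. On the sample data this should reproduce the totals shown in the output above.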