How do I calculate the volume-weighted average price (VWAP) from a pandas DataFrame with bid and ask prices?

Time: 2019-04-16 01:56:54

Tags: python-3.x pandas numpy dataframe quantitative-finance

If my table looks like the one below, how can I create another column called vwap that holds the VWAP?

             time            bid_size   bid       ask  ask_size trade trade_size phase  
0   2019-01-07 07:45:01.064515  495   152.52    152.54    19     NaN      NaN    OPEN   
1   2019-01-07 07:45:01.110072  31    152.53    152.54    19     NaN      NaN    OPEN   
2   2019-01-07 07:45:01.116596  32    152.53    152.54    19     NaN      NaN    OPEN   
3   2019-01-07 07:45:01.116860  32    152.53    152.54    21     NaN      NaN    OPEN   
4   2019-01-07 07:45:01.116905  34    152.53    152.54    21     NaN      NaN    OPEN   
5   2019-01-07 07:45:01.116982  34    152.53    152.54    31     NaN      NaN    OPEN   
6   2019-01-07 07:45:01.147901  38    152.53    152.54    31     NaN      NaN    OPEN   
7   2019-01-07 07:45:01.189971  38    152.53    152.54    31     ask     15.0    OPEN   
8   2019-01-07 07:45:01.189971  38    152.53    152.54    16     NaN      NaN    OPEN   
9   2019-01-07 07:45:01.190766  37    152.53    152.54    16     NaN      NaN    OPEN   
10  2019-01-07 07:45:01.190856  37    152.53    152.54    15     NaN      NaN    OPEN
11  2019-01-07 07:45:01.190856  37    152.53    152.54    16     ask      1.0    OPEN   
12  2019-01-07 07:45:01.193938  37    152.53    152.55   108     NaN      NaN    OPEN   
13  2019-01-07 07:45:01.193938  37    152.53    152.54    15     ask     15.0    OPEN   
14  2019-01-07 07:45:01.194326  2     152.54    152.55   108     NaN      NaN    OPEN   
15  2019-01-07 07:45:01.194453  2     152.54    152.55    97     NaN      NaN    OPEN   
16  2019-01-07 07:45:01.194479  6     152.54    152.55    97     NaN      NaN    OPEN   
17  2019-01-07 07:45:01.194507  19    152.54    152.55    97     NaN      NaN    OPEN   
18  2019-01-07 07:45:01.194532  19    152.54    152.55    77     NaN      NaN    OPEN   
19  2019-01-07 07:45:01.194598  19    152.54    152.55    79     NaN      NaN    OPEN   

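Sorry that the table is hard to read: the second column from the right is trade_size, and the column to its left is trade, which shows the side of the trade (bid or ask). If trade_size and trade are both NaN, no trade occurred at that timestamp.

If df['trade'] == "ask", the trade price is the price in the 'ask' column; if df['trade'] == "bid", the trade price is the price in the 'bid' column. Since there are two prices, how do I compute the VWAP into df['vwap']?

My idea is to use np.cumsum(). Thank you!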

3 Answers:

Answer 0: (score: 1)

Here is one possible approach.

Append a VMAP column filled with NaNs:

import numpy as np
df['VMAP'] = np.nan

Calculate VMAP (based on this formula provided by the OP) and assign the values based on ask or bid, as required by the OP:

for trade in ['ask','bid']:
    # Find the index labels of the rows traded on this side (`ask` or `bid`)
    bid_idx = df[df.trade==trade].index
    # Running sum of price * size over running sum of size, for those rows only
    df.loc[bid_idx, 'VMAP'] = (
        (df.loc[bid_idx, 'trade_size'] * df.loc[bid_idx, trade]).cumsum()
        / (df.loc[bid_idx, 'trade_size']).cumsum()
    )

print(df.iloc[:,1:])
               time  bid_size     bid     ask  ask_size trade  trade_size phase    VMAP
0   07:45:01.064515       495  152.52  152.54        19   NaN         NaN  OPEN     NaN
1   07:45:01.110072        31  152.53  152.54        19   NaN         NaN  OPEN     NaN
2   07:45:01.116596        32  152.53  152.54        19   NaN         NaN  OPEN     NaN
3   07:45:01.116860        32  152.53  152.54        21   NaN         NaN  OPEN     NaN
4   07:45:01.116905        34  152.53  152.54        21   NaN         NaN  OPEN     NaN
5   07:45:01.116982        34  152.53  152.54        31   NaN         NaN  OPEN     NaN
6   07:45:01.147901        38  152.53  152.54        31   NaN         NaN  OPEN     NaN
7   07:45:01.189971        38  152.53  152.54        31   ask        15.0  OPEN  152.54
8   07:45:01.189971        38  152.53  152.54        16   NaN         NaN  OPEN     NaN
9   07:45:01.190766        37  152.53  152.54        16   NaN         NaN  OPEN     NaN
10  07:45:01.190856        37  152.53  152.54        15   NaN         NaN  OPEN     NaN
11  07:45:01.190856        37  152.53  152.54        16   ask         1.0  OPEN  152.54
12  07:45:01.193938        37  152.53  152.55       108   NaN         NaN  OPEN     NaN
13  07:45:01.193938        37  152.53  152.54        15   ask        15.0  OPEN  152.54
14  07:45:01.194326         2  152.54  152.55       108   NaN         NaN  OPEN     NaN
15  07:45:01.194453         2  152.54  152.55        97   NaN         NaN  OPEN     NaN
16  07:45:01.194479         6  152.54  152.55        97   NaN         NaN  OPEN     NaN
17  07:45:01.194507        19  152.54  152.55        97   NaN         NaN  OPEN     NaN
18  07:45:01.194532        19  152.54  152.55        77   NaN         NaN  OPEN     NaN
19  07:45:01.194598        19  152.54  152.55        79   NaN         NaN  OPEN     NaN

Edit

As @edinho correctly indicated, in this sample the VMAP values are the same as the ask column.
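
To see what the per-side loop does when both sides trade, here is a minimal sketch on a made-up five-row frame. The data is hypothetical (it is not taken from the question), and the loop variable is renamed to side for readability. It suggests that each side gets its own running average, so ask trades and bid trades are not blended into a single VWAP.

```python
import numpy as np
import pandas as pd

# Hypothetical toy quotes: two trades on the ask side, one on the bid side.
df = pd.DataFrame({
    "bid": [99.0, 99.0, 100.0, 100.0, 100.0],
    "ask": [101.0, 101.0, 102.0, 102.0, 102.0],
    "trade": [np.nan, "ask", np.nan, "bid", "ask"],
    "trade_size": [np.nan, 10.0, np.nan, 30.0, 10.0],
})

df["VMAP"] = np.nan
for side in ["ask", "bid"]:
    # Index labels of the rows traded on this side
    idx = df[df.trade == side].index
    # Traded value (price * size) for those rows only
    value = df.loc[idx, "trade_size"] * df.loc[idx, side]
    # Running value over running volume, computed separately per side
    df.loc[idx, "VMAP"] = value.cumsum() / df.loc[idx, "trade_size"].cumsum()

print(df[["trade", "trade_size", "VMAP"]])
```

On this toy frame the two ask rows come out as 101.0 and 101.5, and the bid row as 100.0, because the bid trade never enters the ask-side sums.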

Answer 1: (score: 1)

OK, here it is:

df['trade_price'] = df.apply(lambda x: x['bid'] if x['trade']=='bid' else x['ask'], axis=1)
df['vwap'] = (df['trade_price'] * df['trade_size']).cumsum() / df['trade_size'].fillna(0).cumsum()

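First line:
It stores the trade price in a new column, so it is easier to retrieve later.
If you prefer, you can drop this line and write a function instead (perhaps easier to read), but I like to see the intermediate results.
Q: Why is there a value even on rows without a trade?
A: Because of how the lambda is written: the else branch picks up the ask price. It makes no difference, though, because in the next step that price is multiplied by a NaN trade_size.

Second line:
This is where the real calculation happens.
The first part computes the total traded value (price times size) up to that moment; as you suggested, a cumulative sum makes this easy.
The second part computes the total traded volume up to that moment (again a cumulative sum).
If you prefer, you can split this line up and add more intermediate columns.
Q: Why use fillna(0)?
A: So that the cumulative volume does not contain NaNs and you do not run into division errors.
Q: Why are there so many NaNs in the vwap column?
A: Because no trade took place on those rows. You could fill them with 0s, but it is better to keep the "no trade" information.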
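
In symbols, the second line can be read as follows. Here p_i and q_i are shorthand I am assuming for trade_price and trade_size on row i, and rows without a trade contribute nothing to either sum:

```latex
\mathrm{vwap}_t = \frac{\sum_{i \le t} p_i \, q_i}{\sum_{i \le t} q_i}
```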

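P.S.: You may get wrong results, because it only considers volume and price in the same direction. You could try flipping some signs to correct the volume (e.g. making the ask price negative).

This code outputs: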

    trade_price vwap
1   152.54  NaN
2   152.54  NaN
3   152.54  NaN
4   152.54  NaN
5   152.54  NaN
6   152.54  NaN
7   152.54  NaN
8   152.54  152.54
9   152.54  NaN
10  152.54  NaN
11  152.54  NaN
12  152.54  152.54
13  152.55  NaN
14  152.54  152.54
15  152.55  NaN
16  152.55  NaN
17  152.55  NaN
18  152.55  NaN
19  152.55  NaN
20  152.55  NaN
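
If the row-wise apply becomes slow on large tick data, a vectorized variant is possible. The following is only a sketch under assumptions: the toy frame is made up, and Series.where is my substitution for the lambda, not the code from this answer. Unlike the lambda, it leaves trade_price as NaN on rows without a trade.

```python
import numpy as np
import pandas as pd

# Hypothetical toy quotes: two trades on the ask side, one on the bid side.
df = pd.DataFrame({
    "bid": [99.0, 99.0, 100.0, 100.0, 100.0],
    "ask": [101.0, 101.0, 102.0, 102.0, 102.0],
    "trade": [np.nan, "ask", np.nan, "bid", "ask"],
    "trade_size": [np.nan, 10.0, np.nan, 30.0, 10.0],
})

is_ask = df["trade"].eq("ask")
is_bid = df["trade"].eq("bid")
# Ask price on ask trades, bid price on bid trades, NaN where nothing traded
df["trade_price"] = df["ask"].where(is_ask, df["bid"].where(is_bid))

# Running traded value and running traded volume; no-trade rows add 0 to both
traded_value = (df["trade_price"] * df["trade_size"]).fillna(0).cumsum()
traded_volume = df["trade_size"].fillna(0).cumsum()

# Keep the VWAP only on rows where a trade happened, NaN elsewhere
df["vwap"] = (traded_value / traded_volume).where(df["trade"].notna())

print(df[["trade_price", "trade_size", "vwap"]])
```

On this toy frame the three trade rows come out as 101.0, 100.25 and 100.6, with bid and ask trades blended into one running average.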

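Answer 2: (score: 1)

You can use np.where to pick the price from the correct column (bid or ask) based on the value in the trade column. Note that this gives you the bid price when no trade occurred, but because it is then multiplied by a NaN trade size, that does not matter. I also forward-filled the VWAP.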

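import numpy as np

volume = df['trade_size']
price = np.where(df['trade'].eq('ask'), df['ask'], df['bid'])
# Running traded value over running traded volume, forward-filled across rows without a trade
df = df.assign(VWAP=((volume * price).cumsum() / volume.cumsum()).ffill())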

>>> df
                          time  bid_size     bid     ask  ask_size trade  trade_size phase    VWAP
0   2019-01-07 07:45:01.064515       495  152.52  152.54        19   NaN         NaN  OPEN     NaN
1   2019-01-07 07:45:01.110072        31  152.53  152.54        19   NaN         NaN  OPEN     NaN
2   2019-01-07 07:45:01.116596        32  152.53  152.54        19   NaN         NaN  OPEN     NaN
3   2019-01-07 07:45:01.116860        32  152.53  152.54        21   NaN         NaN  OPEN     NaN
4   2019-01-07 07:45:01.116905        34  152.53  152.54        21   NaN         NaN  OPEN     NaN
5   2019-01-07 07:45:01.116982        34  152.53  152.54        31   NaN         NaN  OPEN     NaN
6   2019-01-07 07:45:01.147901        38  152.53  152.54        31   NaN         NaN  OPEN     NaN
7   2019-01-07 07:45:01.189971        38  152.53  152.54        31   ask        15.0  OPEN  152.54
8   2019-01-07 07:45:01.189971        38  152.53  152.54        16   NaN         NaN  OPEN  152.54
9   2019-01-07 07:45:01.190766        37  152.53  152.54        16   NaN         NaN  OPEN  152.54
10  2019-01-07 07:45:01.190856        37  152.53  152.54        15   NaN         NaN  OPEN  152.54
11  2019-01-07 07:45:01.190856        37  152.53  152.54        16   ask         1.0  OPEN  152.54
12  2019-01-07 07:45:01.193938        37  152.53  152.55       108   NaN         NaN  OPEN  152.54
13  2019-01-07 07:45:01.193938        37  152.53  152.54        15   ask        15.0  OPEN  152.54
14  2019-01-07 07:45:01.194326         2  152.54  152.55       108   NaN         NaN  OPEN  152.54
15  2019-01-07 07:45:01.194453         2  152.54  152.55        97   NaN         NaN  OPEN  152.54
16  2019-01-07 07:45:01.194479         6  152.54  152.55        97   NaN         NaN  OPEN  152.54
17  2019-01-07 07:45:01.194507        19  152.54  152.55        97   NaN         NaN  OPEN  152.54
18  2019-01-07 07:45:01.194532        19  152.54  152.55        77   NaN         NaN  OPEN  152.54
19  2019-01-07 07:45:01.194598        19  152.54  152.55        79   NaN         NaN  OPEN  152.54
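
The sample above only contains ask-side trades, so every VWAP value equals 152.54. As a rough check of the mixed case, here is the same idea wrapped in a small helper and run on a made-up frame with trades on both sides. The function name add_vwap and the toy data are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

def add_vwap(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a forward-filled running VWAP column (hypothetical helper)."""
    volume = df["trade_size"]
    # Ask price on ask trades, bid price everywhere else (including no-trade rows)
    price = np.where(df["trade"].eq("ask"), df["ask"], df["bid"])
    # cumsum skips NaN rows, so no-trade rows stay NaN until forward-filled
    vwap = (volume * price).cumsum() / volume.cumsum()
    return df.assign(VWAP=vwap.ffill())

# Hypothetical toy quotes: two trades on the ask side, one on the bid side.
quotes = pd.DataFrame({
    "bid": [99.0, 99.0, 100.0, 100.0, 100.0],
    "ask": [101.0, 101.0, 102.0, 102.0, 102.0],
    "trade": [np.nan, "ask", np.nan, "bid", "ask"],
    "trade_size": [np.nan, 10.0, np.nan, 30.0, 10.0],
})

result = add_vwap(quotes)
print(result[["trade", "trade_size", "VWAP"]])
```

On this toy frame the VWAP column reads NaN, 101.0, 101.0, 100.25, 100.6: the first row has no prior trade to fill from, and the third row carries the previous value forward.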