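I have a dataframe and currently do most of the calculations in Python, but with hundreds of millions of rows I expect SQL to be faster. My code below groups put and call options by expiration date and strike price. We look at the highest mid price in each group and take the difference between the call and the put. After that, we look for the smallest difference. The Python code looks like this: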
df['Price'] = (df['Bid'].values + df['Ask'].values) / 2
df['Maturity'] = (df['Expiration'] - df['DataDate']).dt.days / 365
Output of df:
UnderlyingSymbol UnderlyingPrice Type Expiration DataDate Strike \
686098 SPY 289.839996 call 2018-09-04 2018-09-04 150.0
686100 SPY 289.839996 call 2018-09-04 2018-09-04 155.0
686102 SPY 289.839996 call 2018-09-04 2018-09-04 160.0
686104 SPY 289.839996 call 2018-09-04 2018-09-04 165.0
686106 SPY 289.839996 call 2018-09-04 2018-09-04 170.0
... ... ... ... ... ...
691381 SPY 289.839996 put 2020-12-18 2018-09-04 400.0
691382 SPY 289.839996 call 2020-12-18 2018-09-04 405.0
691383 SPY 289.839996 put 2020-12-18 2018-09-04 405.0
691384 SPY 289.839996 call 2020-12-18 2018-09-04 410.0
691385 SPY 289.839996 put 2020-12-18 2018-09-04 410.0
Last Bid Ask Price Maturity
686098 136.710007 139.860001 140.119995 139.989990 0.000000
686100 132.520004 134.850006 135.119995 134.985001 0.000000
686102 127.519997 129.860001 130.119995 129.989990 0.000000
686104 120.349998 124.779999 125.220001 125.000000 0.000000
686106 115.389999 119.779999 120.220001 120.000000 0.000000
... ... ... ... ...
691381 128.729996 110.260002 111.660004 110.960007 2.290411
691382 0.850000 0.740000 0.900000 0.820000 2.290411
691383 134.089996 115.239998 116.190002 115.714996 2.290411
691384 0.690000 0.640000 0.800000 0.720000 2.290411
691385 128.550003 120.230003 121.639999 120.934998 2.290411
After this, we group by expiration date and strike price and look at the highest mid price in each group.
c = df[df.Type == 'call'].groupby(['Expiration','Strike'])['Price'].first()
p = df[df.Type == 'put'].groupby(['Expiration','Strike'])['Price'].first()
df = df.join((c - p).rename('CP_diff'), on=['Expiration','Strike'])
df = df[~df.CP_diff.isna()]
df['Forward'] = df['CP_diff'].values + df['Strike']
Output of c:
Expiration Strike
2018-09-04 150.0 139.989990
155.0 134.985001
160.0 129.989990
165.0 125.000000
170.0 120.000000
2020-12-18 390.0 1.290000
395.0 1.095000
400.0 0.965000
405.0 0.820000
410.0 0.720000
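The call-minus-put step above could be sketched in SQL with conditional aggregation. This is a minimal sketch run through Python's sqlite3 module, not a confirmed equivalent: the sample rows and the names call_mid, put_mid and per_strike are made up for illustration, and it uses MAX where the pandas code uses first(), since SQL tables have no inherent row order. The two agree when there is a single quote per (expiration, strike, type).

```python
import sqlite3

# Hypothetical in-memory table with only the columns this step needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE option_data ("
    " type TEXT, expiration TEXT, strike REAL, bid REAL, ask REAL)"
)
sample_rows = [
    ("call", "2018-09-05", 270.0, 19.5, 20.5),  # mid 20.0
    ("put", "2018-09-05", 270.0, 0.0, 0.5),     # mid 0.25
    ("call", "2018-09-05", 290.0, 0.5, 1.0),    # mid 0.75
    ("put", "2018-09-05", 290.0, 0.5, 1.5),     # mid 1.0
    ("call", "2018-09-05", 295.0, 0.0, 0.5),    # no matching put
]
conn.executemany("INSERT INTO option_data VALUES (?, ?, ?, ?, ?)", sample_rows)

# One row per (expiration, strike): call mid, put mid, their difference,
# and Forward = Strike + CP_diff. Strikes missing either side are dropped,
# which plays the role of df[~df.CP_diff.isna()].
query = """
SELECT expiration,
       strike,
       call_mid - put_mid          AS cp_diff,
       strike + call_mid - put_mid AS forward
FROM (
    SELECT expiration,
           strike,
           MAX(CASE WHEN type = 'call' THEN (bid + ask) / 2 END) AS call_mid,
           MAX(CASE WHEN type = 'put'  THEN (bid + ask) / 2 END) AS put_mid
    FROM option_data
    GROUP BY expiration, strike
) AS per_strike
WHERE call_mid IS NOT NULL AND put_mid IS NOT NULL
ORDER BY expiration, strike
"""
cp_rows = conn.execute(query).fetchall()
for row in cp_rows:
    print(row)
```

Unlike the pandas join, which keeps one row per original quote, this returns one row per (expiration, strike); joining it back to option_data on those two columns would restore the per-quote layout.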
After this, we find the minimum price difference for each expiration and update the dataframe accordingly.
minimum_difference = df.loc[df.groupby("Expiration")["CP_diff"].idxmin().values]
minimum_difference = minimum_difference[['Forward', 'Expiration']].set_index("Expiration")
df = df.set_index("Expiration")
df.update(minimum_difference)
Output of minimum_difference:
Forward
Expiration
2018-09-04 289.975006
2018-09-05 289.980011
2018-09-07 289.989990
2018-09-10 289.984985
2018-09-12 289.984985
2018-09-14 289.984985
2018-09-17 289.984985
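The per-expiration minimum could be expressed in SQL by joining each row to its group's MIN. The sketch below assumes the call-minus-put differences already sit in a table, here a hypothetical one called cp with made-up values. Like idxmin on CP_diff, it takes the signed minimum rather than the smallest absolute value. One difference to keep in mind: on ties this join returns every tied row, while idxmin keeps only the first.

```python
import sqlite3

# Hypothetical table of per-strike call-minus-put differences.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cp (expiration TEXT, strike REAL, cp_diff REAL)")
conn.executemany(
    "INSERT INTO cp VALUES (?, ?, ?)",
    [
        ("2018-09-04", 285.0, 4.75),
        ("2018-09-04", 290.0, -0.25),
        ("2018-09-05", 289.0, 1.0),
        ("2018-09-05", 290.0, 0.5),
    ],
)

# For each expiration keep the row whose cp_diff equals the group minimum,
# then compute Forward = Strike + CP_diff for that row.
query = """
SELECT d.expiration,
       d.strike + d.cp_diff AS forward
FROM cp AS d
JOIN (
    SELECT expiration, MIN(cp_diff) AS min_diff
    FROM cp
    GROUP BY expiration
) AS m
  ON m.expiration = d.expiration
 AND m.min_diff = d.cp_diff
ORDER BY d.expiration
"""
forwards = dict(conn.execute(query).fetchall())
print(forwards)
```

Joining this result back to the quotes on expiration would give every row of an expiration the same Forward, which is what df.update does in the pandas version.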
And finally, the output of df:
UnderlyingSymbol UnderlyingPrice Type DataDate Strike \
Expiration
2018-09-04 SPY 289.839996 call 2018-09-04 290.0
2018-09-04 SPY 289.839996 put 2018-09-04 290.0
2018-09-05 SPY 289.839996 call 2018-09-04 270.0
2018-09-05 SPY 289.839996 put 2018-09-04 270.0
2018-09-05 SPY 289.839996 call 2018-09-04 270.5
... ... ... ... ...
2020-12-18 SPY 289.839996 put 2018-09-04 400.0
2020-12-18 SPY 289.839996 call 2018-09-04 405.0
2020-12-18 SPY 289.839996 put 2018-09-04 405.0
2020-12-18 SPY 289.839996 call 2018-09-04 410.0
2020-12-18 SPY 289.839996 put 2018-09-04 410.0
Last Bid Ask Price Maturity \
Expiration
2018-09-04 0.040000 0.030000 0.040000 0.035000 0.000000
2018-09-04 0.050000 0.050000 0.070000 0.060000 0.000000
2018-09-05 0.000000 19.910000 20.080000 19.994999 0.002740
2018-09-05 0.010000 0.010000 0.020000 0.015000 0.002740
2018-09-05 19.090000 19.410000 19.580000 19.494999 0.002740
... ... ... ... ...
2020-12-18 128.729996 110.260002 111.660004 110.960007 2.290411
2020-12-18 0.850000 0.740000 0.900000 0.820000 2.290411
2020-12-18 134.089996 115.239998 116.190002 115.714996 2.290411
2020-12-18 0.690000 0.640000 0.800000 0.720000 2.290411
2020-12-18 128.550003 120.230003 121.639999 120.934998 2.290411
Forward
Expiration
2018-09-04 289.975006
2018-09-04 289.975006
2018-09-05 289.980011
2018-09-05 289.980011
2018-09-05 289.980011
How can I achieve the same thing in SQL? My attempt is below:
WITH summary AS (
SELECT od.datadate, od.expiration, od.type, (od.ask + od.bid) / 2 AS mid,
ROW_NUMBER() OVER(PARTITION BY od.expiration, od.type, od.datadate
ORDER BY (od.ask + od.bid) / 2 DESC) AS rk
FROM option_data od
WHERE od.Underlyingsymbol = 'SPY')
SELECT s.*
FROM summary s
WHERE s.rk = 1
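Am I right that the Python variables p and c can be computed with the query above? This would be my first more complex SQL query, and I am trying to get some examples I can build on.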
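A rough way to check this on a small sample is to run a ranking query against an in-memory SQLite table and compare the rows with c and p. The sketch below assumes SQLite 3.25 or later (needed for window functions) and uses made-up quotes. It also adds strike to the PARTITION BY, because the pandas code groups by both Expiration and Strike; that addition is an assumption about what the comparison needs, not a confirmed fix.

```python
import sqlite3

# Hypothetical sample: two strikes, plus a second (lower) call quote at 270.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE option_data ("
    " underlyingsymbol TEXT, type TEXT, expiration TEXT, datadate TEXT,"
    " strike REAL, bid REAL, ask REAL)"
)
conn.executemany(
    "INSERT INTO option_data VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("SPY", "call", "2018-09-05", "2018-09-04", 270.0, 19.5, 20.5),
        ("SPY", "call", "2018-09-05", "2018-09-04", 270.0, 19.0, 20.0),
        ("SPY", "put", "2018-09-05", "2018-09-04", 270.0, 0.0, 0.5),
        ("SPY", "call", "2018-09-05", "2018-09-04", 275.0, 14.5, 15.5),
        ("SPY", "put", "2018-09-05", "2018-09-04", 275.0, 0.5, 1.0),
    ],
)

# Rank quotes by mid price within each (expiration, strike, type, datadate)
# and keep the highest one, i.e. the row with rk = 1.
query = """
WITH summary AS (
    SELECT od.expiration, od.strike, od.type,
           (od.ask + od.bid) / 2 AS mid,
           ROW_NUMBER() OVER (
               PARTITION BY od.expiration, od.strike, od.type, od.datadate
               ORDER BY (od.ask + od.bid) / 2 DESC
           ) AS rk
    FROM option_data od
    WHERE od.underlyingsymbol = 'SPY'
)
SELECT s.expiration, s.strike, s.type, s.mid
FROM summary s
WHERE s.rk = 1
ORDER BY s.expiration, s.strike, s.type
"""
top_quotes = conn.execute(query).fetchall()
for row in top_quotes:
    print(row)
```

Filtering top_quotes on type = 'call' or type = 'put' gives the SQL counterparts of c and p for this sample. Without strike in the partition, only one row per expiration and type would survive, which is a coarser grouping than groupby(['Expiration','Strike']).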