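I have this BigQuery data frame in which a 1 in long_entry or short_entry means a long or short position is entered at that time, and a 1 in long_exit or short_exit means the trade is exited. I would like to add two new columns: long_pnl, which lists the PnL generated by each individual long trade, and short_pnl, which lists the PnL generated by each individual short trade.
This backtest can hold at most one trade/position at any point in time.
My data frame is below. As we can see, a long trade is entered on 26/2/2019 and closed on 1/3/2019 for a PnL of $64.45, and a short trade is entered on 4/3/2019 and closed on 5/3/2019 for a PnL of -$119.11 (a loss).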
date price long_entry long_exit short_entry short_exit
0 24/2/2019 4124.25 0 0 0 0
1 25/2/2019 4130.67 0 0 0 0
2 26/2/2019 4145.67 1 0 0 0
3 27/2/2019 4180.10 0 0 0 0
4 28/2/2019 4200.05 0 0 0 0
5 1/3/2019 4210.12 0 1 0 0
6 2/3/2019 4198.10 0 0 0 0
7 3/3/2019 4210.34 0 0 0 0
8 4/3/2019 4100.12 0 0 1 0
9 5/3/2019 4219.23 0 0 0 1
I would like output like the following, plus another column for short_pnl:
date price long_entry long_exit short_entry short_exit long_pnl
0 24/2/2019 4124.25 0 0 0 0 NaN
1 25/2/2019 4130.67 0 0 0 0 NaN
2 26/2/2019 4145.67 1 0 0 0 64.45
3 27/2/2019 4180.10 0 0 0 0 NaN
4 28/2/2019 4200.05 0 0 0 0 NaN
5 1/3/2019 4210.12 0 1 0 0 NaN
6 2/3/2019 4198.10 0 0 0 0 NaN
7 3/3/2019 4210.34 0 0 0 0 NaN
8 4/3/2019 4100.12 0 0 1 0 NaN
9 5/3/2019 4219.23 0 0 0 1 NaN
Answer 0 (score: 1)
Below is for BigQuery Standard SQL
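#standardSQL
WITH temp1 AS (
  -- Parse the d/m/Y date strings and cast price to NUMERIC for exact decimal arithmetic
  SELECT PARSE_DATE('%d/%m/%Y', dt) dt, CAST(price AS NUMERIC) price, long_entry, long_exit, short_entry, short_exit
  FROM `project.dataset.table`
), temp2 AS (
  -- Group id = entries up to and including this row + exits on strictly earlier rows,
  -- so an entry row, its exit row, and the rows between them share one id
  SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
    SUM(long_entry) OVER(ORDER BY dt) + SUM(long_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) long_grp,
    SUM(short_entry) OVER(ORDER BY dt) + SUM(short_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) short_grp
  FROM temp1
)
-- With ORDER BY dt DESC and the default window frame (which ends at the current row),
-- FIRST_VALUE is the price on the latest date of the group (the exit row) and
-- LAST_VALUE is the current row's price (the entry price when evaluated on an entry row)
SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
  IF(long_entry = 0, NULL,
    FIRST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC) -
    LAST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC)
  ) long_pnl,   -- long: exit price - entry price
  IF(short_entry = 0, NULL,
    LAST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC) -
    FIRST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC)
  ) short_pnl   -- short: entry price - exit price
FROM temp2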
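To make the grouping step in temp2 easier to follow, here is a minimal plain-Python sketch of the same running-sum idea, applied to the long flags from the question. It is an illustration only, not BigQuery code: `group_ids` is a hypothetical helper name, and the sketch assumes the rows are already sorted by date with one row per date. It also assumes that the sum over an empty window frame is NULL, which is why the first row gets no group id.

```python
# Long flags for the ten sample rows from the question, in date order.
long_entry = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
long_exit = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]


def group_ids(entries, exits):
    """Mimic long_grp / short_grp from temp2 (hypothetical helper).

    id = entries up to and including the current row
         + exits on strictly earlier rows.
    """
    ids = []
    entries_so_far = 0   # running sum that includes the current row
    exits_before = None  # running sum over earlier rows only; no rows yet
    for entry, exit_ in zip(entries, exits):
        entries_so_far += entry
        # Assumption: NULL + number is NULL, so the first row has no id.
        if exits_before is None:
            ids.append(None)
        else:
            ids.append(entries_so_far + exits_before)
        exits_before = (exits_before or 0) + exit_
    return ids


long_grp = group_ids(long_entry, long_exit)
print(long_grp)  # [None, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Rows 3 to 6 (the entry on 26/2/2019 through the exit on 1/3/2019) share id 1, and the id only increases on the row after the exit. That is what lets the final SELECT find the entry and exit prices inside one partition.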
Applying the query to the sample data from the question:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '24/2/2019' dt, 4124.25 price, 0 long_entry, 0 long_exit, 0 short_entry, 0 short_exit UNION ALL
  SELECT '25/2/2019', 4130.67, 0, 0, 0, 0 UNION ALL
  SELECT '26/2/2019', 4145.67, 1, 0, 0, 0 UNION ALL
  SELECT '27/2/2019', 4180.10, 0, 0, 0, 0 UNION ALL
  SELECT '28/2/2019', 4200.05, 0, 0, 0, 0 UNION ALL
  SELECT '1/3/2019', 4210.12, 0, 1, 0, 0 UNION ALL
  SELECT '2/3/2019', 4198.10, 0, 0, 0, 0 UNION ALL
  SELECT '3/3/2019', 4210.34, 0, 0, 0, 0 UNION ALL
  SELECT '4/3/2019', 4100.12, 0, 0, 1, 0 UNION ALL
  SELECT '5/3/2019', 4219.23, 0, 0, 0, 1
), temp1 AS (
  SELECT PARSE_DATE('%d/%m/%Y', dt) dt, CAST(price AS NUMERIC) price, long_entry, long_exit, short_entry, short_exit
  FROM `project.dataset.table`
), temp2 AS (
  SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
    SUM(long_entry) OVER(ORDER BY dt) + SUM(long_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) long_grp,
    SUM(short_entry) OVER(ORDER BY dt) + SUM(short_exit) OVER(ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) short_grp
  FROM temp1
)
SELECT dt, price, long_entry, long_exit, short_entry, short_exit,
  IF(long_entry = 0, NULL,
    FIRST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC) -
    LAST_VALUE(price) OVER(PARTITION BY long_grp ORDER BY dt DESC)
  ) long_pnl,
  IF(short_entry = 0, NULL,
    LAST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC) -
    FIRST_VALUE(price) OVER(PARTITION BY short_grp ORDER BY dt DESC)
  ) short_pnl
FROM temp2
-- ORDER BY dt
The result will be
Row dt price long_entry long_exit short_entry short_exit long_pnl short_pnl
1 2019-02-24 4124.25 0 0 0 0 null null
2 2019-02-25 4130.67 0 0 0 0 null null
3 2019-02-26 4145.67 1 0 0 0 64.45 null
4 2019-02-27 4180.1 0 0 0 0 null null
5 2019-02-28 4200.05 0 0 0 0 null null
6 2019-03-01 4210.12 0 1 0 0 null null
7 2019-03-02 4198.1 0 0 0 0 null null
8 2019-03-03 4210.34 0 0 0 0 null null
9 2019-03-04 4100.12 0 0 1 0 null -119.11
10 2019-03-05 4219.23 0 0 0 1 null null
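The result table can be cross-checked outside BigQuery with a small single-pass sketch in plain Python. This is not the query's method, just an independent way to confirm the two PnL figures. `trade_pnl` and `ROWS` are hypothetical names, and the sketch assumes at most one open position per side, as the question states. Prices are held as `Decimal`, in the same spirit as the cast to NUMERIC, to avoid floating-point noise.

```python
from decimal import Decimal

# (date, price, long_entry, long_exit, short_entry, short_exit) from the question.
ROWS = [
    ("24/2/2019", "4124.25", 0, 0, 0, 0),
    ("25/2/2019", "4130.67", 0, 0, 0, 0),
    ("26/2/2019", "4145.67", 1, 0, 0, 0),
    ("27/2/2019", "4180.10", 0, 0, 0, 0),
    ("28/2/2019", "4200.05", 0, 0, 0, 0),
    ("1/3/2019", "4210.12", 0, 1, 0, 0),
    ("2/3/2019", "4198.10", 0, 0, 0, 0),
    ("3/3/2019", "4210.34", 0, 0, 0, 0),
    ("4/3/2019", "4100.12", 0, 0, 1, 0),
    ("5/3/2019", "4219.23", 0, 0, 0, 1),
]


def trade_pnl(rows, entry_col, exit_col, sign):
    """Return one value per row: the PnL on each closed trade's entry row, else None.

    sign is +1 for long trades (exit - entry) and -1 for short trades (entry - exit).
    """
    pnl = [None] * len(rows)
    open_idx = None  # index of the entry row of the currently open trade
    for i, row in enumerate(rows):
        if row[entry_col] == 1 and open_idx is None:
            open_idx = i
        elif row[exit_col] == 1 and open_idx is not None:
            entry_price = Decimal(rows[open_idx][1])
            exit_price = Decimal(row[1])
            pnl[open_idx] = sign * (exit_price - entry_price)
            open_idx = None
    return pnl


long_pnl = trade_pnl(ROWS, entry_col=2, exit_col=3, sign=1)
short_pnl = trade_pnl(ROWS, entry_col=4, exit_col=5, sign=-1)

for row, lp, sp in zip(ROWS, long_pnl, short_pnl):
    print(row[0], row[1], lp, sp)
```

On the sample data this puts 64.45 on the 26/2/2019 row and -119.11 on the 4/3/2019 row, with None everywhere else, matching the table above. A trade that is still open at the end of the data simply keeps None in this sketch.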
I suspect there is a "shorter" solution, but the query above should still be good enough to use.