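I am trying to sum values over a specific date range around the current row. Since BigQuery does not support date ranges in window functions, I am using a self-join, as shown below: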
with test_data as (
select 1 val1, 7 val2, 'ord001' id, timestamp('2019-01-01 04:00:00') dt_order
union all
select 2 val1, 14 val2, 'ord002' id, timestamp('2019-01-02 05:00:00') dt_order
union all
select 3 val1, 21 val2, 'ord003' id, timestamp('2019-01-03 06:00:00') dt_order
)
,revenue_coeff as (
select
td.id,
td.val1 *
(select sum(td1.val2) / sum(td1.val1)
from test_data td1
where td1.dt_order >= timestamp_sub(td.dt_order, interval 24 hour) and
td1.dt_order < timestamp_add(td.dt_order, interval 6 minute)
)
from test_data td
)
select * from revenue_coeff
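This toy query runs fine. However, when I try it against a real BigQuery table, I get "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join". How can I implement such a query in BQ? Thanks in advance!
Answer 0 (score: 2)
Below is for BigQuery Standard SQL
I will first answer the question at the end of your post, and then address the statement at the top of your post. So...
I get "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join". How can I implement such a query in BQ?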
#standardSQL
WITH `project.dataset.test_data` AS (
SELECT 1 val1, 7 val2, 'ord001' id, TIMESTAMP('2019-01-01 04:00:00') dt_order UNION ALL
SELECT 1 val1, 14 val2, 'ord002' id, TIMESTAMP('2019-01-02 05:00:00') dt_order UNION ALL
SELECT 1 val1, 21 val2, 'ord003' id, TIMESTAMP('2019-01-03 06:00:00') dt_order
), revenue_coeff AS (
SELECT
td1.id,
td1.val1 * SUM(td2.val2) / SUM(td2.val1)
FROM `project.dataset.test_data` td1
CROSS JOIN `project.dataset.test_data` td2
WHERE td2.dt_order >= TIMESTAMP_SUB(td1.dt_order, INTERVAL 24 HOUR)
AND td2.dt_order < TIMESTAMP_ADD(td1.dt_order, INTERVAL 6 MINUTE)
GROUP BY td1.id, td1.val1
)
SELECT * FROM revenue_coeff
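As you can see, instead of a LEFT JOIN you can use a CROSS JOIN, with the conditions from the ON clause moved into the WHERE clause. No rows are lost this way, because every row always falls inside its own time range.
Since BigQuery does not support date ranges in window functions...
Actually, it does. See the example: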
#standardSQL
WITH `project.dataset.test_data` AS (
SELECT 1 val1, 7 val2, 'ord001' id, TIMESTAMP('2019-01-01 04:00:00') dt_order UNION ALL
SELECT 1 val1, 14 val2, 'ord002' id, TIMESTAMP('2019-01-02 05:00:00') dt_order UNION ALL
SELECT 1 val1, 21 val2, 'ord003' id, TIMESTAMP('2019-01-03 06:00:00') dt_order
), revenue_coeff AS (
SELECT id, val1 * SUM(val2) OVER(win) / SUM(val1) OVER(win)
FROM `project.dataset.test_data` td1
WINDOW win AS (ORDER BY UNIX_SECONDS(dt_order) RANGE BETWEEN 86400 PRECEDING AND 359 FOLLOWING )
)
SELECT * FROM revenue_coeff
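As you can see, the trick is to use the UNIX_SECONDS function to "convert" the TIMESTAMP data type to an integer, so the RANGE bounds can be given in seconds.
Obviously, I recommend the second option.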
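To check which rows each window actually covers, a minimal sketch along the same lines can list the window members next to the epoch seconds. The aliases `unix_ts` and `ids_in_window` are illustrative names chosen here, not part of the original answer.

```sql
#standardSQL
WITH `project.dataset.test_data` AS (
  SELECT 1 val1, 7 val2, 'ord001' id, TIMESTAMP('2019-01-01 04:00:00') dt_order UNION ALL
  SELECT 1 val1, 14 val2, 'ord002' id, TIMESTAMP('2019-01-02 05:00:00') dt_order UNION ALL
  SELECT 1 val1, 21 val2, 'ord003' id, TIMESTAMP('2019-01-03 06:00:00') dt_order
)
SELECT
  id,
  -- TIMESTAMP expressed as whole seconds since the Unix epoch
  UNIX_SECONDS(dt_order) AS unix_ts,
  -- ids of all rows that fall inside the current row's window
  ARRAY_AGG(id) OVER (win) AS ids_in_window
FROM `project.dataset.test_data`
WINDOW win AS (
  ORDER BY UNIX_SECONDS(dt_order)
  -- 86400 s = 24 hours back (inclusive);
  -- 359 s forward is the inclusive form of "< 6 minutes" (360 s) at whole-second resolution
  RANGE BETWEEN 86400 PRECEDING AND 359 FOLLOWING
)
ORDER BY unix_ts
```

In the sample data the orders are 25 hours apart, so each window should contain only the current row. Note that UNIX_SECONDS truncates fractional seconds, so for timestamps with sub-second precision the 359-second bound only approximates the original `< 6 minutes` condition.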
Answer 1 (score: 0)
You can also do a left outer join, for example:
select a.val1, a.id,
sum(if(b.dt_order >= timestamp_sub(a.dt_order, interval 24 hour) and b.dt_order <= timestamp_add(a.dt_order, interval 6 minute), b.val2, 0.0))
/
sum(if(b.dt_order >= timestamp_sub(a.dt_order, interval 24 hour) and b.dt_order <= timestamp_add(a.dt_order, interval 6 minute), b.val1, 0.0))
from test_data a
left join test_data b on 1=1
group by 1,2
However, you have to handle division-by-zero errors, either upstream or by adding a CASE statement here.
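As one possible way to guard the division, here is a minimal sketch that uses BigQuery's SAFE_DIVIDE in place of a CASE statement. It is shown on the window-function form of the query rather than the left join, and the sample row with val1 = 0 is a hypothetical addition made only to trigger a zero denominator.

```sql
#standardSQL
WITH test_data AS (
  -- hypothetical row: val1 = 0 makes SUM(val1) over its window equal to 0
  SELECT 0 val1, 7 val2, 'ord001' id, TIMESTAMP('2019-01-01 04:00:00') dt_order UNION ALL
  SELECT 2 val1, 14 val2, 'ord002' id, TIMESTAMP('2019-01-02 05:00:00') dt_order UNION ALL
  SELECT 3 val1, 21 val2, 'ord003' id, TIMESTAMP('2019-01-03 06:00:00') dt_order
)
SELECT
  id,
  -- SAFE_DIVIDE returns NULL instead of raising an error when the divisor is 0
  val1 * SAFE_DIVIDE(SUM(val2) OVER (win), SUM(val1) OVER (win)) AS revenue_coeff
FROM test_data
WINDOW win AS (
  ORDER BY UNIX_SECONDS(dt_order)
  RANGE BETWEEN 86400 PRECEDING AND 359 FOLLOWING
)
```

Rows whose window sums to zero then come back as NULL, which can be replaced with a default via IFNULL if a number is required. Dividing by NULLIF(SUM(val1) OVER (win), 0) is an equivalent alternative.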