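I have three fact tables:
Budget: category, commodity, budget hours
Actual: category, commodity, date, actual hours
Baseline: category, commodity, date, forecast hours
I want to write a query that returns budget hours, actual hours, and forecast hours, summed and grouped by category and commodity, and filtered by date.
Note that the three fact tables have different levels of detail; for simplicity, I removed the other dimensions that the tables do not share. Currently I am using the following query in Data Studio with BigQuery: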
with t0 as ( select category, commodity FROM `testing-bi-engine.starschema.budget`
union distinct
select category, commodity FROM `testing-bi-engine.starschema.actual`
union distinct
select category, commodity FROM `testing-bi-engine.starschema.baseline`)
SELECT t0.category, t0.commodity , sum(t2.actualhours) as actualhours , sum(t3.budgethours) as budgethours , sum(t4.forecast) as forecasthours FROM t0
left outer join
(SELECT category, commodity , sum(actualhours) as actualhours FROM `testing-bi-engine.starschema.actual`
WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
group by category, commodity) t2
on t0.category= t2.category and t0.commodity= t2.commodity
left outer join
(SELECT category, commodity , sum(budgethours) as budgethours FROM `testing-bi-engine.starschema.budget`
group by category, commodity) t3
on t0.category= t3.category and t0.commodity= t3.commodity
left outer join
(SELECT category, commodity , sum(forecast) as forecast FROM `testing-bi-engine.starschema.baseline`
WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
group by category, commodity) t4
on t0.category= t4.category and t0.commodity= t4.commodity
group by t0.category, t0.commodity
My question: is there a better way to write this query?
Answer 0 (score: 1)
"Is there a better way to write this query?"
Try the following:
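Refactoring - Round 1
Removed the unnecessary outermost GROUP BY together with its SUMs (each subquery already returns one row per category and commodity), and replaced the verbose ON clauses with the more compact USING.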
#standardSQL
WITH t0 AS (
SELECT category, commodity FROM `testing-bi-engine.starschema.budget` UNION DISTINCT
SELECT category, commodity FROM `testing-bi-engine.starschema.actual` UNION DISTINCT
SELECT category, commodity FROM `testing-bi-engine.starschema.baseline`
)
SELECT category, commodity,
actualhours ,
budgethours ,
forecast AS forecasthours
FROM t0 LEFT OUTER JOIN (
SELECT category, commodity , SUM(actualhours) AS actualhours
FROM `testing-bi-engine.starschema.actual`
WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
GROUP BY category, commodity
) t2 USING(category, commodity)
LEFT OUTER JOIN (
SELECT category, commodity , SUM(budgethours) AS budgethours
FROM `testing-bi-engine.starschema.budget`
GROUP BY category, commodity
) t3 USING(category, commodity)
LEFT OUTER JOIN (
SELECT category, commodity , SUM(forecast) AS forecast
FROM `testing-bi-engine.starschema.baseline`
WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
GROUP BY category, commodity
) t4 USING(category, commodity)
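To see why the outer SUM can go but the inner ones cannot, here is a minimal sketch. The inline rows are hypothetical stand-ins for the real tables (two actual rows and two budget rows for the same category and commodity), and only two of the three facts are used to keep it short.

```sql
#standardSQL
-- Hypothetical sample rows; the real tables live in testing-bi-engine.starschema.
WITH actual AS (
  SELECT 'A' AS category, 'x' AS commodity, DATE '2020-01-01' AS date, 5 AS actualhours UNION ALL
  SELECT 'A', 'x', DATE '2020-01-02', 7
),
budget AS (
  SELECT 'A' AS category, 'x' AS commodity, 10 AS budgethours UNION ALL
  SELECT 'A', 'x', 20
),
-- Aggregate each fact to one row per key first, then join: no outer SUM needed.
pre_aggregated AS (
  SELECT category, commodity, actualhours, budgethours
  FROM (
    SELECT category, commodity, SUM(actualhours) AS actualhours
    FROM actual
    GROUP BY category, commodity
  )
  LEFT OUTER JOIN (
    SELECT category, commodity, SUM(budgethours) AS budgethours
    FROM budget
    GROUP BY category, commodity
  ) USING (category, commodity)
),
-- Join the raw rows first, then aggregate: every actual row pairs with every budget row.
joined_raw AS (
  SELECT category, commodity,
    SUM(actualhours) AS actualhours,
    SUM(budgethours) AS budgethours
  FROM actual
  LEFT OUTER JOIN budget USING (category, commodity)
  GROUP BY category, commodity
)
SELECT 'pre_aggregated' AS variant, * FROM pre_aggregated
UNION ALL
SELECT 'joined_raw', * FROM joined_raw
```

With these rows, pre_aggregated returns 12 actual hours and 30 budget hours, while joined_raw returns 24 and 60 because the join produces four row pairs before summing. So the per-table GROUP BY is what keeps the totals right, and once each side has one row per key there is nothing left for an outer SUM to add up.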
Refactoring - Round 2
Eliminated t0, since it is not really needed, and accordingly replaced LEFT OUTER with FULL OUTER.
#standardSQL
SELECT category, commodity,
actualhours ,
budgethours ,
forecast AS forecasthours
FROM (
SELECT category, commodity , SUM(actualhours) AS actualhours
FROM `testing-bi-engine.starschema.actual`
WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
GROUP BY category, commodity
) t2
FULL OUTER JOIN (
SELECT category, commodity , SUM(budgethours) AS budgethours
FROM `testing-bi-engine.starschema.budget`
GROUP BY category, commodity
) t3 USING(category, commodity)
FULL OUTER JOIN (
SELECT category, commodity , SUM(forecast) AS forecast
FROM `testing-bi-engine.starschema.baseline`
WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
GROUP BY category, commodity
) t4 USING(category, commodity)
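One difference may be worth checking on your own data before dropping t0. In the t0 version the key list is built from the unfiltered tables, so a category and commodity whose rows all fall after the end date still shows up, with NULL hours. In the FULL OUTER version that pair disappears. The sketch below illustrates this with hypothetical inline rows, only two of the three facts, and a literal date standing in for @DS_END_DATE.

```sql
#standardSQL
-- Hypothetical sample rows; ('B', 'y') exists only after the assumed end date.
WITH actual AS (
  SELECT 'A' AS category, 'x' AS commodity, DATE '2020-01-01' AS date, 5 AS actualhours UNION ALL
  SELECT 'B', 'y', DATE '2020-03-01', 9
),
budget AS (
  SELECT 'A' AS category, 'x' AS commodity, 10 AS budgethours
),
actual_agg AS (
  SELECT category, commodity, SUM(actualhours) AS actualhours
  FROM actual
  WHERE date <= DATE '2020-01-31'  -- stand-in for PARSE_DATE('%Y%m%d', @DS_END_DATE)
  GROUP BY category, commodity
),
budget_agg AS (
  SELECT category, commodity, SUM(budgethours) AS budgethours
  FROM budget
  GROUP BY category, commodity
),
-- Key list from the unfiltered tables, then LEFT OUTER joins.
with_t0 AS (
  SELECT category, commodity, actualhours, budgethours
  FROM (
    SELECT category, commodity FROM actual UNION DISTINCT
    SELECT category, commodity FROM budget
  ) t0
  LEFT OUTER JOIN actual_agg USING (category, commodity)
  LEFT OUTER JOIN budget_agg USING (category, commodity)
),
-- No key list; FULL OUTER join of the filtered aggregates.
without_t0 AS (
  SELECT category, commodity, actualhours, budgethours
  FROM actual_agg
  FULL OUTER JOIN budget_agg USING (category, commodity)
)
SELECT 'with_t0' AS variant, * FROM with_t0
UNION ALL
SELECT 'without_t0', * FROM without_t0
ORDER BY variant, category, commodity
```

Here with_t0 returns two rows, ('A', 'x', 5, 10) and ('B', 'y', NULL, NULL), while without_t0 returns only ('A', 'x', 5, 10). If the report should list every known category and commodity regardless of the date filter, keeping t0 is the safer choice; if all-NULL rows are just noise, the FULL OUTER version is simpler.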