BigQuery: how to run a multi-fact, multi-grain query over conformed dimensions

Posted: 2019-11-02 03:42:17

Tags: google-bigquery google-data-studio

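I have three fact tables:

budget: category, commodity, budgethours

actual: category, commodity, date, actualhours

baseline: category, commodity, date, forecast

I want to write a query that returns the sums of budget hours, actual hours and forecast hours, grouped by category and commodity and filtered by date.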

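Note that the three facts have different levels of detail (grain). For simplicity I removed another dimension that is not shared by all of them. Currently I am using this query in Data Studio, on top of BigQuery: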

WITH t0 AS (
  SELECT category, commodity FROM `testing-bi-engine.starschema.budget`
  UNION DISTINCT
  SELECT category, commodity FROM `testing-bi-engine.starschema.actual`
  UNION DISTINCT
  SELECT category, commodity FROM `testing-bi-engine.starschema.baseline`
)
SELECT
  t0.category,
  t0.commodity,
  SUM(t2.actualhours) AS actualhours,
  SUM(t3.budgethours) AS budgethours,
  SUM(t4.forecast) AS forecasthours
FROM t0
LEFT OUTER JOIN (
  SELECT category, commodity, SUM(actualhours) AS actualhours
  FROM `testing-bi-engine.starschema.actual`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  GROUP BY category, commodity
) t2
  ON t0.category = t2.category AND t0.commodity = t2.commodity
LEFT OUTER JOIN (
  SELECT category, commodity, SUM(budgethours) AS budgethours
  FROM `testing-bi-engine.starschema.budget`
  GROUP BY category, commodity
) t3
  ON t0.category = t3.category AND t0.commodity = t3.commodity
LEFT OUTER JOIN (
  SELECT category, commodity, SUM(forecast) AS forecast
  FROM `testing-bi-engine.starschema.baseline`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  GROUP BY category, commodity
) t4
  ON t0.category = t4.category AND t0.commodity = t4.commodity
GROUP BY t0.category, t0.commodity

It is a typical star schema with multiple fact tables.
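To see why each fact has to be aggregated to the shared (category, commodity) grain before the tables are combined, here is a small self-contained sketch with made-up inline data. The values and the CTE names budget and actual are hypothetical stand-ins for the real tables. Joining the raw rows first repeats the single budget row once per matching daily actual row, so the budget gets counted twice:

```sql
#standardSQL
-- Hypothetical inline sample data (not from the original post):
-- budget has one row per (category, commodity), actual has one row per day.
WITH budget AS (
  SELECT 'A' AS category, 'x' AS commodity, 100 AS budgethours
),
actual AS (
  SELECT 'A' AS category, 'x' AS commodity, DATE '2019-10-01' AS date, 10 AS actualhours
  UNION ALL
  SELECT 'A', 'x', DATE '2019-10-02', 20
),
-- Naive: join the raw fact rows first, then aggregate.
-- The budget row is repeated for each of the two actual rows.
naive AS (
  SELECT category, commodity,
    SUM(budgethours) AS budgethours,
    SUM(actualhours) AS actualhours
  FROM budget
  JOIN actual USING (category, commodity)
  GROUP BY category, commodity
),
-- Grain-safe: aggregate each fact to (category, commodity) first, then join.
safe AS (
  SELECT category, commodity, b.budgethours, a.actualhours
  FROM (
    SELECT category, commodity, SUM(budgethours) AS budgethours
    FROM budget
    GROUP BY category, commodity
  ) b
  FULL OUTER JOIN (
    SELECT category, commodity, SUM(actualhours) AS actualhours
    FROM actual
    GROUP BY category, commodity
  ) a
  USING (category, commodity)
)
SELECT 'naive' AS method, * FROM naive
UNION ALL
SELECT 'aggregate first', * FROM safe
```

With this data the naive row reports budgethours = 200 and actualhours = 30, while the aggregate-first row reports budgethours = 100 and actualhours = 30. That fan-out is why the query above aggregates each fact in its own subquery before joining.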

My question: is there a better way to write this query?

1 answer:

Answer 0 (score: 1)

Is there a better way to write this query?

Try the following:

Refactoring, round 1

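Removed the unnecessary outermost SUM and GROUP BY (t0 is UNION DISTINCT and every subquery is already grouped by category and commodity, so each key yields exactly one row), and replaced the verbose ON conditions with the more compact USING.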

#standardSQL
WITH t0 AS ( 
  SELECT category, commodity FROM `testing-bi-engine.starschema.budget` UNION DISTINCT
  SELECT category, commodity FROM `testing-bi-engine.starschema.actual` UNION DISTINCT
  SELECT category, commodity FROM `testing-bi-engine.starschema.baseline`
)
SELECT category, commodity, 
  actualhours , 
  budgethours , 
  forecast 
FROM t0 LEFT OUTER JOIN (
  SELECT category, commodity , SUM(actualhours) AS actualhours 
  FROM `testing-bi-engine.starschema.actual`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  GROUP BY category, commodity
) t2 USING(category, commodity)
LEFT OUTER JOIN (
  SELECT category, commodity , SUM(budgethours) AS budgethours 
  FROM `testing-bi-engine.starschema.budget`
  GROUP BY category, commodity
) t3 USING(category, commodity)
LEFT OUTER JOIN (
  SELECT category, commodity , SUM(forecast) AS forecast 
  FROM `testing-bi-engine.starschema.baseline`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  GROUP BY category, commodity
) t4 USING(category, commodity)
  

Refactoring, round 2

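Removed t0, which is not really needed once LEFT OUTER JOIN is replaced with FULL OUTER JOIN: the full joins keep every (category, commodity) pair that appears in any of the three aggregated subqueries, and with USING the key columns are merged from both sides of each join.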

#standardSQL
SELECT category, commodity, 
  actualhours , 
  budgethours , 
  forecast 
FROM (
  SELECT category, commodity , SUM(actualhours) AS actualhours 
  FROM `testing-bi-engine.starschema.actual`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  GROUP BY category, commodity
) t2 
FULL OUTER JOIN (
  SELECT category, commodity , SUM(budgethours) AS budgethours 
  FROM `testing-bi-engine.starschema.budget`
  GROUP BY category, commodity
) t3 USING(category, commodity)
FULL OUTER JOIN (
  SELECT category, commodity , SUM(forecast) AS forecast 
  FROM `testing-bi-engine.starschema.baseline`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  GROUP BY category, commodity
) t4 USING(category, commodity)
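Another common way to write a multi-fact query over conformed dimensions, not taken from the original answer, is to stack the facts with UNION ALL (padding the measures a table does not have with NULL) and aggregate once. A sketch, assuming the same tables, columns and @DS_END_DATE Data Studio parameter as above, and numeric hours columns:

```sql
#standardSQL
-- Alternative sketch (not part of the original answer): no joins at all.
-- Each branch fills only its own measure column; the others are NULL.
SELECT
  category,
  commodity,
  SUM(actualhours) AS actualhours,   -- SUM ignores the NULL padding
  SUM(budgethours) AS budgethours,
  SUM(forecast) AS forecast
FROM (
  SELECT category, commodity, actualhours, NULL AS budgethours, NULL AS forecast
  FROM `testing-bi-engine.starschema.actual`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
  UNION ALL
  SELECT category, commodity, NULL, budgethours, NULL
  FROM `testing-bi-engine.starschema.budget`
  UNION ALL
  SELECT category, commodity, NULL, NULL, forecast
  FROM `testing-bi-engine.starschema.baseline`
  WHERE date <= PARSE_DATE('%Y%m%d', @DS_END_DATE)
)
GROUP BY category, commodity
```

Because no rows are joined, one fact can never multiply another fact's rows, and each measure only sums rows from its own table. As with the round 2 FULL OUTER JOIN version, a (category, commodity) pair that has no budget row and whose actual and baseline rows are all dated after @DS_END_DATE does not appear at all, whereas the t0-based versions return it with NULL hours.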