BigQuery: collapse rows with consecutive dates

Time: 2018-07-25 13:58:27

Tags: sql google-bigquery olap

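I have a table containing sales targets. They are usually set once a month, but they are loaded into the table per day and per market. For example, if the UK target for January is 1550, it is loaded as 31 rows (one row per day of January), each with a target of 50 (1550 / 31 days).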

WITH targets AS (
  SELECT DATE '2018-01-01' AS date, 'uk' AS market, NUMERIC '50' AS target
  UNION ALL SELECT '2018-01-02', "uk", 50
  UNION ALL SELECT '2018-01-03', "uk", 50
  # ...
  UNION ALL SELECT '2018-01-31', "uk", 50
  UNION ALL SELECT '2018-02-01', "uk", 25
  UNION ALL SELECT '2018-02-02', "uk", 25
  # ...
  UNION ALL SELECT '2018-02-27', "uk", 25
  UNION ALL SELECT '2018-02-28', "uk", 25
  UNION ALL SELECT '2018-03-01', "uk", 50
  UNION ALL SELECT '2018-03-02', "uk", 50
  UNION ALL SELECT '2018-03-03', "uk", 50
  # ...
  UNION ALL SELECT '2018-03-31', "uk", 50
)

I would like to collapse this so that each row has dateFrom and dateTo columns, to reduce the work of loading the data and the time/cost of querying it.

I tried to do this by grouping on market and target and aggregating the minimum and maximum dates and the sum of the targets:

SELECT
  MIN(date) AS dateFrom,
  MAX(date) AS dateTo,
  Market,
  target AS dailyTarget,
  SUM(target) AS target
FROM targets
GROUP BY Market, dailyTarget
ORDER BY dateFrom

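I expected three rows, but there is a problem: when months with the same market and target are separated by a month with a different target, we get overlapping rows. In the example above, January and March both have a daily target of 50, while February's is 25, so all the 50 rows collapse into a single range from 2018-01-01 to 2018-03-31 that overlaps the February row.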

(Screenshot: incorrect results)

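I think the solution lies in using a window to combine only rows whose date is adjacent to the previous row's date, but I don't know how to implement that!

Thanks for any help!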

2 answers:

Answer 0: (score: 1)

This is a gaps-and-islands problem. You can get the ranges using:

select market, min(date), max(date), target
from (select t.*,
             row_number() over (partition by market, target order by date) as seqnum_t,
             row_number() OVER (partition by market order by date) as seqnum
      from targets t
     ) t
group by market, target, (seqnum - seqnum_t)
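Why this works: within one market, seqnum counts every row by date, while seqnum_t counts only the rows sharing the same target. As long as the target stays the same from one day to the next, both counters increase by 1 together, so their difference is constant; as soon as a different target intervenes, the difference changes. Below is a sketch of the same idea extended to produce the columns from the question. It assumes one row per market per day, and the targets CTE is a generated stand-in for the question's sample data (UNNEST over GENERATE_DATE_ARRAY), not the real table.

```sql
#standardSQL
WITH targets AS (
  -- Stand-in for the question's sample: daily rows for Jan-Mar 2018,
  -- with a daily target of 25 in February and 50 otherwise.
  SELECT
    d AS date,
    'uk' AS market,
    IF(EXTRACT(MONTH FROM d) = 2, NUMERIC '25', NUMERIC '50') AS target
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-01-01', DATE '2018-03-31')) AS d
),
numbered AS (
  SELECT
    t.*,
    -- Position of the row among all rows of the market.
    ROW_NUMBER() OVER (PARTITION BY market ORDER BY date) AS seqnum,
    -- Position of the row among rows of the market with the same target.
    ROW_NUMBER() OVER (PARTITION BY market, target ORDER BY date) AS seqnum_t
  FROM targets AS t
)
SELECT
  MIN(date) AS dateFrom,
  MAX(date) AS dateTo,
  market,
  target AS dailyTarget,
  SUM(target) AS target
FROM numbered
-- seqnum - seqnum_t is constant within each unbroken run of the same target.
GROUP BY market, dailyTarget, seqnum - seqnum_t
ORDER BY dateFrom
```

With this sample the difference is 0 for January, 31 for February and 28 for March, so the query yields three rows: January (daily target 50, total 1550), February (25, total 700) and March (50, total 1550). Note that this detects changes in the target, not missing dates; if days can be absent from the table, the runs would need an explicit date comparison as well.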

Answer 1: (score: 0)

Below is for BigQuery Standard SQL

#standardSQL
SELECT
  MIN(date) AS dateFrom,
  MAX(date) AS dateTo,
  Market,
  target AS dailyTarget,
  SUM(target) AS target
FROM `project.dataset.targets`
GROUP BY Market, dailyTarget, DATE_TRUNC(date, MONTH)
ORDER BY dateFrom

As you can see, you just need to add DATE_TRUNC(date, MONTH) to the GROUP BY
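Grouping by DATE_TRUNC(date, MONTH) always yields at least one row per calendar month, which fits the question's case of monthly targets. If ranges should instead follow the question's idea of joining a row to the previous one only when the dates are adjacent, one possible sketch uses LAG to flag the start of each run and a running SUM to number the runs. The names isStart and islandId are illustrative, and the targets CTE is again a generated stand-in for the sample data rather than the real table.

```sql
#standardSQL
WITH targets AS (
  -- Stand-in for the question's sample data (not the real table).
  SELECT
    d AS date,
    'uk' AS market,
    IF(EXTRACT(MONTH FROM d) = 2, NUMERIC '25', NUMERIC '50') AS target
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2018-01-01', DATE '2018-03-31')) AS d
),
flagged AS (
  SELECT
    t.*,
    -- 1 when the row starts a new run: it is the first row of the market,
    -- the previous date is not the day before, or the target changed.
    IF(
      LAG(date) OVER (PARTITION BY market ORDER BY date) IS NULL
      OR DATE_DIFF(date, LAG(date) OVER (PARTITION BY market ORDER BY date), DAY) != 1
      OR target != LAG(target) OVER (PARTITION BY market ORDER BY date),
      1, 0) AS isStart
  FROM targets AS t
),
islands AS (
  SELECT
    f.*,
    -- Running count of run starts gives every run its own id.
    SUM(isStart) OVER (PARTITION BY market ORDER BY date) AS islandId
  FROM flagged AS f
)
SELECT
  MIN(date) AS dateFrom,
  MAX(date) AS dateTo,
  market,
  target AS dailyTarget,
  SUM(target) AS target
FROM islands
GROUP BY market, dailyTarget, islandId
ORDER BY dateFrom
```

Unlike the month-based grouping, this would merge two consecutive months that share the same daily target into one range, and it would split a range wherever a day is missing.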