How to generate a date series to fill in missing dates in Google BigQuery?

Time: 2016-08-01 08:21:00

Tags: sql google-bigquery

I want to get the daily sales totals from a Google BigQuery table. I used the following code:

select Day(InvoiceDate) date, Sum(InvoiceAmount) sales from test_gmail_com.sales 
where year(InvoiceDate) = Year(current_date()) and
Month(InvoiceDate) = Month(current_date())
group by date order by date

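The query above only returns totals for the days that actually appear in the table. Some days may have no sales at all. For those days I still need a row with the date, and the sum should be 0. In other words, each month should return one row per day (30 or 31 rows, or 28/29 in February) with the sales total. An example is shown below: there were no sales on the 4th day of this month, so its sum should be 0.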

date | sales
-----+------
1    |   259
2    |   359
3    |    45
4    |     0
5    |   156

Is this possible in BigQuery? Basically, the date column should be the series 1 to 28, 29, 30, or 31, depending on the month of the year.
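To make the requirement concrete, here is a minimal Python sketch of the expected result (plain Python, not BigQuery code): every day of the month appears exactly once, and days without invoices get 0. The function name `daily_totals_with_zeros` and the `(date, amount)` input shape are assumptions chosen for illustration.

```python
import calendar
from datetime import date


def daily_totals_with_zeros(year, month, invoices):
    """Return [(day, total)] for every day of the month; missing days get 0.

    `invoices` is a hypothetical list of (date, amount) pairs standing in
    for the rows of the sales table.
    """
    # monthrange returns (weekday of the 1st, number of days in the month)
    days_in_month = calendar.monthrange(year, month)[1]
    # Pre-fill every day with 0 so days without sales still get a row
    totals = {day: 0 for day in range(1, days_in_month + 1)}
    for invoice_date, amount in invoices:
        # Same filter as the WHERE clause: only the requested year and month
        if invoice_date.year == year and invoice_date.month == month:
            totals[invoice_date.day] += amount
    return sorted(totals.items())


rows = daily_totals_with_zeros(2016, 8, [
    (date(2016, 8, 1), 259),
    (date(2016, 8, 2), 359),
    (date(2016, 8, 3), 45),
    (date(2016, 8, 5), 100),
    (date(2016, 8, 5), 56),
])
print(rows[:5])   # [(1, 259), (2, 359), (3, 45), (4, 0), (5, 156)]
print(len(rows))  # 31
```

Pre-filling the dictionary plays the role of the "calendar" that the answers below build in SQL; the invoices are then merged onto it.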

5 answers:

Answer 0 (score: 7)

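You can use the following to generate all dates in a given range (in the example below, all dates from 2015-06-01 to CURRENT_DATE(); by changing those two values you control the range of generated dates):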

SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))
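The query relies on a string trick: RPAD builds a string of N dots (N = number of days in the range), SPLIT with an empty delimiter turns it into N one-character elements, FLATTEN makes them N rows, and ROW_NUMBER() gives each row a position `pos` that is added to the start date as an offset. The Python sketch below mirrors those steps only to show the arithmetic; the function name `dates_between` is hypothetical.

```python
from datetime import date, timedelta


def dates_between(start, end):
    """Mimic the RPAD/SPLIT/ROW_NUMBER trick: one entry per day, start..end inclusive."""
    # 1 + DATEDIFF(end, start): number of days in the inclusive range
    n = 1 + (end - start).days
    # RPAD('', n, '.') -> a string of n dots; SPLIT(..., '') -> n single characters
    rows = list("".ljust(n, "."))
    # ROW_NUMBER() OVER() numbers the rows from 1, so pos - 1 is the day offset
    return [start + timedelta(days=pos - 1) for pos, _ in enumerate(rows, start=1)]


days = dates_between(date(2015, 6, 1), date(2015, 6, 5))
print(days[0], days[-1], len(days))  # 2015-06-01 2015-06-05 5
```

The dots carry no information; they only exist so that the string has exactly one character per day to be turned into a row.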

Now you can LEFT JOIN this with your table so that every date appears in the result. See the example below:

SELECT
  all_dates.calendar_day AS calendar_day,
  IFNULL(daily.sales, 0) AS sales
FROM (
  SELECT DATE(DATE_ADD(TIMESTAMP("2015-06-01"), pos - 1, "DAY")) AS calendar_day
  FROM (
       SELECT ROW_NUMBER() OVER() AS pos, *
       FROM (FLATTEN((
       SELECT SPLIT(RPAD('', 1 + DATEDIFF(TIMESTAMP(CURRENT_DATE()), TIMESTAMP("2015-06-01")), '.'),'') AS h
       FROM (SELECT NULL)),h
  )))
) AS all_dates
LEFT JOIN (
  SELECT DATE(InvoiceDate) AS sales_day, SUM(InvoiceAmount) AS sales
  FROM test_gmail_com.sales
  WHERE YEAR(InvoiceDate) = YEAR(CURRENT_DATE()) AND
  MONTH(InvoiceDate) = MONTH(CURRENT_DATE())
  GROUP BY sales_day
) AS daily
ON all_dates.calendar_day = daily.sales_day
ORDER BY calendar_day
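The LEFT JOIN pattern can be tried locally without BigQuery. The sketch below uses Python's built-in `sqlite3` module and swaps the RPAD/SPLIT trick for a SQLite recursive CTE, so it is an illustration of the idea rather than BigQuery syntax; the `sales` table and its rows are made-up sample data. In this sketch both join keys are full dates, and the day without sales comes back as 0 through IFNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: no invoice on 2016-08-04
conn.executescript("""
    CREATE TABLE sales (InvoiceDate TEXT, InvoiceAmount INTEGER);
    INSERT INTO sales VALUES
        ('2016-08-01', 259), ('2016-08-02', 359),
        ('2016-08-03', 45), ('2016-08-05', 156);
""")

query = """
WITH RECURSIVE all_dates(calendar_day) AS (
    -- Calendar: one row per day, built by a recursive CTE instead of RPAD/SPLIT
    SELECT '2016-08-01'
    UNION ALL
    SELECT date(calendar_day, '+1 day') FROM all_dates
    WHERE calendar_day < '2016-08-05'
)
SELECT all_dates.calendar_day, IFNULL(daily.sales, 0) AS sales
FROM all_dates
LEFT JOIN (
    -- Aggregate per full date so the key matches calendar_day
    SELECT InvoiceDate AS sales_day, SUM(InvoiceAmount) AS sales
    FROM sales
    GROUP BY InvoiceDate
) AS daily
ON all_dates.calendar_day = daily.sales_day
ORDER BY all_dates.calendar_day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The calendar side drives the row count; the sales side only contributes values where a match exists, which is why the missing day survives as 0.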

I want to get the sales for previous months as well.

The following returns all the days of the previous month:

SELECT DATE(DATE_ADD(DATE_ADD(DATE_ADD(CURRENT_DATE(), 1 - DAY(CURRENT_DATE()), "DAY"), -1, "MONTH"), pos - 1, "DAY")) AS calendar_day
FROM (
     SELECT ROW_NUMBER() OVER() AS pos, *
     FROM (FLATTEN((
     SELECT SPLIT(RPAD('', 1 + DATEDIFF(DATE_ADD(CURRENT_DATE(), - DAY(CURRENT_DATE()), "DAY"), DATE_ADD(DATE_ADD(CURRENT_DATE(), 1 - DAY(CURRENT_DATE()), "DAY"), -1, "MONTH")), '.'),'') AS h
     FROM (SELECT NULL)),h
)))
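The date arithmetic for "the previous month" is easiest to check outside SQL. This Python sketch (the function name `previous_month_bounds` is hypothetical) first snaps to the first day of the current month and only then steps back, which avoids end-of-month surprises such as March 31 minus one month; the last day of the previous month is simply the day before the first of the current month.

```python
from datetime import date, timedelta


def previous_month_bounds(today):
    """Return (first_day, last_day) of the month before `today`."""
    # Like DATE_ADD(today, 1 - DAY(today), "DAY"): first day of this month
    first_of_this_month = today.replace(day=1)
    # Like DATE_ADD(today, -DAY(today), "DAY"): last day of the previous month
    last_of_prev = first_of_this_month - timedelta(days=1)
    # Snapping the last day back to day 1 gives the first day of the previous month
    first_of_prev = last_of_prev.replace(day=1)
    return first_of_prev, last_of_prev


print(previous_month_bounds(date(2016, 3, 31)))
# (datetime.date(2016, 2, 1), datetime.date(2016, 2, 29))
```

The number of rows to generate is then `1 + (last_of_prev - first_of_prev).days`, the same quantity the DATEDIFF inside RPAD computes.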

Answer 1 (score: 4)

It seems simplest to generate a list of dates and then join whatever tables you need onto it. I used GENERATE_DATE_ARRAY + UNNEST, which looks clean.

To generate a list of dates (one row per day):

  SELECT
  *
  FROM 
    UNNEST(GENERATE_DATE_ARRAY('2018-10-01', '2020-09-30', INTERVAL 1 DAY)) AS example

Answer 2 (score: 2)

Use the Standard SQL dialect and the GENERATE_ARRAY function to simplify the code:

WITH serialnum AS (
  -- Day offsets 0 .. (number of days in the current month - 1)
  SELECT
    sn
  FROM
    UNNEST(GENERATE_ARRAY(0,
                          DATE_DIFF(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH),
                                             INTERVAL 1 MONTH),
                                    DATE_TRUNC(CURRENT_DATE(), MONTH),
                                    DAY) - 1)
          ) AS sn
), date_seq AS (
  -- One row per calendar day of the current month
  SELECT
    DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH), INTERVAL sn DAY) AS this_day
  FROM
    serialnum
)
SELECT
    EXTRACT(DAY FROM date_seq.this_day) AS day_of_month
    , SUM(IFNULL(s.InvoiceAmount, 0)) AS sales
FROM
    date_seq
    LEFT JOIN
    test_gmail_com.sales AS s
-- Join on the full date; a WHERE filter on the sales table would drop the zero days
ON
    date_seq.this_day = DATE(s.InvoiceDate)
GROUP BY
    day_of_month
ORDER BY
    day_of_month
;
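The upper bound passed to GENERATE_ARRAY is the number of days in the current month minus one, computed as the difference between the first day of next month and the first day of this month. A small Python sketch of the same arithmetic (the function name `days_in_month` is hypothetical) shows that this handles leap years and the December-to-January rollover:

```python
from datetime import date


def days_in_month(d):
    """Same as DATE_DIFF(first day of next month, first day of this month, DAY)."""
    first = d.replace(day=1)
    # DATE_ADD(first, INTERVAL 1 MONTH); roll the year over in December
    if first.month == 12:
        next_first = date(first.year + 1, 1, 1)
    else:
        next_first = date(first.year, first.month + 1, 1)
    return (next_first - first).days


# GENERATE_ARRAY(0, n - 1) is inclusive, so it matches Python's range(0, n)
offsets = list(range(0, days_in_month(date(2016, 2, 10))))
print(len(offsets))  # 29
```

Working from the first of the month on both sides keeps the subtraction exact, whatever day of the month CURRENT_DATE() happens to be.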

Update

Or, even more simply, use the GENERATE_DATE_ARRAY function:

WITH date_seq AS (
  -- GENERATE_DATE_ARRAY returns a single ARRAY<DATE>; UNNEST turns it into one row per day
  SELECT
    this_day
  FROM
    UNNEST(GENERATE_DATE_ARRAY(DATE_TRUNC(CURRENT_DATE(), MONTH),
                               DATE_ADD(DATE_ADD(DATE_TRUNC(CURRENT_DATE(), MONTH),
                                                 INTERVAL 1 MONTH),
                                        INTERVAL -1 DAY),
                               INTERVAL 1 DAY)) AS this_day
)
SELECT
    EXTRACT(DAY FROM date_seq.this_day) AS day_of_month
    , SUM(IFNULL(s.InvoiceAmount, 0)) AS sales
FROM
    date_seq
    LEFT JOIN
    test_gmail_com.sales AS s
-- Join on the full date; a WHERE filter on the sales table would drop the zero days
ON
    date_seq.this_day = DATE(s.InvoiceDate)
GROUP BY
    day_of_month
ORDER BY
    day_of_month
;

Answer 3 (score: 1)

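For these purposes it is practical to have a "calendar" table that simply lists all dates in some range. For your specific problem, a table containing the numbers 1 to 31 is enough. A quick way to get this table is to make a spreadsheet containing those numbers, save it as a CSV file, and import that file into BigQuery as a table.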

Then LEFT OUTER JOIN your result set onto this table, selecting IFNULL(sales, 0) AS sales.

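If you want the number of days per month (28 to 31) to be correct, you basically have two options. You can create a proper calendar table covering several years and join on year, month, and day. Or you can use the simple table of numbers 1 to 31 and remove the numbers that do not exist in the given month and year.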
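The second option (a fixed 1 to 31 table, trimmed to the month's length) can be sketched with Python's `sqlite3` module. Here a `numbers` table stands in for the CSV-imported table; the table names, the sample rows, and the use of SQLite's `strftime` in place of BigQuery date functions are all assumptions for illustration.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the CSV-imported table of numbers 1..31
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 32)])
# Hypothetical sales rows; the March row must not leak into February's day 1
conn.execute("CREATE TABLE sales (InvoiceDate TEXT, InvoiceAmount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2016-02-01", 10), ("2016-02-29", 20), ("2016-03-01", 99)])

year, month = 2016, 2
days = calendar.monthrange(year, month)[1]  # 29 for February 2016

rows = conn.execute("""
    SELECT numbers.n AS day, IFNULL(SUM(s.InvoiceAmount), 0) AS sales
    FROM numbers
    LEFT JOIN sales AS s
      ON CAST(strftime('%d', s.InvoiceDate) AS INTEGER) = numbers.n
     -- Month filter lives in ON, not WHERE, so days without sales are kept
     AND strftime('%Y-%m', s.InvoiceDate) = ?
    -- Trim the 1..31 table to the real length of the month
    WHERE numbers.n <= ?
    GROUP BY numbers.n
    ORDER BY numbers.n
""", (f"{year:04d}-{month:02d}", days)).fetchall()
print(len(rows), rows[0], rows[1], rows[-1])  # 29 (1, 10) (2, 0) (29, 20)
```

Putting the month condition in the ON clause is the design choice that matters here: moving it to WHERE would discard the unmatched (NULL) rows and bring back the original problem.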

Answer 4 (score: 0)

For Standard SQL:

WITH

splitted AS (
  SELECT
    *
  FROM
    UNNEST( SPLIT(RPAD('',
          1 + DATE_DIFF(CURRENT_DATE(), DATE("2015-06-01"), DAY),
          '.'),''))),
  with_row_numbers AS (
  SELECT
    ROW_NUMBER() OVER() AS pos,
    *
  FROM
    splitted),
  calendar_day AS (
  SELECT
    DATE_ADD(DATE("2015-06-01"), INTERVAL (pos - 1) DAY) AS day
  FROM
    with_row_numbers)
SELECT
  *
FROM
  calendar_day
ORDER BY
  day DESC