BigQuery partitioned tables: what's the shortest way to specify a month?

Asked: 2018-05-22 16:58:38

Tags: sql google-bigquery user-defined-functions standard-sql

I'm wondering what the shortest way is to specify a month when querying a partitioned table.

TIMESTAMP_TRUNC looks tempting, but it doesn't seem to work as a partition filter:

SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE TIMESTAMP_TRUNC(datehour, month) = '2018-04-01'

Cannot query over table 'fh-bigquery.wikipedia_v2.pageviews_2018' without a filter that can be used for partition elimination

BETWEEN with dates also looks tempting, but it requires knowing how many days each month has:

SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour) BETWEEN '2018-04-01' AND '2018-04-31'

Could not cast literal "2018-04-31" to type DATE at [3:47]

DATE_SUB(DATE_ADD(..., INTERVAL 1 MONTH), INTERVAL 1 DAY) works, but the date has to be typed twice and the query gets long:

SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour) 
  BETWEEN '2018-04-01' 
  AND DATE_SUB(DATE_ADD('2018-04-01', INTERVAL 1 MONTH), INTERVAL 1 DAY) 

15746003449

How would you improve this?

2 answers:

Answer 0: (score: 2)

I would do it like this:

SELECT SUM(views) AS views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE datehour >= TIMESTAMP '2018-04-01' AND datehour < TIMESTAMP '2018-05-01';

You can put the date constant in a CTE:

WITH params AS (
  -- first day of the month, typed only once
  SELECT DATE '2018-04-01' AS dte
)
SELECT SUM(views) AS views
FROM params CROSS JOIN
     `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE datehour >= TIMESTAMP(params.dte)
  AND datehour < TIMESTAMP(DATE_ADD(params.dte, INTERVAL 1 MONTH))

Answer 1: (score: 1)

Update: after experimenting further, this is my best solution:

SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE_TRUNC(DATE(datehour), month) = '2018-04-01'

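This is essentially the first attempt from the question, except that the timestamp is cast to DATE first and DATE_TRUNC is applied to the result.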
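To see why comparing against the first day of the month is enough, here is a small standalone check that does not touch the partitioned table. The sample dates are made up for illustration; only DATE_TRUNC and UNNEST are used:

```sql
-- Every date in April truncates to 2018-04-01, so an equality test
-- against the first of the month selects the whole month.
SELECT
  d,
  DATE_TRUNC(d, MONTH) AS month_start
FROM UNNEST([DATE '2018-04-01', DATE '2018-04-30', DATE '2018-05-01']) AS d
ORDER BY d;
```

The first two rows should share the same month_start, while the May date truncates to 2018-05-01.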

I'm leaving the other options I tried below, since they may be useful in other cases.

One option is to use WITH to define a variable, so the month is typed only once:

WITH month AS (SELECT DATE('2018-04-01') m), 
  full_month AS (SELECT m AS s, DATE_SUB(DATE_ADD(m, INTERVAL 1 MONTH), INTERVAL 1 DAY) AS e FROM month)

SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour) 
  BETWEEN (SELECT s FROM full_month) AND (SELECT e FROM full_month)

Similarly, you can define SQL UDFs:

CREATE TEMPORARY FUNCTION month() AS (DATE('2018-04-01'));
CREATE TEMPORARY FUNCTION month_end() AS (DATE_SUB(DATE_ADD(month(), INTERVAL 1 MONTH), INTERVAL 1 DAY));

SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour) BETWEEN month() AND month_end() 

With both of these options, BigQuery recognizes the filter and scans only the required partitions.
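As a side note, the DATE_SUB(DATE_ADD(...)) expression used for the end of the month can probably be shortened with LAST_DAY, which is available in current BigQuery Standard SQL. A minimal sketch that only compares the two expressions; I have not checked how LAST_DAY behaves as a partition filter on this table, so treat that part as an assumption to verify:

```sql
-- Both columns should give the last day of April 2018.
SELECT
  DATE_SUB(DATE_ADD(DATE '2018-04-01', INTERVAL 1 MONTH), INTERVAL 1 DAY) AS via_add_sub,
  LAST_DAY(DATE '2018-04-01') AS via_last_day;
```

If the two values match, the body of month_end() above could be written as LAST_DAY(month()).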