I'd like to know the shortest way to specify a month when querying a partitioned table.
TIMESTAMP_TRUNC to month looks tempting, but it doesn't seem to work as a partition filter:
SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE TIMESTAMP_TRUNC(datehour, month) = '2018-04-01'
Cannot query over table 'fh-bigquery.wikipedia_v2.pageviews_2018' without a filter that can be used for partition elimination
BETWEEN two dates looks tempting too, but then you need to know how many days each month has:
SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour) BETWEEN '2018-04-01' AND '2018-04-31'
Could not cast literal "2018-04-31" to type DATE at [3:47]
DATE_SUB(DATE_ADD(month), day) works, but the date has to be typed twice and the expression is long:
SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour)
BETWEEN '2018-04-01'
AND DATE_SUB(DATE_ADD('2018-04-01', INTERVAL 1 MONTH), INTERVAL 1 DAY)
15746003449
How would you improve this?
Answer 0 (score: 2)
I would do this:
SELECT SUM(views) as views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE datehour >= TIMESTAMP '2018-04-01' AND datehour < TIMESTAMP '2018-05-01';
You can put the date constant in a CTE:
with params as (
select date '2018-04-01' as dte
)
select sum(views) as views
from params cross join
`fh-bigquery.wikipedia_v2.pageviews_2018`
where datehour >= timestamp(params.dte) and datehour < timestamp(date_add(params.dte, interval 1 month))
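If BigQuery scripting is available, the same half-open range can be sketched with a script variable instead of a CTE, so the month is still typed only once. This is a minimal sketch, not a tested recipe: `month_start` is a hypothetical variable name, the partitioning column is assumed to be `datehour` as in the question, and whether the filter is accepted for partition elimination should be checked against the table.

```sql
-- Hypothetical variable name; DECLARE needs a multi-statement (scripting) query.
DECLARE month_start DATE DEFAULT DATE '2018-04-01';

SELECT SUM(views) AS views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
-- Half-open range [month start, next month start): no need to know the month length.
WHERE datehour >= TIMESTAMP(month_start)
  AND datehour < TIMESTAMP(DATE_ADD(month_start, INTERVAL 1 MONTH));
```

The upper bound is exclusive, which avoids computing the last day of the month at all.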
Answer 1 (score: 1)
Update: after experimenting further, this is my best solution:
SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE_TRUNC(DATE(datehour), month) = '2018-04-01'
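This is basically the first attempt from the question, except that the timestamp is first cast to DATE and DATE_TRUNC is applied to the result.
I'm leaving the other options I tried below, since they may be useful in other cases.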
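To see why the DATE_TRUNC comparison selects exactly one month, here is a small sketch that runs without the table. The three sample timestamps are made up for illustration; DATE() on a TIMESTAMP uses UTC unless a time zone is passed.

```sql
-- Made-up sample timestamps; no table is scanned.
SELECT
  ts,
  DATE(ts) AS d,                               -- timestamp cast to DATE (UTC)
  DATE_TRUNC(DATE(ts), MONTH) AS month_start   -- first day of that date's month
FROM UNNEST([
  TIMESTAMP '2018-04-01 00:00:00',
  TIMESTAMP '2018-04-30 23:00:00',
  TIMESTAMP '2018-05-01 00:00:00'
]) AS ts;
-- month_start is 2018-04-01 for the first two rows and 2018-05-01 for the third,
-- so comparing it to '2018-04-01' keeps only the April rows.
```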
One option is to use WITH to define a variable, so the month is typed only once:
WITH month AS (SELECT DATE('2018-04-01') m),
full_month AS (SELECT m AS s, DATE_SUB(DATE_ADD(m, INTERVAL 1 MONTH), INTERVAL 1 DAY) AS e FROM month)
SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour)
BETWEEN (SELECT s FROM full_month) AND (SELECT e FROM full_month)
Similarly, you can define SQL UDFs:
CREATE TEMPORARY FUNCTION month() AS (DATE('2018-04-01'));
CREATE TEMPORARY FUNCTION month_end() AS (DATE_SUB(DATE_ADD(month(), INTERVAL 1 MONTH), INTERVAL 1 DAY));
SELECT SUM(views) views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour) BETWEEN month() AND month_end()
With both options, BigQuery recognizes the filter and optimizes the query to scan only the needed partitions.
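As a variation on the month-end arithmetic above, BigQuery also has a LAST_DAY function that returns the last day of the month containing a date. The sketch below first checks it against the DATE_SUB(DATE_ADD(...)) form without touching the table, then uses it as the upper bound of BETWEEN. It assumes the filter is still usable for partition elimination, which is worth verifying, and the date literal is still written twice.

```sql
-- Sanity check: both expressions should return 2018-04-30.
SELECT
  LAST_DAY(DATE '2018-04-01') AS month_end,
  DATE_SUB(DATE_ADD(DATE '2018-04-01', INTERVAL 1 MONTH), INTERVAL 1 DAY) AS month_end_manual;

-- The same idea as a filter on the table from the question.
SELECT SUM(views) AS views
FROM `fh-bigquery.wikipedia_v2.pageviews_2018`
WHERE DATE(datehour) BETWEEN '2018-04-01' AND LAST_DAY(DATE '2018-04-01');
```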