This simple query times out. Any ideas how to optimize it with some BigQuery tricks?
    SELECT
      s.typeFlight s_type, r.distance, r.price, (d.booking_token IS NOT NULL) clicked
    FROM [search.searches] s
    LEFT JOIN [search.search_results] r ON r.searchid = s.searchid
    LEFT JOIN [search.clicks] d ON d.booking_token = r.booking_token
    WHERE s.saved_at BETWEEN TIMESTAMP('2016-03-01 00:00:00')
                         AND TIMESTAMP('2016-03-05 00:00:00')
Setup
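The data comes from a search engine, so the clicks table is small (millions of rows), while the searches and search_results tables are huge. The query processes about 5 TB of data.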
Answer
You can push the WHERE filter down into the first SELECT, so that less data goes into the joins:
    SELECT
      s.typeFlight s_type, r.distance, r.price, (d.booking_token IS NOT NULL) clicked
    FROM (
      SELECT typeFlight, searchid
      FROM [search.searches]
      WHERE saved_at BETWEEN TIMESTAMP('2016-03-01 00:00:00')
                         AND TIMESTAMP('2016-03-05 00:00:00')
    ) s
    LEFT JOIN [search.search_results] r ON r.searchid = s.searchid
    LEFT JOIN [search.clicks] d ON d.booking_token = r.booking_token
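Why the rewrite is safe: the WHERE clause only references columns of `s`, the left-most table of the LEFT JOIN chain, so filtering before the joins yields the same rows as filtering after them, while fewer search rows enter the joins. Below is a toy in-memory sketch in Python illustrating that equivalence; it is not how BigQuery executes joins. The rows and the helpers `in_window`, `left_join`, and `run` are hypothetical; only the column names come from the query.

```python
from datetime import datetime

# Hypothetical toy data; column names are prefixed with the query's aliases.
searches = [
    {"s.searchid": 1, "s.typeFlight": "oneway", "s.saved_at": datetime(2016, 2, 20)},
    {"s.searchid": 2, "s.typeFlight": "return", "s.saved_at": datetime(2016, 3, 2)},
    {"s.searchid": 3, "s.typeFlight": "oneway", "s.saved_at": datetime(2016, 3, 4)},
    {"s.searchid": 4, "s.typeFlight": "return", "s.saved_at": datetime(2016, 3, 9)},
]
results = [
    {"r.searchid": 1, "r.booking_token": "a", "r.distance": 500, "r.price": 90},
    {"r.searchid": 2, "r.booking_token": "b", "r.distance": 800, "r.price": 120},
    {"r.searchid": 2, "r.booking_token": "c", "r.distance": 800, "r.price": 150},
    {"r.searchid": 3, "r.booking_token": "d", "r.distance": 300, "r.price": 60},
    {"r.searchid": 4, "r.booking_token": "e", "r.distance": 900, "r.price": 200},
]
clicks = [{"d.booking_token": "b"}, {"d.booking_token": "e"}]

START, END = datetime(2016, 3, 1), datetime(2016, 3, 5)


def in_window(row):
    # BETWEEN is inclusive on both ends
    return START <= row["s.saved_at"] <= END


def left_join(left, right, lkey, rkey, rcols):
    """Hash LEFT JOIN; unmatched left rows get None (NULL) for every right column."""
    index = {}
    for row in right:
        index.setdefault(row[rkey], []).append(row)
    out = []
    for lrow in left:
        key = lrow[lkey]
        # NULL never equals anything in SQL, so a None key matches nothing
        matches = index.get(key, []) if key is not None else []
        for rrow in matches or [None]:
            out.append({**lrow, **{c: (rrow or {}).get(c) for c in rcols}})
    return out


def run(filter_first):
    """Return the query's output rows and how many search rows entered the joins."""
    s = [row for row in searches if in_window(row)] if filter_first else searches
    joined = left_join(s, results, "s.searchid", "r.searchid", list(results[0]))
    joined = left_join(joined, clicks, "r.booking_token", "d.booking_token", ["d.booking_token"])
    if not filter_first:
        joined = [row for row in joined if in_window(row)]
    rows = [
        # a row counts as clicked when a matching click exists
        (row["s.typeFlight"], row["r.distance"], row["r.price"], row["d.booking_token"] is not None)
        for row in joined
    ]
    return rows, len(s)


if __name__ == "__main__":
    pushed, pushed_in = run(filter_first=True)
    late, late_in = run(filter_first=False)
    assert pushed == late
    print(pushed)
    print(f"search rows entering the joins: {pushed_in} (filter first) vs {late_in} (filter last)")
```

With this toy data both variants return the same three rows, but only 2 of the 4 search rows enter the joins when the filter is applied first.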
Sometimes it also helps to look at the query plan explanation (https://cloud.google.com/bigquery/query-plan-explanation) to see where the query spends its time.
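The same plan is also available as JSON, for example in the `statistics.query.queryPlan` list of a `jobs.get` API response, where each stage carries fields such as `waitRatioMax`, `readRatioMax`, `computeRatioMax`, `writeRatioMax`, `recordsRead`, and `recordsWritten`. A minimal sketch, assuming that JSON has already been parsed into Python dicts, that ranks stages by their slowest phase; `summarize_plan` and `EXAMPLE_PLAN` are hypothetical names, and the example plan is made up rather than real output.

```python
PHASES = ("wait", "read", "compute", "write")

# Made-up two-stage plan shaped like the REST queryPlan field; not real output.
EXAMPLE_PLAN = [
    {"name": "Stage 1", "waitRatioMax": 0.1, "readRatioMax": 0.9,
     "computeRatioMax": 0.3, "writeRatioMax": 0.2,
     "recordsRead": "1000", "recordsWritten": "800"},
    {"name": "Stage 2", "waitRatioMax": 0.05, "readRatioMax": 0.2,
     "computeRatioMax": 1.0, "writeRatioMax": 0.4,
     "recordsRead": "800", "recordsWritten": "5000"},
]


def summarize_plan(query_plan):
    """Return (name, dominant phase, its max ratio, records written per record read)
    for each stage, slowest stage first."""
    summary = []
    for stage in query_plan:
        # The *RatioMax values are relative to the longest time spent by any
        # worker in the query, so they can be compared across stages.
        ratios = {p: float(stage.get(p + "RatioMax", 0.0)) for p in PHASES}
        phase = max(ratios, key=ratios.get)
        # int64 counters are serialized as strings in the JSON response
        read = int(stage.get("recordsRead", 0))
        written = int(stage.get("recordsWritten", 0))
        fan_out = written / read if read else None
        summary.append((stage["name"], phase, ratios[phase], fan_out))
    summary.sort(key=lambda row: row[2], reverse=True)
    return summary


if __name__ == "__main__":
    for name, phase, ratio, fan_out in summarize_plan(EXAMPLE_PLAN):
        print(f"{name}: mostly {phase} (max ratio {ratio:.2f}), output/input rows = {fan_out}")
```

A stage whose output/input row ratio is well above 1 would suggest a join that multiplies rows, which is worth checking for the searches to search_results join here.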