I have 2 partitioned tables:
Table 1:
| user_id | request_id |
Table 2:
| ip | user_id | request_id |
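For every IP in partition_table2, I want to get:
- the number of users (from partition_table1)
- the users' requests (from partition_table1)
- the requests (from partition_table2) made by those users (from partition_table1)
Info: the IP is related to the request_id in table 1, because one user_id can have multiple IPs.
Problem: when I filter by _PARTITIONTIME in the main query, the filter is not propagated to the WITH subquery if I use a LEFT JOIN, but it is applied if I use an INNER JOIN.
Partition pruning (https://cloud.google.com/bigquery/docs/querying-partitioned-tables) does not seem to take effect for the LEFT JOIN.
My query: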
WITH
users_info AS (
SELECT
t2.ip,
t1.user_id,
COUNT(DISTINCT t1.request_id) AS user_requests,
t1._PARTITIONTIME AS date
FROM partitioned_table1 t1
INNER JOIN partition_table2 t2
ON t1.request_id = t2.request_id
AND t1._PARTITIONTIME = t2._PARTITIONTIME
GROUP BY t2.ip, t1.user_id, t1._PARTITIONTIME
)
SELECT
t2.ip,
COUNT(DISTINCT m.user_id) AS users,
COUNT(DISTINCT t2.request_id) AS t2_users_requests,
SUM(m.user_requests) AS t1_users_requests
FROM partition_table2 t2
LEFT JOIN/INNER JOIN users_info m
ON t2.ip=m.ip
AND t2.user_id=m.user_id
AND m.date = t2._PARTITIONTIME
WHERE DATE(t2._PARTITIONTIME) = "2019-05-20"
GROUP BY t2.ip
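With the INNER JOIN this query processes ~4 GB, but with the LEFT JOIN it processes ~3 TB.
Am I doing something wrong, or is this behavior expected?
I need this query to create a VIEW. The condition from the query above (DATE(t2._PARTITIONTIME) = "2019-05-20") is the one I will use at query time to filter the VIEW.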
Answer 0 (score: 0)
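Columns from the right side of a LEFT OUTER JOIN may be NULL, so yes, BigQuery actually has to execute the join to work out the result instead of pre-filtering partitions. If you want to avoid this, use a subquery to filter on _PARTITIONTIME before the join.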
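As a rough sketch of that suggestion, the partition filter can be repeated inside the WITH subquery so that it no longer depends on being pushed through the LEFT JOIN. The table names, aliases, and date literal are copied from the question. Placing the filter on both t1 and t2 inside users_info is an assumption about where it is needed, and the query has not been run against the asker's data.

```sql
WITH
users_info AS (
  SELECT
    t2.ip,
    t1.user_id,
    COUNT(DISTINCT t1.request_id) AS user_requests,
    t1._PARTITIONTIME AS date
  FROM partitioned_table1 t1
  INNER JOIN partition_table2 t2
    ON t1.request_id = t2.request_id
    AND t1._PARTITIONTIME = t2._PARTITIONTIME
  -- Assumption: filter both partitioned tables explicitly here,
  -- before the outer LEFT JOIN, so only one partition of each is scanned.
  WHERE DATE(t1._PARTITIONTIME) = "2019-05-20"
    AND DATE(t2._PARTITIONTIME) = "2019-05-20"
  GROUP BY t2.ip, t1.user_id, t1._PARTITIONTIME
)
SELECT
  t2.ip,
  COUNT(DISTINCT m.user_id) AS users,
  COUNT(DISTINCT t2.request_id) AS t2_users_requests,
  SUM(m.user_requests) AS t1_users_requests
FROM partition_table2 t2
LEFT JOIN users_info m
  ON t2.ip = m.ip
  AND t2.user_id = m.user_id
  AND m.date = t2._PARTITIONTIME
-- The outer filter is kept so the left-hand table is pruned as before.
WHERE DATE(t2._PARTITIONTIME) = "2019-05-20"
GROUP BY t2.ip
```

Note that this hard-codes the date inside the subquery, which suits an ad hoc query. The answer does not say how to combine it with a VIEW that is filtered at query time.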