On Google BigQuery, we have a report with about 10 columns, such as:
+----------------+-----------------+---------------+-------------+
| uniquesent | uniquedelivered | uniquebounced | uniqueopens |
+----------------+-----------------+---------------+-------------+
We have a much longer query that computes these values using a large number of joins. Roughly, the big query is structured like this:
select
...report_columns...,
sent.uniquesent,
delivered.uniquedelivered,
from [main table]
left join (
select
language,
exact_count_distinct(e.user_id) as uniquesent
from emailevent e
where country=1 and event='sent'
group by 1
) as sent
left join (
select
language,
exact_count_distinct(e.user_id) as uniquedelivered
from emailevent e
where country=1 and event='delivered'
group by 1
) as delivered
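This list of JOINs continues in the same style for about 10 more similar items. Now imagine this query also grouped by day/week/month; it becomes very complicated even to read. For some of them we also get the error message: Resources exceeded.
We would like to rewrite and optimize the query so that it returns the same numbers more efficiently. Let me know if you have further questions, but mainly we want to eliminate the joins somehow, make the query compact, and have it perform better.
We have already compacted part of the query using the following syntax: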
sum(if(p.country_id=1 AND event = "userblocked" AND JSON_EXTRACT_SCALAR(e.meta,'$.reason') contains 'drop_status',1,0)) as bounced,
sum(if(p.country_id=1 AND event = "userblocked" AND JSON_EXTRACT_SCALAR(e.meta,'$.reason') contains 'spam_report',1,0)) as spam_reported
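But this syntax does not work for distinct counts.
Answer 0 (score: 2)
Could you lift the conditions you are looking for into fields in a subselect, and then count the distinct values of those fields? In other words, something like: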
select
...report_columns...,
t1.uniquesent,
t1.uniquedelivered,
from [main table]
left join (
select
language,
exact_count_distinct(sent) as uniquesent,
exact_count_distinct(delivered) as uniquedelivered,
from (
select
language,
if (country=1 and event='sent', e.user_id, null) as sent,
if (country=1 and event='delivered', e.user_id, null) as delivered,
from emailevent e
) group by language
) as t1
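If you run an exact distinct count over too many distinct values, that may land you in resources_exceeded territory. Note that if you use count distinct with a number of buckets, you get exact counts up to that number of buckets. Most of the time people care about the exact number when it is small, but once it gets large an approximation is acceptable.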
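To check that the conditional-field rewrite returns the same numbers as the per-event subqueries, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for BigQuery. The table contents are invented for illustration, and SQLite's `COUNT(DISTINCT CASE WHEN ... END)` is assumed to play the role of `exact_count_distinct(if(...))`; it is not the BigQuery syntax itself.

```python
import sqlite3

# In-memory stand-in for the emailevent table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emailevent (language TEXT, country INTEGER, event TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO emailevent VALUES (?, ?, ?, ?)",
    [
        ("en", 1, "sent", 1),
        ("en", 1, "sent", 1),       # duplicate user, must be counted once
        ("en", 1, "sent", 2),
        ("en", 1, "delivered", 1),
        ("en", 2, "sent", 3),       # other country, must be ignored
        ("de", 1, "sent", 4),
        ("de", 1, "delivered", 4),
        ("de", 1, "delivered", 5),
    ],
)

# One pass over the table: the condition turns non-matching rows into NULL,
# and COUNT(DISTINCT ...) skips NULLs.
single_pass = conn.execute(
    """
    SELECT language,
           COUNT(DISTINCT CASE WHEN country = 1 AND event = 'sent'
                               THEN user_id END) AS uniquesent,
           COUNT(DISTINCT CASE WHEN country = 1 AND event = 'delivered'
                               THEN user_id END) AS uniquedelivered
    FROM emailevent
    GROUP BY language
    ORDER BY language
    """
).fetchall()


def per_event(event):
    """Original style: one filtered distinct count per event type."""
    return dict(
        conn.execute(
            "SELECT language, COUNT(DISTINCT user_id) FROM emailevent "
            "WHERE country = 1 AND event = ? GROUP BY language",
            (event,),
        ).fetchall()
    )


sent = per_event("sent")
delivered = per_event("delivered")
separate = [(lang, sent.get(lang, 0), delivered.get(lang, 0)) for lang, _, _ in single_pass]

print(single_pass)
print(single_pass == separate)
```

One difference to watch for: a language with no matching events gets 0 from the single-pass form, whereas the original left joins would produce NULL for it.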
Answer 1 (score: 0)
For the block you posted, you could do something like this to reduce the number of joins.
select
...report_columns...,
SUM(IF(event='sent', uniqueevent, 0)) as uniquesent,
SUM(IF(event='delivered', uniqueevent, 0)) as uniquedelivered
from [main table]
left join (
select
event,
language,
exact_count_distinct(e.user_id) as uniqueevent
from emailevent e
where country=1 and event in ('sent', 'delivered')
group by event, language
) as sent
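The group-then-pivot idea from this answer can be tried out in the same way. The following is a rough sketch using Python's built-in sqlite3 module in place of BigQuery, with invented sample rows; `CASE WHEN` stands in for `IF`, and `COUNT(DISTINCT ...)` for `exact_count_distinct`.

```python
import sqlite3

# In-memory stand-in for the emailevent table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emailevent (language TEXT, country INTEGER, event TEXT, user_id INTEGER)"
)
conn.executemany(
    "INSERT INTO emailevent VALUES (?, ?, ?, ?)",
    [
        ("en", 1, "sent", 1),
        ("en", 1, "sent", 1),       # duplicate user, must be counted once
        ("en", 1, "sent", 2),
        ("en", 1, "delivered", 1),
        ("en", 2, "sent", 3),       # other country, filtered out
        ("de", 1, "sent", 4),
        ("de", 1, "delivered", 4),
        ("de", 1, "delivered", 5),
    ],
)

# Inner query: one distinct count per (event, language) pair.
# Outer query: pivot the event rows into one column per event.
pivot = conn.execute(
    """
    SELECT language,
           SUM(CASE WHEN event = 'sent' THEN uniqueevent ELSE 0 END) AS uniquesent,
           SUM(CASE WHEN event = 'delivered' THEN uniqueevent ELSE 0 END) AS uniquedelivered
    FROM (
        SELECT event, language, COUNT(DISTINCT user_id) AS uniqueevent
        FROM emailevent
        WHERE country = 1 AND event IN ('sent', 'delivered')
        GROUP BY event, language
    )
    GROUP BY language
    ORDER BY language
    """
).fetchall()

print(pivot)
```

The SUM is safe here only because each (event, language) pair contributes exactly one row, so no distinct counts from overlapping groups are added together. If the join to the main table repeats those rows, the sums would be inflated accordingly.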