Optimizing a large query to use fewer JOINs when computing distinct engagement counts

Asked: 2015-10-02 13:40:26

Tags: sql google-bigquery

On Google BigQuery, we have a report with about 10 columns such as:

+----------------+-----------------+---------------+-------------+
|     uniquesent | uniquedelivered | uniquebounced | uniqueopens |
+----------------+-----------------+---------------+-------------+

We have a much longer query that uses a lot of joins to compute these values. Roughly, the large query is organized like this:

select
  ...report_columns...,
  sent.uniquesent,
  delivered.uniquedelivered,
from [main table]
left join (
  select
    language,
    exact_count_distinct(e.user_id) as uniquesent
  from emailevent e
  where country=1 and event='sent'
  group by 1
) as sent
left join (
  select
    language,
    exact_count_distinct(e.user_id) as uniquedelivered
  from emailevent e
  where country=1 and event='delivered'
  group by 1
) as delivered

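The list of JOINs continues in the same style for about 10 other similar items. Now imagine this query also being grouped by day/week/month: it becomes very complicated even to read. For some of these variants we also get the error message "Resources exceeded".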

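We would like to rewrite and optimize the query so that it returns the same numbers but more efficiently. Let me know if you have further questions, but mainly we want to somehow eliminate the joins, make the query compact, and have it perform better.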

We have already compacted part of the query using the following syntax:

sum(if(p.country_id=1 AND event = "userblocked" AND JSON_EXTRACT_SCALAR(e.meta,'$.reason') contains 'drop_status',1,0)) as bounced,
sum(if(p.country_id=1 AND event = "userblocked" AND JSON_EXTRACT_SCALAR(e.meta,'$.reason') contains 'spam_report',1,0)) as spam_reported

But this syntax does not work for distinct counts.

2 answers:

Answer 0: (score: 2)

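Could you hoist the conditions you are looking for, turn them into fields in a subselect, and then count the distinct values of those fields? In other words, something like: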

select
  ...report_columns...,
  t1.uniquesent,
  t1.uniquedelivered,
from [main table]
left join (
  select
    language,
    exact_count_distinct(sent) as uniquesent,
    exact_count_distinct(delivered) as uniquedelivered,
  from (
    select
      language,
      if (country=1 and event='sent', e.user_id, null) as sent,
      if (country=1 and event='delivered', e.user_id, null) as delivered,
    from emailevent e
  ) group by language
) as t1

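If you run an exact count over too many distinct values, you may end up in resources_exceeded territory. Note that if you use count distinct with a number of buckets, i.e. count(distinct field, n), you get an exact count up to that number of buckets. Most of the time people care about the exact number when it is small, and are fine with an approximation once it gets large.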
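As a rough sketch of the bucketed variant in BigQuery legacy SQL, the inner subselect stays the same and only the aggregate changes. The threshold 50000 is an illustrative assumption, not a recommendation; counts below the threshold are exact and larger ones are approximated. A larger threshold also costs more resources, so pick the smallest value that covers the range where you need exactness.

```sql
-- BigQuery legacy SQL. The second argument (50000) is an assumed threshold:
-- results are exact below it and statistical approximations above it.
select
  language,
  count(distinct sent, 50000) as uniquesent,
  count(distinct delivered, 50000) as uniquedelivered
from (
  select
    language,
    -- null when the row does not match, so it is ignored by count(distinct)
    if(country = 1 and event = 'sent', e.user_id, null) as sent,
    if(country = 1 and event = 'delivered', e.user_id, null) as delivered
  from emailevent e
)
group by language
```

This subquery would take the place of t1 in the query above.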

Answer 1: (score: 0)

For the block you posted, you could do something like this to reduce the number of joins.

select
  ...report_columns...,
  SUM(IF(event='sent', unique_event, 0)) as uniquesent,
  SUM(IF(event='delivered', unique_event, 0)) as uniquedelivered
from [main table]
left join (
  select
    event,
    language,
    exact_count_distinct(e.user_id) as unique_event
  from emailevent e
  where country=1 and event in ('sent', 'delivered')
  group by event, language
) as sent
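For readers on BigQuery standard SQL rather than the legacy dialect used in this thread, the same idea can probably be written without any subselect, because COUNT(DISTINCT ...) accepts an expression and ignores NULLs. This is a minimal sketch under that assumption. The dataset name mydataset is hypothetical, and the column names are taken from the question.

```sql
-- BigQuery standard SQL (not the legacy dialect used in the thread).
-- `mydataset` is a hypothetical dataset name; columns follow the question.
SELECT
  language,
  -- IF(...) yields NULL for non-matching rows, which COUNT(DISTINCT) skips
  COUNT(DISTINCT IF(country = 1 AND event = 'sent', user_id, NULL)) AS uniquesent,
  COUNT(DISTINCT IF(country = 1 AND event = 'delivered', user_id, NULL)) AS uniquedelivered
FROM mydataset.emailevent
WHERE event IN ('sent', 'delivered')
GROUP BY language
```

If approximate numbers are acceptable, swapping COUNT(DISTINCT ...) for APPROX_COUNT_DISTINCT(...) is the usual way to reduce resource usage in that dialect.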