BigQuery - GROUP BY with multiple fields is very slow

Date: 2018-07-20 20:32:55

Tags: sql google-bigquery

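I'm trying to GROUP BY several fields: a date column that spans several years (at most 5 * 365 distinct days) plus a few ID columns (a few thousand distinct values, I believe).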

The query itself is very simple:

SELECT
  cs.CriterionId,
  cs.AdGroupId,
  cs.CampaignId,
  cs.Date,
  SUM(cs.Impressions) AS Sum_Impressions,
  SUM(cs.Clicks) AS Sum_Clicks,
  SUM(cs.Interactions) AS Sum_Interactions,
  (SUM(cs.Cost) / 1000000) AS Sum_Cost,
  SUM(cs.Conversions) AS Sum_Conversions,
  cs.AdNetworkType1,
  cs.AdNetworkType2,
  cs.AveragePosition,
  cs.Device,
  cs.InteractionTypes
FROM
  `adwords.Keyword_{customer_id}` c
LEFT JOIN
  `adwords.KeywordBasicStats_{customer_id}` cs
ON
  c.ExternalCustomerId = cs.ExternalCustomerId
WHERE
  c._DATA_DATE = c._LATEST_DATE
  AND c.ExternalCustomerId = {customer_id}
GROUP BY
  1, 2, 3, 4, 10, 11, 12, 13, 14
ORDER BY
  1, 2, 3, 4, 10, 11, 12, 13, 14

The KeywordBasicStats table holds about 700 MB of data and Keyword about 50 MB, yet the query has now been running for several hours.

I'm not sure whether there is a way to optimize this SQL query.
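One thing worth checking (an editorial sketch, not part of the original question): the join condition uses only ExternalCustomerId, and both sides are restricted to the same single customer, so every Keyword row is paired with every KeywordBasicStats row. The input to the GROUP BY is then roughly rows(Keyword) x rows(KeywordBasicStats), and every SUM is multiplied by the number of Keyword rows. The snippet below reproduces this with tiny, hypothetical tables in Python's built-in sqlite3 (not BigQuery); only the table and column names are taken from the query.

```python
import sqlite3

# Hypothetical miniature versions of the two tables, with only the columns
# needed to show how the join behaves.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Keyword (ExternalCustomerId INTEGER, CriterionId INTEGER);
    CREATE TABLE KeywordBasicStats (
        ExternalCustomerId INTEGER, CriterionId INTEGER,
        Date TEXT, Impressions INTEGER
    );
""")

# 3 keywords for customer 1 ...
conn.executemany("INSERT INTO Keyword VALUES (?, ?)",
                 [(1, 10), (1, 11), (1, 12)])
# ... and 4 stats rows for the same customer.
conn.executemany("INSERT INTO KeywordBasicStats VALUES (?, ?, ?, ?)",
                 [(1, 10, "2018-07-01", 5), (1, 10, "2018-07-02", 7),
                  (1, 11, "2018-07-01", 2), (1, 12, "2018-07-01", 1)])

# Joining on the customer ID alone pairs every keyword with every stats row.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM Keyword c
    LEFT JOIN KeywordBasicStats cs
      ON c.ExternalCustomerId = cs.ExternalCustomerId
""").fetchone()[0]

# A reduced version of the question's aggregation: each total is inflated
# by the number of Keyword rows (3 here).
inflated = conn.execute("""
    SELECT cs.CriterionId, SUM(cs.Impressions)
    FROM Keyword c
    LEFT JOIN KeywordBasicStats cs
      ON c.ExternalCustomerId = cs.ExternalCustomerId
    GROUP BY cs.CriterionId
    ORDER BY cs.CriterionId
""").fetchall()

# Aggregating the stats table on its own gives the true totals.
true_totals = conn.execute("""
    SELECT CriterionId, SUM(Impressions)
    FROM KeywordBasicStats
    GROUP BY CriterionId
    ORDER BY CriterionId
""").fetchall()

print(joined_rows)  # 3 keywords x 4 stats rows
print(inflated)
print(true_totals)
```

If the real tables behave the same way, the join may be feeding the GROUP BY far more rows than the 700 MB table contains, which could matter more than the number of distinct grouping values. Joining on columns that actually identify a keyword (for example CriterionId and AdGroupId, an assumption about the schema) would avoid the multiplication.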

In case anyone at Google is interested, the job ID is:

blissful-land-197118:US.bquijob_668c014c_164b8710acc

2 answers:

Answer 0 (score: 1)

I think the ORDER BY is what makes this query so slow, since it has to sort the entire grouped result.
Just remove it and try again.

Answer 1 (score: 1)

Try this (it may need adjusting depending on your column data types):

SELECT
  cs.CriterionId,
  cs.AdGroupId,
  cs.CampaignId,
  cs.Date,
  SUM(cs.Impressions) AS Sum_Impressions,
  SUM(cs.Clicks) AS Sum_Clicks,
  SUM(cs.Interactions) AS Sum_Interactions,
  (SUM(cs.Cost) / 1000000) AS Sum_Cost,
  SUM(cs.Conversions) AS Sum_Conversions,
  cs.AdNetworkType1,
  cs.AdNetworkType2,
  cs.AveragePosition,
  cs.Device,
  cs.InteractionTypes
FROM
  `adwords.Keyword_{customer_id}` c
INNER JOIN
  `adwords.KeywordBasicStats_{customer_id}` cs
ON
  c.ExternalCustomerId = cs.ExternalCustomerId
WHERE
  c._DATA_DATE = c._LATEST_DATE
  AND c.ExternalCustomerId = {customer_id}
GROUP BY
  1, 2, 3, 4, 10, 11, 12, 13, 14

UNION ALL

SELECT
  cs.CriterionId,
  cs.AdGroupId,
  cs.CampaignId,
  cs.Date,
  0.0 AS Sum_Impressions,
  0.0 AS Sum_Clicks,
  0.0 AS Sum_Interactions,
  0.0 AS Sum_Cost,
  0.0 AS Sum_Conversions,
  cs.AdNetworkType1,
  cs.AdNetworkType2,
  cs.AveragePosition,
  cs.Device,
  cs.InteractionTypes
FROM
  `adwords.Keyword_{customer_id}` c
LEFT JOIN
  `adwords.KeywordBasicStats_{customer_id}` cs
ON
  c.ExternalCustomerId = cs.ExternalCustomerId
WHERE cs.ExternalCustomerId IS NULL  -- keep only Keyword rows with no matching stats row
  AND c._DATA_DATE = c._LATEST_DATE
  AND c.ExternalCustomerId = {customer_id}
GROUP BY
  1, 2, 3, 4, 10, 11, 12, 13, 14

ORDER BY
  1, 2, 3, 4, 10, 11, 12, 13, 14
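The idea behind this answer is that a LEFT JOIN can be split into an INNER JOIN plus the left-side rows that found no match (an anti-join: LEFT JOIN ... WHERE right_key IS NULL), glued together with UNION ALL. A minimal sketch, again using Python's sqlite3 with hypothetical data rather than BigQuery, checks that both forms return the same rows. It uses NULL for the unmatched side, where the answer substitutes 0.0 for the sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c  (ExternalCustomerId INTEGER);
    CREATE TABLE cs (ExternalCustomerId INTEGER, Impressions INTEGER);
""")
# Customer 2 has no stats rows, so it only appears via the unmatched branch.
conn.executemany("INSERT INTO c VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO cs VALUES (?, ?)", [(1, 5), (1, 7)])

# Plain LEFT JOIN.
left_join = conn.execute("""
    SELECT c.ExternalCustomerId, cs.Impressions
    FROM c LEFT JOIN cs ON c.ExternalCustomerId = cs.ExternalCustomerId
    ORDER BY 1, 2
""").fetchall()

# INNER JOIN for matched rows, UNION ALL the anti-join for unmatched rows.
split = conn.execute("""
    SELECT c.ExternalCustomerId, cs.Impressions
    FROM c INNER JOIN cs ON c.ExternalCustomerId = cs.ExternalCustomerId
    UNION ALL
    SELECT c.ExternalCustomerId, NULL
    FROM c LEFT JOIN cs ON c.ExternalCustomerId = cs.ExternalCustomerId
    WHERE cs.ExternalCustomerId IS NULL
    ORDER BY 1, 2
""").fetchall()

print(left_join == split)
print(split)
```

Note that this rewrite keeps the same join condition as the question, so it preserves the results of the original query, including any row multiplication caused by joining on ExternalCustomerId alone.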