BigQuery "Resources exceeded during query execution" - optimization

Time: 2019-12-19 14:40:09

Tags: sql google-bigquery

I have a problem with this query.

SELECT event_date, country, COUNT(*) AS sessions,
       AVG(length) AS average_session_length
  FROM (

SELECT country, event_date, global_session_id,
       (MAX(event_timestamp) - MIN(event_timestamp))/(60 * 1000 * 1000) AS length
  FROM (
SELECT user_pseudo_id,
       event_timestamp,
       country,
       event_date,
       SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
       SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
  FROM (
       SELECT *,
              CASE WHEN event_timestamp - last_event >= (30*60*1000*1000) 
                     OR last_event IS NULL 
                   THEN 1 ELSE 0 END AS is_new_session
         FROM (
              SELECT user_pseudo_id,
                     event_timestamp,
                     geo.country,
                     event_date,
                     LAG(event_timestamp,1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
                FROM `xxx.events*`
              ) last
       ) final
       ) session
 GROUP BY global_session_id, country, event_date

       ) agg
WHERE length >= (10/60)
 GROUP BY country, event_date

The Google Cloud Console returns this error:

Resources exceeded during query execution: The query could not be executed in the allotted memory.

I know the OVER clauses are probably the problem, but I don't know how to rewrite the query so that it returns the same result. Thanks for your help!

1 Answer:

Answer 0: (score: 1)

If I had to guess, the culprit is this line:

  SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,

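It has no PARTITION BY, so the whole table has to be ordered as a single window partition. I would suggest changing the code so that the "global" session ID is really local to each user: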

  SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS global_session_id,

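If you adjust the query this way and it basically works, the resource problem should be solved. The next step is to work out how to get the global ID you need. The simplest solution is to use each user's local ID instead, that is, to identify a session by user_pseudo_id together with the per-user session number.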
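
As a rough sketch of that suggestion (not tested against the real tables), the query from the question could be rewritten so that sessions are grouped by the pair (user_pseudo_id, user_session_id) instead of a global running sum. Each user's first event always has a NULL last_event and therefore starts a new session, so this pair should identify the same groups that global_session_id did. The table name `xxx.events*` and all column names are taken from the question; the explicit `AS country` alias is only added for clarity.

```sql
-- Sketch: every window is partitioned by user, so no single
-- partition has to order the whole table.
SELECT event_date, country,
       COUNT(*) AS sessions,
       AVG(length) AS average_session_length
  FROM (
       -- One row per session, keyed by user plus per-user session number.
       SELECT user_pseudo_id, user_session_id, country, event_date,
              (MAX(event_timestamp) - MIN(event_timestamp)) / (60 * 1000 * 1000) AS length
         FROM (
              -- Running count of session starts within each user.
              SELECT user_pseudo_id, event_timestamp, country, event_date,
                     SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
                FROM (
                     -- Flag an event as a session start after a 30-minute gap
                     -- (timestamps are in microseconds) or on a user's first event.
                     SELECT *,
                            CASE WHEN event_timestamp - last_event >= (30 * 60 * 1000 * 1000)
                                   OR last_event IS NULL
                                 THEN 1 ELSE 0 END AS is_new_session
                       FROM (
                            SELECT user_pseudo_id,
                                   event_timestamp,
                                   geo.country AS country,
                                   event_date,
                                   LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
                              FROM `xxx.events*`
                            ) last
                     ) final
              ) session
        GROUP BY user_pseudo_id, user_session_id, country, event_date
       ) agg
 WHERE length >= (10 / 60)   -- keep sessions of at least 10 seconds (length is in minutes)
 GROUP BY country, event_date
```

If a single-column session identifier is still needed downstream, one option is to build it from the two keys, for example CONCAT(user_pseudo_id, '-', CAST(user_session_id AS STRING)), rather than numbering sessions across all users.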