BigQuery "Resources exceeded during query execution" when using analytic functions

Time: 2017-05-08 10:27:31

Tags: google-bigquery standard-sql

In my BQ standard SQL query, when I use a few analytic functions (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions), I get this error:

Resources exceeded during query execution. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors

Several fields in the query are calculated along these lines:

case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS

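When I split the query and run each part on its own, every part runs fine. However, I don't want to split the query into several, because I need all the fields in one final BQ view.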

Is there any way to work around this error?

UPD: Here is a sample query. As I add more fields, it stops working and fails with the error above.

SELECT 
distinct
Date
,channelGrouping
,country
,browser
,deviceCategory
,operatingSystem

#Visits by all dimensions
,count(distinct concat(fullvisitorid,cast(visitid as string))) 
over (partition by concat(Y,m,d),channelGrouping,country,browser,deviceCategory,operatingSystem)
as Visits 

#Daily Users Browser
,case when 1 = ROW_NUMBER() over (partition by Y,m,d,browser)
then count (distinct fullvisitorid) 
over (partition by Y,m,d,browser)
else null end as UniqueVisitorsDailyBrowser


#Weekly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,U,channelGrouping)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,U,channelGrouping)
else null end as NewUniqueVisitorsWeeklyChannel

#Monthly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS

FROM GA_Export_Schema

1 answer:

Answer 0: (score: 0)

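When I need to answer several different questions with a single query, what I usually do is combine a few UNION ALL operations with a key column that tells the branches apart.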

Based on your query, I tested the following against our own dataset:

SELECT
  date date,
  country country,
  channel channel,
  browser browser,
  cat cat,
  os os,
  MAX(all_keys_visits) all_keys_visits,
  MAX(browser_visits) browser_visits,
  MAX(week_new_channel_users) week_new_channel_users,
  MAX(month_new_os_users) month_new_os_users from(
  SELECT
    date,
    country,
    channel,
    browser,
    cat,
    os,
    visits AS all_keys_visits,
    -- Partitioned by date as well, so the value is per day, like the question's daily browser metric.
    MAX(CASE
        WHEN tmp = 'browser' THEN visits END) OVER(PARTITION BY browser, date) browser_visits,
    MAX(CASE
        WHEN tmp = 'weekly_new_users' THEN visits END) OVER(PARTITION BY channel, week_date) week_new_channel_users,
    MAX(CASE
        WHEN tmp = 'monthly_new_users' THEN visits END) OVER(PARTITION BY os, month_date) month_new_os_users from(
          SELECT
            tmp,
            date,
            week_date,
            month_date,
            country,
            channel,
            browser,
            cat,
            os,
            visits  FROM (
              SELECT
                'all_visitors' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                geonetwork.country country,
                channelGrouping channel,
                device.browser browser,
                device.devicecategory cat,
                device.operatingSystem os,
                COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
              FROM `project_id.dataset_id.ga_sessions*`
              WHERE 1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
              GROUP BY tmp, date, channel, country, browser, cat, os )
            UNION ALL (
              SELECT
               'browser' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                '' country,
                '' channel,
                device.browser browser,
                '' cat,
                '' os,
                COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
              FROM `project_id.dataset_id.ga_sessions*`
              WHERE 1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
              GROUP BY tmp, date, channel, country, browser, cat, os )
            UNION ALL (
              SELECT
               'weekly_new_users' AS tmp,
                date,
                FORMAT_DATE("%W", parse_DATE('%Y%m%d',  date)) week_date,
                FORMAT_DATE("%m", parse_DATE('%Y%m%d',  date)) month_date,
                '' country,
                channelGrouping channel,
                '' browser,
                '' cat,
                '' os,
                COUNT(DISTINCT CASE
                   WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
              FROM `project_id.dataset_id.ga_sessions*`
              WHERE
              1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
            GROUP BY
              tmp,
              date,
              channel,
              country,
              browser,
              cat,
              os )
          UNION ALL (
            SELECT
              'monthly_new_users' AS tmp,
              date,
              FORMAT_DATE("%W", parse_DATE('%Y%m%d',
                  date)) week_date,
              FORMAT_DATE("%m", parse_DATE('%Y%m%d',
                  date)) month_date,
              '' country,
              '' channel,
              '' browser,
              '' cat,
              device.operatingSystem os,
              COUNT(DISTINCT
                CASE
                  WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
            FROM
              `project_id.dataset_id.ga_sessions*`
            WHERE
              1 = 1
              AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
              AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
            GROUP BY
              tmp,
              date,
              channel,
              country,
              browser,
              cat,
              os ) ) )
GROUP BY
  date,
  country,
  channel,
  browser,
  cat,
  os
HAVING
  (country != ''
    AND channel != ''
    AND browser != ''
    AND cat != ''
    AND os != '')

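Basically, in each UNION branch I created a key, and later I aggregated on that key together with the value you want to analyze. After that, I simply filtered out the rows whose fields had been filled in as empty strings.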
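To make the pattern easier to see, here is a stripped-down sketch of the same idea with just two branches. The inline `sessions` data and the names `visitor_id`, `keyed` and `daily_browser_visitors` are made up for illustration and are not part of the GA export schema, so treat this as a sketch rather than a drop-in query:

```sql
-- Minimal sketch of the keyed UNION ALL pattern (BigQuery standard SQL).
-- `sessions` is hypothetical inline sample data, not the real GA export.
WITH sessions AS (
  SELECT * FROM UNNEST([
    STRUCT('20170501' AS date, 'Chrome' AS browser, 'Windows' AS os, 'u1' AS visitor_id),
    STRUCT('20170501', 'Chrome', 'Android', 'u2'),
    STRUCT('20170501', 'Safari', 'iOS', 'u3'),
    STRUCT('20170502', 'Chrome', 'Windows', 'u1')
  ])
),
keyed AS (
  -- Branch 1: full granularity, tagged with its own key.
  SELECT 'all' AS tmp, date, browser, os,
         COUNT(DISTINCT visitor_id) AS visitors
  FROM sessions
  GROUP BY date, browser, os
  UNION ALL
  -- Branch 2: coarser granularity; the unused dimension is an empty string.
  SELECT 'browser' AS tmp, date, browser, '' AS os,
         COUNT(DISTINCT visitor_id) AS visitors
  FROM sessions
  GROUP BY date, browser
)
SELECT date, browser, os, visitors, daily_browser_visitors
FROM (
  SELECT
    tmp, date, browser, os, visitors,
    -- Copy the 'browser' branch value onto every row of the same day and browser.
    MAX(IF(tmp = 'browser', visitors, NULL))
      OVER (PARTITION BY date, browser) AS daily_browser_visitors
  FROM keyed
)
-- Keep only the full-granularity rows; the helper rows have done their job.
WHERE tmp = 'all'
ORDER BY date, browser, os
```

The window function here only sees the small pre-aggregated rows, never the raw sessions, which is the point of the approach. Filtering on the key (`tmp = 'all'`) plays the same role as the `HAVING ... != ''` check in the full query.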

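I tried it on 30 days of data, which consumed a few gigabytes, and it still finished in under 20 seconds, so it may well work for you too (note that by running each UNION branch separately and aggregating afterwards, you end up working with a much smaller volume of data, which avoids exhausting resources).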
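The same "aggregate first, combine later" idea can also be written with pre-aggregated subqueries and a JOIN instead of UNION ALL. This is a different technique from the query above and I have not run it on real data. The column names are taken from the question, while the inline `GA_Export_Schema` rows and the CTE names `daily` and `monthly_os` are assumptions for illustration only:

```sql
-- Sketch only: each granularity is aggregated in its own CTE, then joined.
-- The inline GA_Export_Schema rows are hypothetical sample data.
WITH GA_Export_Schema AS (
  SELECT * FROM UNNEST([
    STRUCT('20170501' AS Date, '2017' AS Y, '05' AS m, 'Windows' AS operatingSystem,
           'Chrome' AS browser, 'u1' AS fullvisitorid, 1 AS visitid, 1 AS IsNewVisit),
    STRUCT('20170501', '2017', '05', 'Windows', 'Firefox', 'u2', 2, 1),
    STRUCT('20170502', '2017', '05', 'Windows', 'Chrome', 'u1', 3, 0),
    STRUCT('20170502', '2017', '05', 'iOS', 'Safari', 'u3', 4, 1)
  ])
),
daily AS (
  -- Visits at the finest granularity.
  SELECT Date, Y, m, operatingSystem, browser,
         COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS STRING))) AS Visits
  FROM GA_Export_Schema
  GROUP BY Date, Y, m, operatingSystem, browser
),
monthly_os AS (
  -- New unique visitors per month and operating system.
  SELECT Y, m, operatingSystem,
         COUNT(DISTINCT IF(IsNewVisit = 1, fullvisitorid, NULL)) AS NewUniqueVisitorsMonthlyOS
  FROM GA_Export_Schema
  GROUP BY Y, m, operatingSystem
)
SELECT d.Date, d.operatingSystem, d.browser, d.Visits,
       mo.NewUniqueVisitorsMonthlyOS
FROM daily AS d
LEFT JOIN monthly_os AS mo
  ON d.Y = mo.Y AND d.m = mo.m AND d.operatingSystem = mo.operatingSystem
ORDER BY d.Date, d.operatingSystem, d.browser
```

Two caveats: unlike the `ROW_NUMBER() = 1` trick in the question, the monthly figure is repeated on every matching row, so it must not be summed downstream; and rows whose `operatingSystem` is NULL will not match in the join.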