In my BigQuery standard SQL query, when I use a few analytic functions (https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#analytic-functions), I get this error:
Resources exceeded during query execution. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors
Several fields in the query are calculated in a way similar to this:
case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS
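When I split the query and run each part separately, every part runs fine. However, I don't want to split the query into several, because I need all the fields in one final BQ view.
Is there any way to work around this error?
UPD: Here is an example query. When I add more fields, it stops working and fails with the error above.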
SELECT
distinct
Date
,channelGrouping
,country
,browser
,deviceCategory
,operatingSystem
#Visits by all dimensions
,count(distinct concat(fullvisitorid,cast(visitid as string)))
over (partition by concat(Y,m,d),channelGrouping,country,browser,deviceCategory,operatingSystem)
as Visits
#Daily Users Browser
,case when 1 = ROW_NUMBER() over (partition by Y,m,d,browser)
then count (distinct fullvisitorid)
over (partition by Y,m,d,browser)
else null end as UniqueVisitorsDailyBrowser
#Weekly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,U,channelGrouping)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,U,channelGrouping)
else null end as NewUniqueVisitorsWeeklyChannel
#Monthly New Users
,case when 1 = ROW_NUMBER() over (partition by Y,m,operatingSystem)
then count(distinct case when IsNewVisit = 1 then fullvisitorid else null end)
over (partition by Y,m,operatingSystem)
else null end as NewUniqueVisitorsMonthlyOS
FROM GA_Export_Schema
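Answer 0 (score: 0)
When I need to answer several different questions with a single query, what I usually do is combine a few UNION ALL operations with a key column that tells the data sets apart.
I tested the following query, based on yours, against our own dataset: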
SELECT
date date,
country country,
channel channel,
browser browser,
cat cat,
os os,
MAX(all_keys_visits) all_keys_visits,
MAX(browser_visits) browser_visits,
MAX(week_new_channel_users) week_new_channel_users,
MAX(month_new_os_users) month_new_os_users
FROM (
SELECT
date,
country,
channel,
browser,
cat,
os,
visits AS all_keys_visits,
MAX(CASE
WHEN tmp = 'browser' THEN visits END) OVER(PARTITION BY browser) browser_visits,
MAX(CASE
WHEN tmp = 'weekly_new_users' THEN visits END) OVER(PARTITION BY channel, week_date) week_new_channel_users,
MAX(CASE
WHEN tmp = 'monthly_new_users' THEN visits END) OVER(PARTITION BY os, month_date) month_new_os_users
FROM (
SELECT
tmp,
date,
week_date,
month_date,
country,
channel,
browser,
cat,
os,
visits FROM (
SELECT
'all_visitors' AS tmp,
date,
FORMAT_DATE("%W", parse_DATE('%Y%m%d', date)) week_date,
FORMAT_DATE("%m", parse_DATE('%Y%m%d', date)) month_date,
geonetwork.country country,
channelGrouping channel,
device.browser browser,
device.devicecategory cat,
device.operatingSystem os,
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
FROM `project_id.dataset_id.ga_sessions*`
WHERE 1 = 1
AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
GROUP BY tmp, date, channel, country, browser, cat, os )
UNION ALL (
SELECT
'browser' AS tmp,
date,
FORMAT_DATE("%W", parse_DATE('%Y%m%d', date)) week_date,
FORMAT_DATE("%m", parse_DATE('%Y%m%d', date)) month_date,
'' country,
'' channel,
device.browser browser,
'' cat,
'' os,
COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitid AS string))) visits
FROM `project_id.dataset_id.ga_sessions*`
WHERE 1 = 1
AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
GROUP BY tmp, date, channel, country, browser, cat, os )
UNION ALL (
SELECT
'weekly_new_users' AS tmp,
date,
FORMAT_DATE("%W", parse_DATE('%Y%m%d', date)) week_date,
FORMAT_DATE("%m", parse_DATE('%Y%m%d', date)) month_date,
'' country,
channelGrouping channel,
'' browser,
'' cat,
'' os,
COUNT(DISTINCT CASE
WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
FROM `project_id.dataset_id.ga_sessions*`
WHERE
1 = 1
AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
GROUP BY
tmp,
date,
channel,
country,
browser,
cat,
os )
UNION ALL (
SELECT
'monthly_new_users' AS tmp,
date,
FORMAT_DATE("%W", parse_DATE('%Y%m%d',
date)) week_date,
FORMAT_DATE("%m", parse_DATE('%Y%m%d',
date)) month_date,
'' country,
'' channel,
'' browser,
'' cat,
device.operatingSystem os,
COUNT(DISTINCT
CASE
WHEN totals.newVisits = 1 THEN CONCAT(fullvisitorid, CAST(visitid AS string)) END) visits
FROM
`project_id.dataset_id.ga_sessions*`
WHERE
1 = 1
AND REGEXP_EXTRACT(_table_suffix, r'.*_(.*)') BETWEEN FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE("%Y%m%d", DATE_SUB(CURRENT_DATE(), INTERVAL 0 DAY))
GROUP BY
tmp,
date,
channel,
country,
browser,
cat,
os ) ) )
GROUP BY
date,
country,
channel,
browser,
cat,
os
HAVING
(country != ''
AND channel != ''
AND browser != ''
AND cat != ''
AND os != '')
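Basically, in each UNION branch I created a key, and later aggregated by that key together with the values you want to analyze. After that, I simply filtered out the rows whose fields had been filled with empty-string placeholders.
I ran this over 30 days of data. It processed a few gigabytes but still finished in under 20 seconds, so it may work for you too. (Note that because each UNION branch is aggregated on its own and the results are combined afterwards, the later steps work on much smaller volumes of data, which avoids the resource exhaustion.)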
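The same "aggregate first, combine later" idea can also be sketched with GROUP BY subqueries that are joined back to the dimension rows, instead of UNION ALL with placeholder strings. The following is only a minimal sketch, not a tested replacement for the query above: the inline `sessions` rows are made-up sample data standing in for GA_Export_Schema, `Date` is assumed to be a 'YYYYMMDD' string, and the CTE names (`daily_browser`, `monthly_os`, `dims`) and the `year_month` column are hypothetical.

```sql
-- Hypothetical sample rows standing in for GA_Export_Schema.
WITH sessions AS (
  SELECT * FROM UNNEST([
    STRUCT('20170101' AS Date, 'Chrome' AS browser, 'Android' AS operatingSystem,
           'v1' AS fullvisitorid, 1 AS IsNewVisit),
    STRUCT('20170101', 'Chrome', 'Android', 'v1', 0),
    STRUCT('20170101', 'Chrome', 'iOS', 'v2', 1),
    STRUCT('20170102', 'Safari', 'iOS', 'v2', 0),
    STRUCT('20170102', 'Safari', 'iOS', 'v3', 1)
  ])
),
-- One small aggregate per metric, computed with GROUP BY
-- rather than COUNT(DISTINCT ...) OVER (...).
daily_browser AS (
  SELECT
    Date,
    browser,
    COUNT(DISTINCT fullvisitorid) AS UniqueVisitorsDailyBrowser
  FROM sessions
  GROUP BY Date, browser
),
monthly_os AS (
  SELECT
    SUBSTR(Date, 1, 6) AS year_month,  -- assumes Date is 'YYYYMMDD'
    operatingSystem,
    COUNT(DISTINCT IF(IsNewVisit = 1, fullvisitorid, NULL)) AS NewUniqueVisitorsMonthlyOS
  FROM sessions
  GROUP BY year_month, operatingSystem
),
-- The distinct dimension combinations that the final view should list.
dims AS (
  SELECT DISTINCT Date, browser, operatingSystem
  FROM sessions
)
-- Attach each pre-aggregated metric to the dimension rows.
SELECT
  d.Date,
  d.browser,
  d.operatingSystem,
  b.UniqueVisitorsDailyBrowser,
  m.NewUniqueVisitorsMonthlyOS
FROM dims AS d
JOIN daily_browser AS b
  ON b.Date = d.Date AND b.browser = d.browser
JOIN monthly_os AS m
  ON m.year_month = SUBSTR(d.Date, 1, 6) AND m.operatingSystem = d.operatingSystem
ORDER BY d.Date, d.browser, d.operatingSystem
```

Unlike the ROW_NUMBER() trick in the question, this sketch repeats each metric on every matching row rather than only on the first row of a partition, so the consumer of the view would need to account for that when summing. Whether it stays within resource limits on a real export table is an assumption that has to be checked against your own data.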