I am trying to run the following query in BigQuery:
SELECT
  FORMAT_TIMESTAMP('%Y-%m-%d', TIMESTAMP_MICROS(date)) as target,
  SUM(CASE WHEN period = 7 THEN users END) as days_07,
  SUM(CASE WHEN period = 14 THEN users END) as days_14,
  SUM(CASE WHEN period = 30 THEN users END) as days_30
FROM (
  SELECT
    activity.date as date,
    periods.period as period,
    COUNT(DISTINCT user) as users
  FROM (
    SELECT
      event.timestamp_micros as date,
      user_dim.app_info.app_instance_id as user
    FROM `table.*`
    CROSS JOIN
      UNNEST(event_dim) as event
  ) as activity
  CROSS JOIN (
    SELECT
      event.timestamp_micros as date
    FROM `table.*`
    CROSS JOIN
      UNNEST(event_dim) as event
    GROUP BY event.timestamp_micros
  ) as dates
  CROSS JOIN (
    SELECT period
    FROM
    (
      SELECT 7 as period
      UNION ALL
      SELECT 14 as period
      UNION ALL
      SELECT 30 as period
    )
  ) as periods
  WHERE
    dates.date >= activity.date
    AND
    SAFE_CAST(FLOOR(TIMESTAMP_DIFF(TIMESTAMP_MICROS(dates.date), TIMESTAMP_MICROS(activity.date), DAY)/periods.period) AS INT64) = 0
  GROUP BY 1,2
)
GROUP BY date
ORDER BY date DESC
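It works when I run it against a single table, selecting the active users within a given time range, but in my real application I will be running it across all of my datasets (40+). When I try to run it against a single dataset with all of its tables (dataset.*), I get this error:
Query exceeded resource limits for tier 1. Tier 20 or higher required.
I'm not sure what I can do now. I suspect that, for performance reasons, I may end up having to move this logic into code instead of SQL.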
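Answer 0: (score: 3)
I think the reason is that this query is CPU-expensive, so it gets "promoted" to a higher billing tier.
The cause is that the activity and dates subselects each produce a huge number of rows: every row carries a timestamp in microseconds, so nothing is pre-grouped before the CROSS JOIN multiplies them together.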
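To get a feel for how much the microsecond granularity inflates the dates subselect, a quick check along these lines may help. It is only a sketch: `table.*` is the placeholder from the question, and the column aliases distinct_micros and distinct_days are made up for illustration.

```sql
-- Compare how many rows the dates subselect yields at microsecond
-- granularity versus calendar-day granularity.
-- `table.*` is the placeholder from the question; aliases are illustrative.
SELECT
  COUNT(DISTINCT event.timestamp_micros) AS distinct_micros,
  COUNT(DISTINCT DATE(TIMESTAMP_MICROS(event.timestamp_micros))) AS distinct_days
FROM `table.*`
CROSS JOIN UNNEST(event_dim) AS event
```

If distinct_micros is orders of magnitude larger than distinct_days, the CROSS JOIN is working on far more rows than the per-day result needs.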
So I recommend changing
FROM (
  SELECT
    event.timestamp_micros as date,
    user_dim.app_info.app_instance_id as user
  FROM `table.*`
  CROSS JOIN
    UNNEST(event_dim) as event
) as activity
into
FROM (
  SELECT DISTINCT
    DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS DATE,
    user_dim.app_info.app_instance_id AS user
  FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
  CROSS JOIN UNNEST(event_dim) AS event
) AS activity
and, correspondingly, changing
CROSS JOIN (
  SELECT
    event.timestamp_micros as date
  FROM `table.*`
  CROSS JOIN
    UNNEST(event_dim) as event
  GROUP BY event.timestamp_micros
) as dates
into
CROSS JOIN (
  SELECT DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS DATE
  FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
  CROSS JOIN UNNEST(event_dim) AS event
  GROUP BY 1
) AS dates
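These changes bring the row counts down considerably, so the CROSS JOIN will not be nearly as expensive.
Of course, you then need to adjust the other parts of your query to account for the fact that the date field is now an actual DATE rather than a microsecond value. (The snippets above use the public Firebase sample table; substitute your own `table.*`.)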
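As a rough illustration of those adjustments, the full query might end up looking something like the sketch below. It has not been run against your data, and it makes a few assumptions: `table.*` is the placeholder from the question, FORMAT_DATE replaces FORMAT_TIMESTAMP because the column is now a DATE, and the FLOOR(.../period) = 0 test is rewritten as the equivalent DATE_DIFF(...) < period, which holds because the day difference is already constrained to be non-negative.

```sql
-- Sketch of the question's query adapted to DATE-typed columns.
-- `table.*` is the placeholder table name from the question.
SELECT
  FORMAT_DATE('%Y-%m-%d', date) AS target,
  SUM(CASE WHEN period = 7 THEN users END) AS days_07,
  SUM(CASE WHEN period = 14 THEN users END) AS days_14,
  SUM(CASE WHEN period = 30 THEN users END) AS days_30
FROM (
  SELECT
    activity.date AS date,
    periods.period AS period,
    COUNT(DISTINCT user) AS users
  FROM (
    -- One row per (day, user) instead of one row per event.
    SELECT DISTINCT
      DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS date,
      user_dim.app_info.app_instance_id AS user
    FROM `table.*`
    CROSS JOIN UNNEST(event_dim) AS event
  ) AS activity
  CROSS JOIN (
    -- One row per calendar day instead of one row per microsecond.
    SELECT DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS date
    FROM `table.*`
    CROSS JOIN UNNEST(event_dim) AS event
    GROUP BY 1
  ) AS dates
  CROSS JOIN (
    SELECT 7 AS period UNION ALL
    SELECT 14 AS period UNION ALL
    SELECT 30 AS period
  ) AS periods
  WHERE
    dates.date >= activity.date
    -- Same window test as FLOOR(diff / period) = 0 for non-negative diff.
    AND DATE_DIFF(dates.date, activity.date, DAY) < periods.period
  GROUP BY 1, 2
)
GROUP BY date
ORDER BY date DESC
```

Note that DATE(TIMESTAMP_MICROS(...)) buckets events by UTC day unless a time zone is passed, so day boundaries may shift slightly compared with your local reporting.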
Hope this helps!