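I'm trying to extract user activity by date. I'm trying to create a table with one row for every day since each user's account was created, using a CROSS JOIN and a WHERE clause. In my case the cross join can't be avoided. The calendar table is just a list of every date in the past 365 days (365 rows). The user table has about 1B rows.
Here is the query that fails because it runs out of resources: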
someCoolFunction
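According to https://cloud.google.com/bigquery/query-reference, CROSS JOIN doesn't even support the EACH modifier. How can I do the above so that the table is created successfully?
Answer (score: 3):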
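You don't need to fill in the "empty" days. Compute the daily counts and apply a window function to get the aggregated totals, so you don't even need the calendar table. The key is to use RANGE rather than ROWS in the window frame. See the example below (for BigQuery Standard SQL):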
#standardSQL
SELECT
user_id, created, daily_count,
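SUM(daily_count) OVER(
PARTITION BY user_id ORDER BY created_unix_date DESC
-- newest day first, so "6 FOLLOWING" means 6 days earlier: the frame is the
-- current day plus the 6 preceding calendar days, even if some have no rows
RANGE BETWEEN CURRENT ROW AND 6 FOLLOWING
) weekly_avg -- note: this is a rolling 7-day SUM, not an average
-- UNIX_DATE turns the DATE into an INT64 day number, since RANGE offsets need a numeric key
FROM `dw.user`, UNNEST([UNIX_DATE(created)]) AS created_unix_date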
ORDER BY user_id, created DESC
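To make the RANGE-versus-ROWS point concrete, here is a minimal Python sketch (not BigQuery code) that models the two frame types for a single user whose activity has gaps. The function name `rolling_sums` and the sample dates are illustrative assumptions, and it assumes at most one row per user per day, as the query above does.

```python
from datetime import date


def rolling_sums(rows, mode):
    """Model a 7-day window for ONE user.

    rows: (day, daily_count) pairs, at most one per day (assumed).
    mode "range": current day plus the 6 previous calendar days, like
        ORDER BY created_unix_date DESC RANGE BETWEEN CURRENT ROW AND 6 FOLLOWING
    mode "rows": current row plus the next 6 rows in newest-first order, like
        ORDER BY created_unix_date DESC ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
    Returns (day, total) pairs, newest first.
    """
    ordered = sorted(rows, key=lambda r: r[0], reverse=True)  # newest first, as in the query
    result = []
    for i, (day, _) in enumerate(ordered):
        if mode == "range":
            # Frame bounds are day numbers, so missing days simply contribute nothing.
            total = sum(c for d, c in ordered if 0 <= (day - d).days <= 6)
        elif mode == "rows":
            # Frame bounds are row positions, so gaps let older days leak in.
            total = sum(c for _, c in ordered[i:i + 7])
        else:
            raise ValueError(f"unknown mode: {mode}")
        result.append((day, total))
    return result


# Hypothetical activity for one user, with gaps between active days.
activity = [
    (date(2017, 1, 1), 5),
    (date(2017, 1, 3), 2),
    (date(2017, 1, 10), 4),
    (date(2017, 1, 11), 1),
]

print(rolling_sums(activity, "range"))  # totals 5, 4, 7, 5 (newest first)
print(rolling_sums(activity, "rows"))   # totals 12, 11, 7, 5 (newest first)
```

With ROWS, the total for 2017-01-11 also pulls in 2017-01-03 and 2017-01-01 because they are among the 7 nearest rows. With RANGE it covers only 2017-01-05 through 2017-01-11, which is why no calendar table is needed to fill the gaps.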
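I'm not sure of your table's exact schema and column types, so the query above may need some adjusting, but in the meantime you can test and play with it using the dummy data below: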
#standardSQL
WITH `dw.user` AS (
SELECT
day AS created,
CAST(1 + 10 * RAND() AS INT64) AS user_id,
CAST(100 * RAND() AS INT64) AS daily_count
FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-04-26')) AS day
)
SELECT
user_id, created, daily_count,
SUM(daily_count) OVER(
PARTITION BY user_id ORDER BY created_unix_date DESC
RANGE BETWEEN CURRENT ROW AND 6 FOLLOWING
) weekly_avg
FROM `dw.user`, UNNEST([UNIX_DATE(created)]) AS created_unix_date
ORDER BY user_id, created DESC
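The queries above assume `dw.user` already has one `daily_count` per user and day. If your table instead holds one row per activity event (an assumption, since the question doesn't show the schema), the counts have to be aggregated first, which is the "compute the daily counts" step. Below is a minimal Python sketch of that two-step pipeline: aggregate, then apply the RANGE-style 7-day window. `weekly_sums` and the event tuples are hypothetical names, not part of BigQuery.

```python
from collections import defaultdict
from datetime import date


def weekly_sums(events):
    """events: iterable of (user_id, day) pairs, one per activity event (hypothetical shape).

    Returns {user_id: [(day, daily_count, rolling_7_day_sum), ...]}, newest day first.
    """
    # Step 1: one count per (user, day), like GROUP BY user_id, day in SQL.
    daily = defaultdict(int)
    for user_id, day in events:
        daily[(user_id, day)] += 1

    # Regroup the daily counts by user (the PARTITION BY user_id step).
    per_user = defaultdict(list)
    for (user_id, day), count in daily.items():
        per_user[user_id].append((day, count))

    # Step 2: sum each day's count with those of the 6 calendar days before it.
    result = {}
    for user_id, days in per_user.items():
        days.sort(reverse=True)  # newest first; days are unique per user after step 1
        result[user_id] = [
            (day, count, sum(c for d, c in days if 0 <= (day - d).days <= 6))
            for day, count in days
        ]
    return result


events = [
    (1, date(2017, 1, 1)), (1, date(2017, 1, 1)),
    (1, date(2017, 1, 5)),
    (1, date(2017, 1, 9)), (1, date(2017, 1, 9)), (1, date(2017, 1, 9)),
    (2, date(2017, 1, 2)),
]
for user_id, rows in sorted(weekly_sums(events).items()):
    print(user_id, rows)
```

Only days on which a user was actually active are materialised, so the work grows with the activity data rather than with users × 365. The inner sum is quadratic per user, which is fine for an illustration.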