BigQuery cross join fails

Time: 2017-04-26 21:19:01

Tags: google-bigquery

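I am trying to extract user activity by date. I am using a cross join with a WHERE clause to build a table with one row for every day since each user's account was created. In my case the cross join cannot be avoided. The calendar table is just a list of all dates in the past 365 days (365 rows). The user table has about 1B rows.

This is the query that fails with a "resources exceeded" error: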

someCoolFunction

According to https://cloud.google.com/bigquery/query-reference, a cross join does not even support the EACH clause. How can I run the above so that the table is created successfully?

1 answer:

Answer 0: (score: 3)

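You don't need to fill in the "empty" days to compute daily counts and then apply a window function to get the aggregated amounts, so you don't even need the calendar table. To make this work, use RANGE rather than ROWS in the window frame: a RANGE frame is bounded by the value of the ORDER BY key (here, the day number), so it covers a fixed span of calendar days even when some of those days have no rows. See the example below (for BigQuery Standard SQL):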

  
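#standardSQL
SELECT
  user_id, created, daily_count,
  -- Rolling total over the current day and the 6 calendar days before it
  -- (the alias says "avg", but the expression is a 7-day sum)
  SUM(daily_count) OVER(
    PARTITION BY user_id ORDER BY created_unix_date DESC
    RANGE BETWEEN CURRENT ROW AND 6 FOLLOWING
  ) weekly_avg
-- RANGE with a numeric offset needs a numeric ORDER BY key,
-- so the date is converted to days since 1970-01-01
FROM `dw.user`, UNNEST([UNIX_DATE(created)]) AS created_unix_date
ORDER BY user_id, created DESC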

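I'm not sure of the exact schema and types of your table, so you may need to adjust the above accordingly, but in the meantime you can test and play with it using the dummy data below: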

#standardSQL
-- Dummy stand-in for the real table: one row per day,
-- each assigned to a random user with a random count
WITH `dw.user` AS (
  SELECT
    day AS created,
    CAST(1 + 10 * RAND() AS INT64) AS user_id,
    CAST(100 * RAND() AS INT64) AS daily_count
  FROM UNNEST(GENERATE_DATE_ARRAY('2017-01-01', '2017-04-26')) AS day
)
SELECT
  user_id, created, daily_count,
  SUM(daily_count) OVER(
    PARTITION BY user_id ORDER BY created_unix_date DESC
    RANGE BETWEEN CURRENT ROW AND 6 FOLLOWING
  ) weekly_avg
FROM `dw.user`, UNNEST([UNIX_DATE(created)]) AS created_unix_date
ORDER BY user_id, created DESC
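
To see why RANGE rather than ROWS matters when days are missing, here is a minimal sketch that computes both frames side by side. The `activity` table and its three rows are hypothetical values made up for illustration, and the column names `sum_range_7d` and `sum_rows_7` are assumptions, not part of the original query.

```sql
#standardSQL
-- Hypothetical data: one user who was active on only three days
WITH activity AS (
  SELECT 1 AS user_id, DATE '2017-04-01' AS created, 10 AS daily_count UNION ALL
  SELECT 1, DATE '2017-04-02', 20 UNION ALL
  SELECT 1, DATE '2017-04-20', 5
)
SELECT
  user_id, created, daily_count,
  -- RANGE: the frame is bounded by the ORDER BY value,
  -- so it spans the current day and the 6 calendar days before it
  SUM(daily_count) OVER(
    PARTITION BY user_id ORDER BY UNIX_DATE(created) DESC
    RANGE BETWEEN CURRENT ROW AND 6 FOLLOWING
  ) AS sum_range_7d,
  -- ROWS: the frame is bounded by row position,
  -- so it spans up to 7 rows no matter how far apart their dates are
  SUM(daily_count) OVER(
    PARTITION BY user_id ORDER BY UNIX_DATE(created) DESC
    ROWS BETWEEN CURRENT ROW AND 6 FOLLOWING
  ) AS sum_rows_7
FROM activity
ORDER BY user_id, created DESC
```

For the 2017-04-20 row, the RANGE frame should contain only that day (the earlier rows are more than 6 days back), while the ROWS frame pulls in all three rows. That difference is what lets the RANGE version skip the calendar table and the cross join.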