BigQuery: "Tier 20 or higher required"

Asked: 2017-01-19 15:41:52

Tags: sql google-bigquery

I am trying to run the following query in BigQuery:

SELECT 
    FORMAT_TIMESTAMP('%Y-%m-%d', TIMESTAMP_MICROS(date)) as target,
    SUM(CASE WHEN period = 7  THEN users END) as days_07,
    SUM(CASE WHEN period = 14 THEN users END) as days_14,
    SUM(CASE WHEN period = 30 THEN users END) as days_30
FROM (
    SELECT 
        activity.date as date,
        periods.period as period,
        COUNT(DISTINCT user) as users
    FROM (
        SELECT
            event.timestamp_micros as date, 
            user_dim.app_info.app_instance_id as user
        FROM `table.*` 
        CROSS JOIN 
            UNNEST(event_dim) as event  
    ) as activity
    CROSS JOIN (
        SELECT 
            event.timestamp_micros  as date
        FROM `table.*` 
        CROSS JOIN 
            UNNEST(event_dim) as event 
        GROUP BY event.timestamp_micros
    ) as dates
    CROSS JOIN (
        SELECT period 
        FROM 
            (
                SELECT 7 as period 
                UNION ALL 
                SELECT 14 as period 
                UNION ALL
                SELECT 30 as period
            )
    ) as periods
    WHERE 
        dates.date >= activity.date 
    AND 
        SAFE_CAST(FLOOR(TIMESTAMP_DIFF(TIMESTAMP_MICROS(dates.date), TIMESTAMP_MICROS(activity.date), DAY)/periods.period) AS INT64) = 0
    GROUP BY 1,2
)
GROUP BY date
ORDER BY date DESC

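It works when I run it against a single table and returns the active users for each time window, but in my real application I need to run it across all of my datasets (40+). When I try to run it on just one dataset, including all of its tables with dataset.*, I get this error:

Query exceeded resource limits for tier 1. Tier 20 or higher required.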

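I'm not sure what I can do at this point. I suspect that, for performance reasons, I may end up having to move this logic into application code instead of SQL.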

1 Answer:

Answer 0 (score: 3)

I think the reason is that this query is CPU-expensive, so it gets "promoted" to a higher billing tier.

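It is expensive because the dates and activity subselects both produce a huge number of rows: each row carries a timestamp in microseconds, so there is effectively no pre-grouping at all before the CROSS JOIN.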
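One way to see how much the granularity matters is to compare the number of distinct microsecond timestamps with the number of distinct days. This is only a rough diagnostic sketch; `table.*` is the placeholder from the question, and the column aliases are names I made up for illustration.

```sql
-- Rough diagnostic: how many rows would the `dates` subquery produce
-- at microsecond granularity versus day granularity?
-- `table.*` is the placeholder table name from the question.
SELECT
    COUNT(DISTINCT event.timestamp_micros) AS distinct_micros,   -- hypothetical alias
    COUNT(DISTINCT DATE(TIMESTAMP_MICROS(event.timestamp_micros))) AS distinct_days  -- hypothetical alias
FROM `table.*`
CROSS JOIN UNNEST(event_dim) AS event
```

If distinct_micros is several orders of magnitude larger than distinct_days, the CROSS JOIN of dates with activity is being asked to pair up far more rows than the report actually needs.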

So I suggest changing

FROM (
    SELECT
        event.timestamp_micros as date, 
        user_dim.app_info.app_instance_id as user
    FROM `table.*` 
    CROSS JOIN 
        UNNEST(event_dim) as event  
) as activity  

to

    FROM (
      SELECT DISTINCT
          DATE(TIMESTAMP_MICROS(event.timestamp_micros))  AS DATE, 
          user_dim.app_info.app_instance_id AS user
      FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607` 
      CROSS JOIN UNNEST(event_dim) AS event  
    ) AS activity

and, likewise, changing

CROSS JOIN (
    SELECT 
        event.timestamp_micros  as date
    FROM `table.*` 
    CROSS JOIN 
        UNNEST(event_dim) as event 
    GROUP BY event.timestamp_micros
) as dates

to

    CROSS JOIN (
        SELECT DATE(TIMESTAMP_MICROS(event.timestamp_micros))  AS DATE
        FROM `firebase-analytics-sample-data.android_dataset.app_events_20160607`
        CROSS JOIN UNNEST(event_dim) AS event 
        GROUP BY 1
    ) AS dates

These changes reduce the number of rows on both sides considerably, so the CROSS JOIN is no longer as expensive.

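Of course, you then need to adjust the other parts of your query accordingly, since the date field is now an actual DATE rather than a microsecond value.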
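Putting the pieces together, the adapted query might look roughly like the sketch below. I have not run it against your data, so treat it as a starting point. `table.*` is the placeholder from the question, and swapping FORMAT_TIMESTAMP / TIMESTAMP_DIFF for FORMAT_DATE / DATE_DIFF is my assumption about how the remaining parts would be adapted. Note also that DATE_DIFF counts calendar-day boundaries, whereas TIMESTAMP_DIFF with DAY counts whole 24-hour intervals, so the window edges can differ slightly from the original.

```sql
SELECT
    -- `date` is already a DATE here, so FORMAT_DATE replaces
    -- FORMAT_TIMESTAMP(..., TIMESTAMP_MICROS(date)).
    FORMAT_DATE('%Y-%m-%d', date) AS target,
    SUM(CASE WHEN period = 7  THEN users END) AS days_07,
    SUM(CASE WHEN period = 14 THEN users END) AS days_14,
    SUM(CASE WHEN period = 30 THEN users END) AS days_30
FROM (
    SELECT
        activity.date AS date,          -- kept as in the original query
        periods.period AS period,
        COUNT(DISTINCT user) AS users
    FROM (
        -- One row per (day, user) instead of one row per event.
        SELECT DISTINCT
            DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS date,
            user_dim.app_info.app_instance_id AS user
        FROM `table.*`
        CROSS JOIN UNNEST(event_dim) AS event
    ) AS activity
    CROSS JOIN (
        -- One row per day instead of one row per microsecond timestamp.
        SELECT DATE(TIMESTAMP_MICROS(event.timestamp_micros)) AS date
        FROM `table.*`
        CROSS JOIN UNNEST(event_dim) AS event
        GROUP BY 1
    ) AS dates
    CROSS JOIN (
        SELECT 7 AS period
        UNION ALL
        SELECT 14 AS period
        UNION ALL
        SELECT 30 AS period
    ) AS periods
    WHERE
        dates.date >= activity.date
    AND
        -- For a non-negative difference this matches the original
        -- FLOOR(diff / period) = 0 condition.
        DATE_DIFF(dates.date, activity.date, DAY) < periods.period
    GROUP BY 1, 2
)
GROUP BY date
ORDER BY date DESC
```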

Hope this helps!