BigQuery: pivot and aggregate repeated fields

Date: 2017-08-11 19:05:06

Tags: google-bigquery

I want to pivot the fields 'unitId' and 'firebase_screen_class' so that each one appears in its own column:

SELECT
  event.name,
  event_param.value.string_value AS ad_unit,
  COUNT(*) AS event_count
FROM
  `app_events_20170510`, 
  UNNEST(event_dim) AS event, 
  UNNEST(event.params) as event_param
WHERE
  event.name in ('Ad_requested', 'Ad_clicked', 'Ad_shown')
  and event_param.key in ('unitId', 'screen_class')
GROUP BY 1,2

I tried the following query in Legacy SQL, but it does not return the correct aggregated results:

SELECT event_name, ad_unit, count(*) FROM
(
SELECT
  event_dim.name as event_name,
  MAX(IF(event_dim.params.key = "firebase_screen_class", event_dim.params.value.string_value, NULL)) WITHIN RECORD as firebase_screen_class,
  MAX(IF(event_dim.params.key = "unitId", event_dim.params.value.string_value, NULL)) WITHIN RECORD as ad_unit
FROM
  [app_events_20170510]
WHERE
  event_dim.name in ('Ad_requested','Ad_shown', 'Ad_clicked')
  and event_dim.params.key in ('unitId','screen_class')
)
group by 1,2

I am looking for the following output:

_________________________________________________________________________
| event_dim.name   | unitId         | screen_class         | count_events|
|__________________|________________|______________________|_____________|
| Ad_requested     | hpg            | socialFeed           |    520      |
|__________________|________________|______________________|_____________|
| Ad_shown         | hpg            | chat                 |    950      |
|__________________|________________|______________________|_____________|
| Ad_requested     | hni            | chat                 |    740      |
|__________________|________________|______________________|_____________|

All events (Ad_requested, Ad_shown, Ad_clicked) have parameters with the same keys (unitId, screen_class), and the possible values for each key are also the same (unitId: hpg, hni / screen_class: socialFeed, chat).

1 Answer:

Answer 0 (score: 5)

The following is for BigQuery Standard SQL:

  
#standardSQL
WITH `aggregation` AS (
  SELECT
    event.name,
    event_param.key,
    COUNT(*) AS event_count
  FROM
    `app_events_20170510`, 
    UNNEST(event_dim) AS event, 
    UNNEST(event.params) AS event_param
  WHERE
    event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
    AND event_param.key IN ('unitId', 'firebase_screen_class','house')
  GROUP BY 1, 2
)
SELECT 
  name,
  MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,
  MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class,
  MAX(IF(key = 'house', event_count, NULL)) AS house
FROM `aggregation`
GROUP BY name  
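
To try the MAX(IF(...)) pivot pattern without access to the table, here is a minimal sketch that replaces the `aggregation` CTE with a few made-up rows. The counts are hypothetical and exist only for illustration. Note that this pattern pivots the per-key counts, not the parameter values themselves.

```sql
#standardSQL
-- Hypothetical pre-aggregated rows standing in for the `aggregation` CTE.
WITH aggregation AS (
  SELECT 'Ad_requested' AS name, 'unitId' AS key, 3 AS event_count UNION ALL
  SELECT 'Ad_requested', 'firebase_screen_class', 3 UNION ALL
  SELECT 'Ad_shown', 'unitId', 2
)
SELECT
  name,
  -- IF() keeps the count only for the matching key; MAX() then collapses
  -- the NULLs so that each key ends up in its own column.
  MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,
  MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class
FROM aggregation
GROUP BY name
ORDER BY name
```

With these sample rows, Ad_requested gets 3 in both columns, while Ad_shown gets 2 for unitId and NULL for firebase_screen_class, because no row exists for that key.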
  

Update based on the clarification in the comments:

#standardSQL
SELECT
  event.name,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'house') AS house,
  COUNT(1) AS event_count
FROM `app_events_20170510`, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1,2,3,4
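
If you want to check the scalar-subquery approach without the real table, the sketch below runs the same kind of query against inline sample data. The `sample_events` CTE and its two events are made up; they only mimic the shape I assume the export has (a repeated event_dim, each with repeated key/value params).

```sql
#standardSQL
-- Hypothetical sample row shaped like the app_events export: one row with
-- a repeated event_dim, where each event has repeated key/value params.
WITH sample_events AS (
  SELECT [
    STRUCT(
      'Ad_requested' AS name,
      [
        STRUCT('unitId' AS key, STRUCT('hpg' AS string_value) AS value),
        STRUCT('firebase_screen_class' AS key, STRUCT('socialFeed' AS string_value) AS value)
      ] AS params
    ),
    STRUCT(
      'Ad_shown' AS name,
      [
        STRUCT('unitId' AS key, STRUCT('hpg' AS string_value) AS value),
        STRUCT('firebase_screen_class' AS key, STRUCT('chat' AS string_value) AS value)
      ] AS params
    )
  ] AS event_dim
)
SELECT
  event.name,
  -- Each correlated subquery looks up one key inside the current event's
  -- params and returns its value as a separate column.
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
  COUNT(1) AS event_count
FROM sample_events, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1, 2, 3
```

This should return one row per (name, unitId, firebase_screen_class) combination, each with event_count = 1 for the sample above. One caveat: a scalar subquery fails at run time if it yields more than one row, so this assumes each key appears at most once per event.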
  

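"... out of curiosity, I tried to replicate the query using Legacy SQL ..."

Added a version for BigQuery Legacy SQL, purely for learning purposes. Hopefully it helps anyone considering a migration to Standard SQL, since both versions of the same task are now available here: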

#legacySQL
SELECT name, unitId, firebase_screen_class, house, COUNT(1) AS event_count
FROM (
  SELECT event_dim.name AS name,
    MAX(IF(event_dim.params.key = 'unitId', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS unitId,
    MAX(IF(event_dim.params.key = 'firebase_screen_class', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS firebase_screen_class,
    MAX(IF(event_dim.params.key = 'house', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS house,
  FROM FLATTEN([project:dataset.app_events_20170510], event_dim) AS event
  WHERE event_dim.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
)
GROUP BY 1, 2, 3, 4