不允许使用汇总汇总Bigquery

时间:2019-05-07 19:57:57

标签: sql arrays json object google-bigquery

我正在尝试查询我的Google bigquery分析表。 我感兴趣的字段是嵌套的。 我要检索的结构适合:类别>子类别>子子类别。

我尝试执行以下操作:

select 
event_param1.value.string_value AS category,
event_param2.value.string_value AS action,
ARRAY_AGG(DISTINCT event_param3.value.string_value) AS label
FROM `analytics.events_20*` AS t,
UNNEST(event_params) as event_param1,
UNNEST(event_params) as event_param2,
UNNEST(event_params) as event_param3
where
parse_date('%y%m%d', _table_suffix) between DATE_sub(current_date(), interval 30 day) and DATE_sub(current_date(), interval 1 day) AND
event_param1.key = 'category' and
event_param2.key = 'action' and
event_param3.key = 'label'
group by category, action
order by category, action

但是这将返回一行,其中包含一个类别,一个子类别以及所有子类别的数组。

我想要一行包含一个类别,所有子类别,每个子类别的所有子类别。

这是我所得到的例子:

{
    "category": "Apple Watch",
    "action": "Apple Badge Clicked",
    "label": [
      "User Landing Page",
      "Attract",
      "Guest Landing Page",
      "Guest In Workout",
      "User In Workout"
    ]
  },
  {
    "category": "Apple Watch",
    "action": "CONNECTED",
    "label": [
      "User Landing Page",
      "Attract",
      "Guest Landing Page",
      "Guest In Workout",
      "User In Workout"
    ]
  }

这就是我想要的:

{
    "category": "Apple Watch",
    "action": {
        "Apple Badge Clicked": {
            "label": [
                "User Landing Page",
                "Attract",
                "Guest Landing Page",
                "Guest In Workout",
                "User In Workout"
            ]
        },
        "CONNECTED": {
            "label": [
                "User Landing Page",
                "Attract",
                "Guest Landing Page",
                "Guest In Workout",
                "User In Workout"
            ]
        }
    }
}

如果我在另一个ARRAY_AGG中尝试使用ARRAY_AGG,则会得到Aggregations of aggregations are not allowed Bigquery。 我知道我要问的不是那么简单,但是类似的解决方案也可以。

1 个答案:

答案 0 :(得分:1)

您需要首先聚合到最高级别的阵列中。之后,您可以使用子查询重新排列数据:

这不能完全反映您想要的输出,但是可以灵活地处理各种动作类型:

WITH test AS (
  SELECT * FROM UNNEST([
    STRUCT('Apple Watch' AS category, 'Apple Badge Clicked' as action, 'User Landing Page' as label),
    ('Apple Watch','Apple Badge Clicked','Attract'),
    ('Apple Watch','Apple Badge Clicked','Guest Landing Page'),
    ('Apple Watch','CONNECTED','User Landing Page'),
    ('Apple Watch','CONNECTED','Attract'),
    ('Apple Watch','CONNECTED','User In Workout')
  ])  
),
-- first level of aggregation, prepare for fine tuning
catAgg as (
  SELECT 
    category,
    ARRAY_AGG(struct(action, label)) AS catInfo
  FROM test
  GROUP BY 1
)

SELECT 
  category,
  -- feed sub-query output into an array "action"
  array(SELECT AS STRUCT 
     action as actionType, -- re-group data within the array by field "action"
     array_agg(distinct label) as label
   FROM UNNEST(catInfo)
   GROUP BY 1
   ) as action
FROM catAgg

希望这会有所帮助