BigQuery - Cannot use count distinct with scoped aggregation

Time: 2018-11-25 05:53:54

Tags: google-bigquery

I have a table:

+--------+------------------------------+---------+-------------+
|visit_id|browsed_categories            | num_seen| num_borrows |
+--------+------------------------------+---------+-------------+
|1       |  fiction,history             | 20      | 3           |
|2       |  selfhelp,fiction,science    | 15      | 3           |
|3       |  cooking,kids,home,selfhelp  | 7       | 2           |
+--------+------------------------------+---------+-------------+

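I am trying to aggregate this table to see whether there is a correlation between the distinct browsed categories and borrows. The desired output looks like this: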

+-------------+---------------------------------+-------------------------+
| borrow_rate | num_distinct_browsed_categories | distinct_categories     | 
+-------------+---------------------------------+-------------------------+
|  0          | 3                               | cooking,selfhelp,home   |
|  1          | 2                               | history,fiction         |
+-------------+---------------------------------+-------------------------+

My query is as follows:

select
  *,
  count(distinct(split(all_cats, ','))) as num_distinct_browsed_categories
from
(
  select 
    (num_borrows/num_seen) as borrow_rate,
    count(visit_id) as num_visits,
    group_concat(browsed_categories, ',') as all_cats
  from [table]
  group by borrow_rate
)

The query gives me this error:

Cannot use count distinct with scoped aggregation

How can I modify the query to get the desired output?

1 Answer

Answer 0 (score: 2)

Below is a version for BigQuery Standard SQL:

#standardSQL
SELECT
  *,
  (SELECT COUNT(DISTINCT cat) FROM UNNEST(SPLIT(all_cats, ',')) cat) AS num_distinct_browsed_categories  
FROM (
  SELECT 
    (num_borrows/num_seen) AS borrow_rate,
    COUNT(visit_id) AS num_visits,
    STRING_AGG(browsed_categories, ',') AS all_cats
  FROM `project.dataset.table`
  GROUP BY borrow_rate
)   
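The query above returns all_cats as a plain concatenation (duplicates included), whereas the desired output also has a deduplicated distinct_categories column. A minimal sketch of one alternative, not part of the original answer: unnest the categories first and aggregate once, in Standard SQL. The `visits` CTE is a hypothetical name that simply inlines the three sample rows from the question; against real data you would drop it and read from `project.dataset.table` instead. The TRIM calls are an assumption, added in case stored values contain spaces after the commas.

```sql
#standardSQL
-- Hypothetical CTE: the question's sample rows inlined so the query is self-contained.
WITH visits AS (
  SELECT 1 AS visit_id, 'fiction,history' AS browsed_categories, 20 AS num_seen, 3 AS num_borrows UNION ALL
  SELECT 2, 'selfhelp,fiction,science', 15, 3 UNION ALL
  SELECT 3, 'cooking,kids,home,selfhelp', 7, 2
)
SELECT
  num_borrows / num_seen AS borrow_rate,
  -- UNNEST below yields one row per category, so count visits distinctly.
  COUNT(DISTINCT visit_id) AS num_visits,
  COUNT(DISTINCT TRIM(cat)) AS num_distinct_browsed_categories,
  -- Deduplicated, alphabetically ordered category list per borrow_rate.
  STRING_AGG(DISTINCT TRIM(cat), ',' ORDER BY TRIM(cat)) AS distinct_categories
FROM visits,
  UNNEST(SPLIT(browsed_categories, ',')) AS cat
GROUP BY borrow_rate
ORDER BY borrow_rate
```

Because the correlated UNNEST multiplies each visit into one row per category, a plain COUNT(visit_id) would overcount here, which is why the sketch uses COUNT(DISTINCT visit_id). With the three sample rows every visit has a different borrow_rate (3/20, 3/15, 2/7), so each group contains a single visit.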

By the way, if for some reason you are still bound to BigQuery Legacy SQL, just replace

count(distinct(split(all_cats, ',')))

with

exact_count_distinct(split(all_cats, ','))

in the original query.