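I'm working with the BigQuery public dataset bigquery-public-data.austin_crime.crime.
My goal is an output with three columns:
the type of crime (description), its count, and the top district for that particular description.
I can get the first two columns with this query: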
select
  a.description,
  count(*) as district_count
from `bigquery-public-data.austin_crime.crime` a
group by description
order by district_count desc
I was hoping to do all of this in a single query, so I tried adding the code below to get a third column showing the top district for each description (crime):
select
  a.description,
  count(*) as district_count,
  (
    select district from
    (
      select
        district, rank() over(order by COUNT(*) desc) as rank
      FROM `bigquery-public-data.austin_crime.crime`
      where description = a.description
      group by district
    ) where rank = 1
  ) as top_District
from `bigquery-public-data.austin_crime.crime` a
group by description
order by district_count desc
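The error I get is: "Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN."
I think I could do this with a join. Does anyone have a better solution that doesn't use a join?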
Answer 0 (score: 2)
The following is for BigQuery Standard SQL:
#standardSQL
SELECT description,
  ANY_VALUE(district_count) AS district_count,
  STRING_AGG(district ORDER BY cnt DESC LIMIT 1) AS top_district
FROM (
  SELECT description, district,
    COUNT(1) OVER(PARTITION BY description) AS district_count,
    COUNT(1) OVER(PARTITION BY description, district) AS cnt
  FROM `bigquery-public-data.austin_crime.crime`
)
GROUP BY description
-- ORDER BY district_count DESC
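The query above works by attaching two window counts to every row (the total per description, and the total per description and district) and then keeping the district with the largest per-district count. As a rough sketch of the same idea, here is a variant that first pre-aggregates per (description, district) and then picks the top district with ARRAY_AGG ... ORDER BY ... LIMIT 1. The `crime` CTE and its rows are made-up toy data, not the public table, so the logic can be checked by hand; swapping the CTE for `bigquery-public-data.austin_crime.crime` should give the real result, though I have not run it against the full table.

```sql
#standardSQL
-- Hypothetical toy rows standing in for the public table.
WITH crime AS (
  SELECT * FROM UNNEST([
    STRUCT('THEFT' AS description, 'A' AS district),
    ('THEFT', 'A'),
    ('THEFT', 'A'),
    ('THEFT', 'B'),
    ('BURGLARY', 'B'),
    ('BURGLARY', 'B'),
    ('BURGLARY', 'C')
  ])
)
SELECT
  description,
  -- Summing the per-district counts gives the total per description.
  SUM(cnt) AS district_count,
  -- Keep only the district with the highest count; IGNORE NULLS guards
  -- against NULL districts, which ARRAY_AGG would otherwise reject.
  ARRAY_AGG(district IGNORE NULLS ORDER BY cnt DESC LIMIT 1)[OFFSET(0)] AS top_district
FROM (
  -- One row per (description, district) with its row count.
  SELECT description, district, COUNT(*) AS cnt
  FROM crime
  GROUP BY description, district
)
GROUP BY description
ORDER BY district_count DESC
```

One difference from the rank() = 1 attempt in the question: when two districts tie for the highest count, LIMIT 1 returns just one of them (which one is not guaranteed), whereas rank() = 1 would return both and make the scalar subquery fail.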