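Goal: for 2018 only, output (category, total incidents) once for each day of the week.
That is, show the top category and its total number of incidents for each day of the week. The result set should therefore have exactly 7 rows (LIMIT 7 would cap the row count, but it doesn't really answer the core question I'm trying to understand).
Using BigQuery Standard SQL: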
SELECT
  dayofweek,
  category,
  SUM(incident_count) AS incidents
FROM
  (SELECT dayofweek, category, COUNT(*) AS incident_count
   FROM `bigquery-public-data.san_francisco.sfpd_incidents`
   WHERE EXTRACT(year FROM timestamp) = 2018
   GROUP BY category, dayofweek
  ) incidents_2018
GROUP BY
  category,
  dayofweek
ORDER BY incidents DESC
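You can check locally why this query returns more than seven rows. The sketch below is a minimal approximation that uses Python's standard sqlite3 module instead of BigQuery. Its assumptions:

- The local table name sfpd_incidents and all of its rows are invented for illustration.
- strftime('%Y', ...) stands in for EXTRACT(year FROM timestamp).

The outer GROUP BY uses the same keys as the inner one, so each (category, dayofweek) group passes through unchanged.

```python
import sqlite3

# Hypothetical stand-in for sfpd_incidents: a few invented raw rows, not real data.
# 2018-01-01/08/15 fall on Mondays; 2018-01-02/09 and 2017-12-26 on Tuesdays.
RAW_ROWS = [
    ("Monday", "LARCENY/THEFT", "2018-01-01 10:00:00"),
    ("Monday", "LARCENY/THEFT", "2018-01-08 11:00:00"),
    ("Monday", "ASSAULT", "2018-01-15 12:00:00"),
    ("Tuesday", "LARCENY/THEFT", "2018-01-02 09:00:00"),
    ("Tuesday", "OTHER OFFENSES", "2018-01-09 09:30:00"),
    ("Tuesday", "LARCENY/THEFT", "2017-12-26 08:00:00"),  # 2017, filtered out
]

# The question's query adapted to SQLite: strftime('%Y', ...) replaces
# BigQuery's EXTRACT(year FROM timestamp).
ORIGINAL_SQL = """
    SELECT dayofweek, category, SUM(incident_count) AS incidents
    FROM (
        SELECT dayofweek, category, COUNT(*) AS incident_count
        FROM sfpd_incidents
        WHERE strftime('%Y', timestamp) = '2018'
        GROUP BY category, dayofweek
    ) AS incidents_2018
    GROUP BY category, dayofweek  -- same keys as the inner GROUP BY: nothing collapses
    ORDER BY incidents DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sfpd_incidents (dayofweek TEXT, category TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO sfpd_incidents VALUES (?, ?, ?)", RAW_ROWS)
rows = conn.execute(ORIGINAL_SQL).fetchall()
conn.close()

# One row per (dayofweek, category) pair, not one row per day.
for row in rows:
    print(row)
```

With these hypothetical rows the query returns four rows, two for Monday and two for Tuesday, instead of one per day.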
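I tried writing a HAVING clause with a subquery to filter the aggregated results, something like HAVING incidents > (SELECT count(*) FROM sfpd_incidents WHERE ...), but I'm stuck on what that subquery should look like.
Current output: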
|dayofweek| category |incidents|
|---------|---------------------------|--------:|
|Monday |LARCENY/THEFT | 228|
|Wednesday|LARCENY/THEFT | 210|
|Tuesday |LARCENY/THEFT | 194|
|Thursday |LARCENY/THEFT | 119|
|Friday |LARCENY/THEFT | 118|
|Saturday |LARCENY/THEFT | 115|
|Sunday |LARCENY/THEFT | 108|
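
This should be the cut-off point: only the top category and its count for each dayofweek should be shown. Incident counts that are not the top for their dayofweek, such as the rows below, should be excluded from the result set.

|dayofweek| category |incidents|
|---------|---------------------------|--------:|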
|Monday |NON-CRIMINAL | 105|
|Tuesday |OTHER OFFENSES | 91|
|Wednesday|OTHER OFFENSES | 85|
|Tuesday |NON-CRIMINAL | 78|
|Monday |OTHER OFFENSES | 72|
|Monday |ASSAULT | 68|
|Wednesday|NON-CRIMINAL | 62|
|Tuesday |ASSAULT | 62|
|Wednesday|ASSAULT | 51|
|Sunday |ASSAULT | 50|
|Thursday |ASSAULT | 47|
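The HAVING-with-subquery idea can work if the subquery is correlated on the day. Each group's count is then compared with the largest count for that same day. Below is a minimal sketch of that approach in SQLite via Python's sqlite3 module; it has not been tried against BigQuery. Its assumptions:

- The rows are invented.
- The counts CTE is a hypothetical helper.
- strftime() replaces EXTRACT().

```python
import sqlite3

# Hypothetical raw rows (invented). In 2018, Tuesday has a deliberate tie:
# one LARCENY/THEFT and one OTHER OFFENSES incident.
RAW_ROWS = [
    ("Monday", "LARCENY/THEFT", "2018-01-01 10:00:00"),
    ("Monday", "LARCENY/THEFT", "2018-01-08 11:00:00"),
    ("Monday", "ASSAULT", "2018-01-15 12:00:00"),
    ("Tuesday", "LARCENY/THEFT", "2018-01-02 09:00:00"),
    ("Tuesday", "OTHER OFFENSES", "2018-01-09 09:30:00"),
    ("Tuesday", "LARCENY/THEFT", "2017-12-26 08:00:00"),  # 2017, filtered out
]

# "counts" holds per-(day, category) totals for 2018. HAVING keeps a group only
# if its count equals the largest count for the same day; the subquery is
# correlated through i.dayofweek.
TOP_VIA_HAVING_SQL = """
    WITH counts AS (
        SELECT dayofweek, category, COUNT(*) AS incidents
        FROM sfpd_incidents
        WHERE strftime('%Y', timestamp) = '2018'
        GROUP BY dayofweek, category
    )
    SELECT i.dayofweek, i.category, COUNT(*) AS incidents
    FROM sfpd_incidents AS i
    WHERE strftime('%Y', i.timestamp) = '2018'
    GROUP BY i.dayofweek, i.category
    HAVING COUNT(*) = (
        SELECT MAX(c.incidents)
        FROM counts AS c
        WHERE c.dayofweek = i.dayofweek
    )
    ORDER BY incidents DESC, i.dayofweek, i.category
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sfpd_incidents (dayofweek TEXT, category TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO sfpd_incidents VALUES (?, ?, ?)", RAW_ROWS)
top_rows = conn.execute(TOP_VIA_HAVING_SQL).fetchall()
conn.close()

for row in top_rows:
    print(row)
```

Because the comparison is an equality with MAX(...), ties are kept: Tuesday returns both categories. ROW_NUMBER() and ARRAY_AGG(... LIMIT 1) behave differently, since they always keep exactly one row per day.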
Answer 0 (score: 1)
I think you want:
SELECT dayofweek, category, incident_count
FROM (SELECT dayofweek, category, COUNT(*) AS incident_count,
             -- number the categories within each day, largest count first
             ROW_NUMBER() OVER (PARTITION BY dayofweek ORDER BY COUNT(*) DESC) AS seqnum
      FROM `bigquery-public-data.san_francisco.sfpd_incidents`
      WHERE EXTRACT(year FROM timestamp) = 2018
      GROUP BY category, dayofweek
     ) incidents_2018
WHERE seqnum = 1  -- keep only the top category for each day
ORDER BY incident_count DESC;
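You can try the ROW_NUMBER() filter locally. This sketch makes the following assumptions:

- It needs SQLite 3.25 or later (for window functions), accessed through Python's sqlite3 module.
- For brevity, it applies ROW_NUMBER() to already-aggregated counts instead of counting raw incidents. The input is the 18 rows from the question's output table, loaded into a hypothetical local incidents_2018 table.

```python
import sqlite3

# (dayofweek, category, incident_count) rows copied from the question's output table.
COUNTS = [
    ("Monday", "LARCENY/THEFT", 228),
    ("Wednesday", "LARCENY/THEFT", 210),
    ("Tuesday", "LARCENY/THEFT", 194),
    ("Thursday", "LARCENY/THEFT", 119),
    ("Friday", "LARCENY/THEFT", 118),
    ("Saturday", "LARCENY/THEFT", 115),
    ("Sunday", "LARCENY/THEFT", 108),
    ("Monday", "NON-CRIMINAL", 105),
    ("Tuesday", "OTHER OFFENSES", 91),
    ("Wednesday", "OTHER OFFENSES", 85),
    ("Tuesday", "NON-CRIMINAL", 78),
    ("Monday", "OTHER OFFENSES", 72),
    ("Monday", "ASSAULT", 68),
    ("Wednesday", "NON-CRIMINAL", 62),
    ("Tuesday", "ASSAULT", 62),
    ("Wednesday", "ASSAULT", 51),
    ("Sunday", "ASSAULT", 50),
    ("Thursday", "ASSAULT", 47),
]

TOP_PER_DAY_SQL = """
    SELECT dayofweek, category, incident_count
    FROM (
        SELECT dayofweek, category, incident_count,
               ROW_NUMBER() OVER (
                   PARTITION BY dayofweek        -- restart numbering for each day
                   ORDER BY incident_count DESC  -- largest count gets seqnum = 1
               ) AS seqnum
        FROM incidents_2018
    ) AS ranked
    WHERE seqnum = 1                             -- keep only the top row per day
    ORDER BY incident_count DESC
"""

def top_per_day(counts):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE incidents_2018 (dayofweek TEXT, category TEXT, incident_count INTEGER)"
    )
    conn.executemany("INSERT INTO incidents_2018 VALUES (?, ?, ?)", counts)
    rows = conn.execute(TOP_PER_DAY_SQL).fetchall()
    conn.close()
    return rows

result = top_per_day(COUNTS)
for row in result:
    print(row)
```

On these rows it returns the seven LARCENY/THEFT rows that match the expected output. If two categories tie for a day's top count, ROW_NUMBER() keeps only one of them, and which one it keeps is not specified.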
Answer 1 (score: 1)
Another option with BigQuery Standard SQL. It stays closer to your original query, so it may be easier for you to follow:
#standardSQL
SELECT
  dayofweek,
  ARRAY_AGG(
    STRUCT<category STRING, incidents INT64>(category, incident_count)
    ORDER BY incident_count DESC
    LIMIT 1
  )[SAFE_OFFSET(0)].*
FROM (
  SELECT dayofweek, category, COUNT(*) AS incident_count
  FROM `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE EXTRACT(year FROM timestamp) = 2018
  GROUP BY category, dayofweek
) incidents_2018
GROUP BY dayofweek
ORDER BY incidents DESC
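ARRAY_AGG(... ORDER BY incident_count DESC LIMIT 1)[SAFE_OFFSET(0)] works in three steps for each day:

- It collects the day's (category, incidents) pairs.
- It sorts them by incident count, largest first, and keeps only the first.
- It takes element 0, yielding NULL rather than an error if the array were empty.

Below is a rough Python analogue of that per-group logic, not BigQuery's implementation. The input is the counts from the question's output table, and the function name top_per_day is hypothetical.

```python
from collections import defaultdict

# (dayofweek, category, incident_count) rows copied from the question's output table.
COUNTS = [
    ("Monday", "LARCENY/THEFT", 228), ("Wednesday", "LARCENY/THEFT", 210),
    ("Tuesday", "LARCENY/THEFT", 194), ("Thursday", "LARCENY/THEFT", 119),
    ("Friday", "LARCENY/THEFT", 118), ("Saturday", "LARCENY/THEFT", 115),
    ("Sunday", "LARCENY/THEFT", 108), ("Monday", "NON-CRIMINAL", 105),
    ("Tuesday", "OTHER OFFENSES", 91), ("Wednesday", "OTHER OFFENSES", 85),
    ("Tuesday", "NON-CRIMINAL", 78), ("Monday", "OTHER OFFENSES", 72),
    ("Monday", "ASSAULT", 68), ("Wednesday", "NON-CRIMINAL", 62),
    ("Tuesday", "ASSAULT", 62), ("Wednesday", "ASSAULT", 51),
    ("Sunday", "ASSAULT", 50), ("Thursday", "ASSAULT", 47),
]

def top_per_day(rows):
    """Rough analogue of ARRAY_AGG(STRUCT(...) ORDER BY ... DESC LIMIT 1)[SAFE_OFFSET(0)]."""
    # GROUP BY dayofweek + ARRAY_AGG(STRUCT(category, incident_count))
    per_day = defaultdict(list)
    for day, category, incidents in rows:
        per_day[day].append((category, incidents))

    result = []
    for day, structs in per_day.items():
        # ORDER BY incident_count DESC LIMIT 1
        top = sorted(structs, key=lambda s: s[1], reverse=True)[:1]
        # [SAFE_OFFSET(0)]: (None, None) instead of an IndexError on an empty array
        # (never empty here, since every day has at least one row)
        category, incidents = top[0] if top else (None, None)
        # .* expands the struct's fields next to dayofweek
        result.append((day, category, incidents))

    # Outer ORDER BY incidents DESC
    result.sort(key=lambda r: r[2], reverse=True)
    return result

result = top_per_day(COUNTS)
for row in result:
    print(row)
```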
A slightly shorter (less verbose) version:
#standardSQL
SELECT
  ARRAY_AGG(incidents_2018 ORDER BY incident_count DESC LIMIT 1)[SAFE_OFFSET(0)].*
FROM (
  SELECT dayofweek, category, COUNT(*) AS incident_count
  FROM `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE EXTRACT(year FROM timestamp) = 2018
  GROUP BY category, dayofweek
) incidents_2018
GROUP BY incidents_2018.dayofweek
ORDER BY incident_count DESC
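The shorter version aggregates the whole incidents_2018 row instead of a hand-built STRUCT. As a result, .* expands every field of the winning row, dayofweek included, and the column list is not repeated.

Below is a Python analogue sketched under two assumptions:

- A hypothetical Incidents2018 NamedTuple stands in for the row.
- The input is a subset of the counts from the question's table.

It keeps the entire row with the largest count per day in a single pass, rather than sorting each group.

```python
from typing import NamedTuple

class Incidents2018(NamedTuple):
    # Hypothetical mirror of one row of the incidents_2018 subquery.
    dayofweek: str
    category: str
    incident_count: int

# Subset of the counts from the question's output table.
ROWS = [
    Incidents2018("Monday", "LARCENY/THEFT", 228),
    Incidents2018("Monday", "NON-CRIMINAL", 105),
    Incidents2018("Monday", "OTHER OFFENSES", 72),
    Incidents2018("Tuesday", "LARCENY/THEFT", 194),
    Incidents2018("Tuesday", "OTHER OFFENSES", 91),
    Incidents2018("Tuesday", "NON-CRIMINAL", 78),
]

def top_row_per_day(rows):
    """Keep the whole row with the highest incident_count for each dayofweek."""
    best = {}
    for row in rows:
        current = best.get(row.dayofweek)
        # Strict ">" keeps the first row seen when two counts tie.
        if current is None or row.incident_count > current.incident_count:
            best[row.dayofweek] = row
    # Outer ORDER BY incident_count DESC
    return sorted(best.values(), key=lambda r: r.incident_count, reverse=True)

result = top_row_per_day(ROWS)
for row in result:
    print(row)
```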
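Both options produce the following output. The first option names the count column incidents rather than incident_count: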
|Row|dayofweek|category     |incident_count|
|--:|---------|-------------|-------------:|
|1  |Monday   |LARCENY/THEFT|           228|
|2  |Wednesday|LARCENY/THEFT|           210|
|3  |Tuesday  |LARCENY/THEFT|           194|
|4  |Thursday |LARCENY/THEFT|           119|
|5  |Friday   |LARCENY/THEFT|           118|
|6  |Saturday |LARCENY/THEFT|           115|
|7  |Sunday   |LARCENY/THEFT|           108|