day, max of category aggregate?

Time: 2018-08-17 22:12:20

Tags: sql google-bigquery

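Goal: output (category, total incident count) for each (unique) day of the week, for 2018 only.

That is, show the top category and its total incident count for each day of the week. The result set should therefore have only 7 rows (but LIMIT 7 doesn't really answer the core question I'm trying to understand).

Using BigQuery Standard SQL: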

SELECT
    dayofweek,
    category,
    SUM(incident_count) as incidents
FROM
    (SELECT dayofweek, category, count(*) as incident_count
    FROM `bigquery-public-data.san_francisco.sfpd_incidents`
    WHERE
        EXTRACT(year from timestamp) = 2018
    GROUP BY
        category, dayofweek
) incidents_2018
GROUP BY
    category,
    dayofweek
ORDER BY incidents DESC

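I've tried writing a HAVING clause with a subquery to filter the aggregated results, something like HAVING incidents > (SELECT count(*) FROM sfpd_incidents WHERE ...), but I keep getting stuck on what that subquery should look like.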

|dayofweek|         category          |incidents|
|---------|---------------------------|--------:|
|Monday   |LARCENY/THEFT              |      228|
|Wednesday|LARCENY/THEFT              |      210|
|Tuesday  |LARCENY/THEFT              |      194|
|Thursday |LARCENY/THEFT              |      119|
|Friday   |LARCENY/THEFT              |      118|
|Saturday |LARCENY/THEFT              |      115|
|Sunday   |LARCENY/THEFT              |      108|
# this should be the cut-off point - only show the
# top category & its count for each dayofweek
# incident counts that aren't the "top" for each
# dayofweek should be excluded from the result set.
|Monday   |NON-CRIMINAL               |      105|
|Tuesday  |OTHER OFFENSES             |       91|
|Wednesday|OTHER OFFENSES             |       85|
|Tuesday  |NON-CRIMINAL               |       78|
|Monday   |OTHER OFFENSES             |       72|
|Monday   |ASSAULT                    |       68|
|Wednesday|NON-CRIMINAL               |       62|
|Tuesday  |ASSAULT                    |       62|
|Wednesday|ASSAULT                    |       51|
|Sunday   |ASSAULT                    |       50|
|Thursday |ASSAULT                    |       47|

2 answers:

Answer 0 (score: 1)

I think you want:

SELECT dayofweek, category, incident_count
FROM (SELECT dayofweek, category, count(*) as incident_count,
             ROW_NUMBER() OVER (PARTITION BY dayofweek ORDER BY COUNT(*) DESC) as seqnum
      FROM `bigquery-public-data.san_francisco.sfpd_incidents`
      WHERE EXTRACT(year from timestamp) = 2018
      GROUP BY category, dayofweek
     ) incidents_2018
WHERE seqnum = 1
ORDER BY incident_count DESC;
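
One thing to be aware of: ROW_NUMBER() returns exactly one row per day, so if two categories tie for the top count on the same day, one of them is dropped arbitrarily. If you would rather keep every tied category, a variant with RANK() is sketched below. The `counts` CTE is a hypothetical inline sample (four values copied from the question's output) standing in for the grouped subquery, so the sketch can be run without scanning the public table.

```sql
#standardSQL
-- Hypothetical inline sample standing in for the per-(dayofweek, category) counts.
WITH counts AS (
  SELECT 'Monday' AS dayofweek, 'LARCENY/THEFT' AS category, 228 AS incident_count UNION ALL
  SELECT 'Monday', 'NON-CRIMINAL', 105 UNION ALL
  SELECT 'Tuesday', 'LARCENY/THEFT', 194 UNION ALL
  SELECT 'Tuesday', 'OTHER OFFENSES', 91
)
SELECT dayofweek, category, incident_count
FROM (
  SELECT
    dayofweek,
    category,
    incident_count,
    -- RANK() gives every row tied for the highest count in a day the value 1,
    -- whereas ROW_NUMBER() would give 1 to only one of them.
    RANK() OVER (PARTITION BY dayofweek ORDER BY incident_count DESC) AS rnk
  FROM counts
) AS ranked
WHERE rnk = 1
ORDER BY incident_count DESC;
```

With this sample there are no ties, so the result is the same two rows ROW_NUMBER() would give: Monday and Tuesday, each with LARCENY/THEFT. The difference only shows up when a day has more than one category at the maximum.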

Answer 1 (score: 1)

Another option for BigQuery Standard SQL (it looks closer to your original query, so you may find it easier to follow):

#standardSQL
SELECT 
  dayofweek, 
  ARRAY_AGG(
    STRUCT<category STRING, incidents INT64>(category, incident_count) 
    ORDER BY incident_count DESC 
    LIMIT 1
  )[SAFE_OFFSET(0)].*
FROM (
  SELECT dayofweek, category, COUNT(*) AS incident_count
  FROM `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE EXTRACT(year FROM TIMESTAMP) = 2018
  GROUP BY category, dayofweek
) incidents_2018
GROUP BY dayofweek
ORDER BY incidents DESC   

A slightly shorter (less verbose) version is:

#standardSQL
SELECT 
  ARRAY_AGG(incidents_2018 ORDER BY incident_count DESC LIMIT 1)[SAFE_OFFSET(0)].*
FROM (
  SELECT dayofweek, category, COUNT(*) AS incident_count
  FROM `bigquery-public-data.san_francisco.sfpd_incidents`
  WHERE EXTRACT(year FROM TIMESTAMP) = 2018
  GROUP BY category, dayofweek
) incidents_2018
GROUP BY incidents_2018.dayofweek
ORDER BY incident_count DESC  

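Both options produce the following output (in the first query the count column is named incidents rather than incident_count):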

Row dayofweek   category        incident_count   
1   Monday      LARCENY/THEFT   228  
2   Wednesday   LARCENY/THEFT   210  
3   Tuesday     LARCENY/THEFT   194  
4   Thursday    LARCENY/THEFT   119  
5   Friday      LARCENY/THEFT   118  
6   Saturday    LARCENY/THEFT   115  
7   Sunday      LARCENY/THEFT   108
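
If the ARRAY_AGG(... ORDER BY ... LIMIT 1)[SAFE_OFFSET(0)].* expression is hard to read, it may help to look at the intermediate values it passes through. The sketch below shows them as separate columns. The inline `incidents_2018` CTE is a hypothetical sample (values taken from the question's output) replacing the grouped subquery, and the column aliases `top_as_array` and `top_as_struct` are made up for illustration.

```sql
#standardSQL
-- Hypothetical inline sample standing in for the grouped subquery.
WITH incidents_2018 AS (
  SELECT 'Monday' AS dayofweek, 'LARCENY/THEFT' AS category, 228 AS incident_count UNION ALL
  SELECT 'Monday', 'NON-CRIMINAL', 105 UNION ALL
  SELECT 'Tuesday', 'LARCENY/THEFT', 194 UNION ALL
  SELECT 'Tuesday', 'OTHER OFFENSES', 91
)
SELECT
  dayofweek,
  -- Step 1: per day, build an array of (category, incident_count) structs,
  -- sorted by count descending, and keep only the first element.
  ARRAY_AGG(
    STRUCT(category AS category, incident_count AS incident_count)
    ORDER BY incident_count DESC
    LIMIT 1
  ) AS top_as_array,
  -- Step 2: index into that one-element array to get a single struct.
  -- SAFE_OFFSET returns NULL instead of raising an error if the array is empty.
  ARRAY_AGG(
    STRUCT(category AS category, incident_count AS incident_count)
    ORDER BY incident_count DESC
    LIMIT 1
  )[SAFE_OFFSET(0)] AS top_as_struct
FROM incidents_2018
GROUP BY dayofweek
ORDER BY dayofweek;
```

The trailing .* in the queries above is the last step: it expands the struct's fields into ordinary top-level columns, which is why the final result has plain category and count columns rather than a nested value.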