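I'm a beginner at SQL, and this is the problem I have been asked to solve:
Say that a big city is defined as a place of type city with a population of at least 100,000. Write an SQL query that returns the scheme (state_name, no_big_city, big_city_population), ordered by state_name, listing those states that have either (a) at least five big cities or (b) at least one million people living in big cities. The column state_name is the name of the state, no_big_city is the number of big cities in the state, and big_city_population is the number of people living in big cities in the state.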
Now, as far as I know, my query returns the correct result.
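However, each of the two aggregate expressions used in the code appears twice. My question: is there a way to remove this duplication while preserving the functionality?
To be clear, I have already tried using aliases, but I just got a "column does not exist" error.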
Answer 0: (score: 4)
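"An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead."
Bold emphasis mine.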
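A minimal sketch of that rule, using a few made-up rows in place of the real place table (the inline data and the alias code are assumptions for illustration only):

```sql
-- Made-up sample rows standing in for the real place table (assumption)
WITH place(state_code, type, population) AS (
   VALUES ('AA', 'city', 600000)
        , ('AA', 'city', 500000)
        , ('BB', 'city', 150000)
)
SELECT state_code      AS code
     , sum(population) AS big_city_population
FROM   place
WHERE  type = 'city' AND population >= 100000
GROUP  BY code                            -- output column name: allowed
HAVING sum(population) >= 1000000         -- expression has to be written out
-- HAVING big_city_population >= 1000000  -- fails: column "big_city_population" does not exist
ORDER  BY big_city_population DESC;       -- output column name: allowed
```

With these rows the query should return a single row for AA with a total of 1100000. Swapping in the commented-out HAVING line raises the same "column does not exist" error mentioned in the question.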
You can avoid typing long expressions repeatedly with a subquery or a CTE:
SELECT state_name, no_big_city, big_city_population
FROM  (
   SELECT s.name AS state_name
        , COUNT(*)          FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
        , SUM(p.population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
   FROM   state s
   JOIN   place p ON s.code = p.state_code
   GROUP  BY s.name  -- input column name, table-qualified to avoid ambiguity with output columns
   ) sub
WHERE  no_big_city >= 5
   OR  big_city_population >= 1000000
ORDER  BY state_name;
While at it, I simplified the aggregates with the FILTER clause (Postgres 9.4+).
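The CTE form of the same query might look like this. It is only a sketch: the state and place CTEs at the top are made-up sample rows (assumptions, not the asker's data) so that the statement runs on its own. Against real tables, drop those two CTEs and keep only stats.

```sql
WITH state(code, name) AS (                 -- made-up sample data (assumption)
   VALUES ('AA', 'Alpha'), ('BB', 'Beta'), ('CC', 'Gamma')
)
, place(name, type, population, state_code) AS (
   VALUES ('A1', 'city',  120000, 'AA')
        , ('A2', 'city',  120000, 'AA')
        , ('A3', 'city',  120000, 'AA')
        , ('A4', 'city',  120000, 'AA')
        , ('A5', 'city',  120000, 'AA')     -- Alpha: five big cities
        , ('B1', 'city', 1500000, 'BB')     -- Beta: one city above a million
        , ('B2', 'town',  200000, 'BB')     -- not a city, ignored
        , ('C1', 'city',  150000, 'CC')     -- Gamma: a single big city
        , ('C2', 'city',   90000, 'CC')     -- too small, ignored
)
, stats AS (                                -- aggregates are written exactly once
   SELECT s.name AS state_name
        , count(*)          FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
        , sum(p.population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
   FROM   state s
   JOIN   place p ON p.state_code = s.code
   GROUP  BY s.name
)
SELECT state_name, no_big_city, big_city_population
FROM   stats
WHERE  no_big_city >= 5                     -- aliases are visible here
   OR  big_city_population >= 1000000
ORDER  BY state_name;
```

With the sample rows, Alpha qualifies through its five big cities and Beta through 1500000 big-city residents, while Gamma is filtered out.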
However, I suggest this simpler and faster query to begin with:
SELECT s.name AS state_name, p.no_big_city, p.big_city_population
FROM   state s
JOIN  (
   SELECT state_code      AS code  -- alias just to simplify join
        , count(*)        AS no_big_city
        , sum(population) AS big_city_population
   FROM   place
   WHERE  type = 'city'
   AND    population >= 100000
   GROUP  BY 1  -- can be ordinal number referencing position in SELECT list
   HAVING count(*) >= 5 OR sum(population) >= 1000000  -- simple expressions now
   ) p USING (code)
ORDER  BY 1;  -- can also be ordinal number
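I am demonstrating another option for referencing expressions in GROUP BY and ORDER BY: ordinal numbers. Only use it if it doesn't impair readability and maintainability.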
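To illustrate the trade-off, a small sketch with made-up rows (the inline place data is an assumption): ordinals are compact, but they silently point at a different column if the SELECT list is reordered, whereas names do not.

```sql
-- Made-up sample rows (assumption), only so the query runs standalone
WITH place(state_code, type, population) AS (
   VALUES ('AA', 'city', 600000)
        , ('AA', 'city', 500000)
        , ('BB', 'city', 150000)
)
SELECT state_code      AS code
     , count(*)        AS no_big_city
     , sum(population) AS big_city_population
FROM   place
WHERE  type = 'city' AND population >= 100000
GROUP  BY 1          -- ordinal: first item of the SELECT list (state_code)
ORDER  BY 3 DESC;    -- ordinal: third item (big_city_population)
-- Equivalent, and unaffected by reordering the SELECT list:
-- GROUP BY state_code ORDER BY big_city_population DESC
```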
Answer 1: (score: 1)
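Not sure whether this should be a comment or an answer, since it is more about technique, but I'll post it.
What I usually do when I need to reference a calculated column (often a lot of them at once) is put the calculated columns in a derived table and then reference them by alias outside the derived table. The syntax should be valid ANSI SQL, but I'm not familiar with PostgreSQL.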
SELECT *
FROM  (
   SELECT STATE.NAME AS state_name
        , COUNT(CASE WHEN place.type = 'city'
                      AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
        , SUM(CASE WHEN place.type = 'city'
                    AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
   FROM   STATE
   INNER  JOIN place
          ON STATE.code = place.state_code
   GROUP  BY STATE.NAME  -- group by the input column rather than the alias
   ) sub
WHERE  no_big_city >= 5
   OR  big_city_population >= 1000000  -- either condition qualifies; threshold is one million
-- The outer WHERE replaces this HAVING clause, which had to repeat both expressions:
--HAVING COUNT(CASE WHEN place.type = 'city'
--                   AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5
--    OR SUM(CASE WHEN place.type = 'city'
--                 AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER  BY state_name;
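The upside of this approach is that, while you add complexity through the subquery / derived table, each formula is kept in one place, so any change only has to be made once. I don't know whether this performs worse than simply repeating the calculations, but I can't imagine it would be much worse.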
Answer 2: (score: 0)
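The SELECT clause is what you want to select from the tables after they have been filtered by the WHERE clause. GROUP BY is how the filtered records are grouped for use in the aggregate functions in SELECT. So the alias does not exist yet at that point. But you can wrap the aggregated records in a subquery and select from that. Something like this: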
SELECT state_name, no_big_city, big_city_population
FROM
(
    SELECT
        state.name AS state_name,
        COUNT(1) AS no_big_city,
        SUM(place.population) AS big_city_population
    FROM state JOIN place ON state.code = place.state_code
    WHERE
        place.type = 'city' AND
        place.population >= 100000
    GROUP BY state.name
) sub  -- subquery alias, mandatory in PostgreSQL before version 16
WHERE
    no_big_city >= 5 OR
    big_city_population >= 1000000  -- total big-city population, not the largest single city
ORDER BY state_name;
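Also, moving the conditions
place.type = 'city' AND
place.population >= 100000
from CASE to WHERE will perform better: rows for non-cities and small cities are never processed by the aggregates. The result stays the same, because a state without any big city cannot satisfy either condition. This helps especially if there is an index on the place.type column.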
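As a sketch of the indexing point: the table definition below is an assumed shape (column types are guesses) and both index names are hypothetical. A plain index on type matches the suggestion above; the partial index is an alternative that only stores the rows the query actually reads.

```sql
-- Assumed table shape, only so the statements run standalone; types are guesses
CREATE TEMP TABLE place (
   name       text
 , type       text
 , population integer
 , state_code text
);

-- Option 1: plain index on the filter column (hypothetical name)
CREATE INDEX place_type_idx ON place (type);

-- Option 2: partial index holding only big cities (hypothetical name)
CREATE INDEX place_big_city_idx ON place (state_code, population)
WHERE  type = 'city' AND population >= 100000;

-- Inspect which plan the planner picks for the filtered aggregate
EXPLAIN
SELECT state_code, count(*) AS no_big_city, sum(population) AS big_city_population
FROM   place
WHERE  type = 'city' AND population >= 100000
GROUP  BY state_code;
```

Whether either index is used depends on table size and statistics; on a small table the planner will usually prefer a sequential scan anyway.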