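I'm learning SQL and working on an exercise called "World Populations SQL Practice" on Codecademy. It has a table with three columns: country, population and year. I'm interested in finding, for each year, the country with the highest growth rate. (Codecademy didn't suggest this; I just thought it was an interesting idea.)
I can compute all the year-on-year growth rates with this query: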
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;
And I can find the largest year-on-year growth rate for a specific year (e.g. 2005) with this query:
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year = 2005
AND year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth DESC
LIMIT 1;
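One way to get one row per year without writing a new query is to run the single-year query once per year from Python, passing the year as a `?` placeholder. A minimal sketch, assuming Python's built-in sqlite3 module and a tiny hypothetical population_years table (countries "A" and "B" and their numbers are made up):

```python
import sqlite3

# Hypothetical sample data; the real Codecademy table has many more rows.
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)")
c.executemany(
    "INSERT INTO population_years VALUES (?, ?, ?)",
    [("A", 100, 2000), ("A", 110, 2001), ("A", 121, 2002),
     ("B", 200, 2000), ("B", 230, 2001), ("B", 241.5, 2002)],
)

# The single-year query from above, with the hard-coded 2005 replaced by a ? parameter.
top_for_year = """
SELECT country,
       100.0 * ((SELECT population FROM population_years AS p2
                 WHERE p2.year = p1.year + 1
                   AND p2.country = p1.country)
                - population) / population AS year_on_year_growth,
       year
FROM population_years AS p1
WHERE year = ?
  AND year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth DESC
LIMIT 1;
"""

years = [row[0] for row in c.execute("SELECT DISTINCT year FROM population_years ORDER BY year")]
results = []
for y in years:
    row = c.execute(top_for_year, (y,)).fetchone()
    if row is not None:  # the last year has no following year, so no growth rate
        results.append(row)

print(results)  # [('B', 15.0, 2000), ('A', 10.0, 2001)]
```

This issues one query per year and, unlike the Python post-processing, returns a single country even when two tie for the maximum.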
In Python, with the first query saved as yoy_query, I can solve it like this:
yoy_result = c.execute(yoy_query).fetchall()
sorted([record for record in yoy_result if record[1] == max([row[1] for row in yoy_result if row[2] == record[2]])],key=lambda x:x[2])
and I get the expected result:
[('Montserrat', 7.34177215189872, 2000), ('Montserrat', 13.4433962264151, 2001), ('Afghanistan', 5.803891762260126, 2002), ('Montserrat', 10.467706013363028, 2003), ('Liberia', 4.7976709085316545, 2004), ('Jordan', 7.088496587486171, 2005), ('Jordan', 6.764378108744186, 2006), ('Montserrat', 12.638580931263864, 2007), ('Liberia', 4.157111008408977, 2008), ('Niger', 3.737166190281749, 2009)]
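The list comprehension above recomputes the per-year maximum for every record, so it rescans the whole result once per row. A sketch of the same idea (same output, ties kept) that first builds a year → max-growth dictionary in one pass; the rows below are hypothetical stand-ins for yoy_result:

```python
# Stand-in for c.execute(yoy_query).fetchall(): (country, growth, year) tuples.
# The values are hypothetical, not the real Codecademy data.
yoy_result = [
    ("A", 10.0, 2000), ("B", 15.0, 2000),
    ("A", 10.0, 2001), ("B", 5.0, 2001),
]

# Pass 1: record the maximum growth seen for each year.
best_growth = {}
for country, growth, year in yoy_result:
    if year not in best_growth or growth > best_growth[year]:
        best_growth[year] = growth

# Pass 2: keep every record equal to its year's maximum, sorted by year.
top = sorted(
    (r for r in yoy_result if r[1] == best_growth[r[2]]),
    key=lambda r: r[2],
)
print(top)  # [('B', 15.0, 2000), ('A', 10.0, 2001)]
```

The two passes mirror the "save the intermediate result, then compute again" idea; in SQL the same role is played by a subquery, CTE or view.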
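But I can't think of a way to do this in SQL. Any ideas? I think it seems easier in Python because there I can save the intermediate result and then run a second computation on it.
Answer 0 (score: 1)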
You can do this with the window functions LAG() and RANK():
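select country, year_on_year_growth, year
from (
  select *, rank() over (partition by year order by year_on_year_growth desc) as rnk
  from (
    select *,
      -- 1.0 * avoids integer division if population is stored as an integer
      100.0 * (1.0 * population / lag(population) over (partition by country order by year) - 1) as year_on_year_growth
    from population_years
  )
  where year_on_year_growth is not null  -- the first year has no previous year
)
where rnk = 1
order by year;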
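The expression
lag(population) over (partition by country order by year)
returns the country's population in the previous year (assuming there are no gaps between years), so the growth rate is computed as:
((population in the current year) / (population in the previous year) - 1) × 100
Because lag() looks back one year, each rate is labelled with the later year of the pair, whereas the question's query labels it with the earlier year, so the years in the output are shifted by one.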
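A runnable sketch of the window-function approach using Python's sqlite3, assuming the bundled SQLite is version 3.25 or later (required for window functions) and a tiny hypothetical table. It uses lead() instead of lag() so that each rate carries the starting year, as in the question's output, and keeps only rows with rnk = 1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25 or later
conn.execute("CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)")
conn.executemany(
    "INSERT INTO population_years VALUES (?, ?, ?)",
    [("A", 100, 2000), ("A", 110, 2001), ("A", 121, 2002),
     ("B", 200, 2000), ("B", 230, 2001), ("B", 241.5, 2002)],
)

# lead() looks one year ahead, so each growth rate is labelled with its
# starting year, like the question's correlated-subquery version.
query = """
SELECT country, year_on_year_growth, year
FROM (
    SELECT *, RANK() OVER (PARTITION BY year ORDER BY year_on_year_growth DESC) AS rnk
    FROM (
        SELECT country, year,
               100.0 * (LEAD(population) OVER (PARTITION BY country ORDER BY year)
                        - population) / population AS year_on_year_growth
        FROM population_years
    )
    WHERE year_on_year_growth IS NOT NULL  -- drop the last year, which has no successor
)
WHERE rnk = 1
ORDER BY year;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('B', 15.0, 2000), ('A', 10.0, 2001)]
```

RANK() gives tied countries the same rank, so ties all survive the rnk = 1 filter, matching the Python version in the question.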
Answer 1 (score: 0)
I suppose the simplest thing is really just to use a view, like this:
CREATE VIEW yoy_growth
AS
SELECT country,
100.0 * ((SELECT population FROM population_years AS p2
WHERE p2.year = p1.year + 1
AND p2.country = p1.country)
- population) / population AS year_on_year_growth,
year
FROM population_years AS p1
WHERE year_on_year_growth IS NOT NULL
ORDER BY year_on_year_growth;
SELECT * FROM yoy_growth AS y1
WHERE year_on_year_growth = (
SELECT MAX(year_on_year_growth)
FROM yoy_growth AS y2
WHERE y1.year = y2.year
)
ORDER BY year;
This way I get the result I want, although the query does seem rather slow.
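The view's growth rate comes from a correlated subquery per row, and the outer MAX subquery reads the view again for each row, which is a plausible source of the slowness. A sketch of an alternative that computes the rates with a self-join inside a CTE and joins them to the per-year maximum; it is untimed, so whether it is actually faster depends on how SQLite plans it. It assumes Python's sqlite3 and the same hypothetical sample table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE population_years (country TEXT, population REAL, year INTEGER)")
conn.executemany(
    "INSERT INTO population_years VALUES (?, ?, ?)",
    [("A", 100, 2000), ("A", 110, 2001), ("A", 121, 2002),
     ("B", 200, 2000), ("B", 230, 2001), ("B", 241.5, 2002)],
)

# yoy: growth from each year to the next (the inner join drops the last year).
# best: the maximum growth per year. The final join keeps the matching rows.
query = """
WITH yoy AS (
    SELECT p1.country,
           100.0 * (p2.population - p1.population) / p1.population AS year_on_year_growth,
           p1.year
    FROM population_years AS p1
    JOIN population_years AS p2
      ON p2.country = p1.country AND p2.year = p1.year + 1
),
best AS (
    SELECT year, MAX(year_on_year_growth) AS max_growth
    FROM yoy
    GROUP BY year
)
SELECT yoy.country, yoy.year_on_year_growth, yoy.year
FROM yoy
JOIN best
  ON best.year = yoy.year AND best.max_growth = yoy.year_on_year_growth
ORDER BY yoy.year;
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('B', 15.0, 2000), ('A', 10.0, 2001)]
```

On a large table, an index on (country, year) would likely help the self-join, but that too is worth measuring rather than assuming.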