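I have a PostgreSQL table that lists values for countries, together with their continents, over time. Values can be NULL. I would like to get the sum per continent for each date, up to the latest date on which every country of that continent has data.
Here is my table (view on DB Fiddle):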
| continent | country | date | value | id |
| --------- | ------- | ---------- | ----- | --- |
| Europe | Germany | 2020-05-25 | 10 | 1 |
| Europe | Germany | 2020-05-26 | 11 | 2 |
| Europe | Germany | 2020-05-27 | 12 | 3 |
| Europe | Germany | 2020-05-28 | 13 | 4 |
| Europe | Italy | 2020-05-25 | 20 | 5 |
| Europe | Italy | 2020-05-26 | 21 | 6 |
| Europe | Italy | 2020-05-27 | 22 | 7 |
| Europe | Italy | 2020-05-28 | 23 | 8 |
| Europe | France | 2020-05-25 | 30 | 9 |
| Europe | France | 2020-05-26 | 31 | 10 |
| Europe | France | 2020-05-27 | 32 | 11 |
| Europe | France | 2020-05-28 | NULL | 12 |
| Africa | Congo | 2020-05-25 | 40 | 13 |
| Africa | Congo | 2020-05-26 | 41 | 14 |
| Africa | Congo | 2020-05-27 | NULL | 15 |
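To try the queries in this thread locally, the sample table can be recreated with something like the script below. This is only a sketch: the column types and constraints are assumptions inferred from the sample rows, not the actual schema.

```sql
-- Assumed schema: types and constraints are guessed from the sample data.
CREATE TABLE countries (
    id        integer PRIMARY KEY,
    continent text NOT NULL,
    country   text NOT NULL,
    date      date NOT NULL,
    value     integer  -- nullable: NULL means no data for that country on that day
);

INSERT INTO countries (continent, country, date, value, id) VALUES
    ('Europe', 'Germany', '2020-05-25', 10,   1),
    ('Europe', 'Germany', '2020-05-26', 11,   2),
    ('Europe', 'Germany', '2020-05-27', 12,   3),
    ('Europe', 'Germany', '2020-05-28', 13,   4),
    ('Europe', 'Italy',   '2020-05-25', 20,   5),
    ('Europe', 'Italy',   '2020-05-26', 21,   6),
    ('Europe', 'Italy',   '2020-05-27', 22,   7),
    ('Europe', 'Italy',   '2020-05-28', 23,   8),
    ('Europe', 'France',  '2020-05-25', 30,   9),
    ('Europe', 'France',  '2020-05-26', 31,   10),
    ('Europe', 'France',  '2020-05-27', 32,   11),
    ('Europe', 'France',  '2020-05-28', NULL, 12),
    ('Africa', 'Congo',   '2020-05-25', 40,   13),
    ('Africa', 'Congo',   '2020-05-26', 41,   14),
    ('Africa', 'Congo',   '2020-05-27', NULL, 15);
```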
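This is what I would like to get back. Note that Europe includes data up to the 27th, because France has no data for the 28th, while Africa includes data up to the 26th, because that is the last date on which all of its countries have data.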
| continent | date | value |
| --------- | ---------- | ----- |
| Europe | 2020-05-27 | 66 |
| Africa | 2020-05-26 | 41 |
| Europe | 2020-05-26 | 63 |
| Africa | 2020-05-25 | 40 |
| Europe | 2020-05-25 | 60 |
I managed to get close by counting, for each continent and date, the countries that have data.
SELECT
countries.continent,
countries.date,
SUM(countries.value) AS value,
COUNT(countries.country) AS countries_count
FROM
countries
WHERE
countries.value IS NOT NULL
GROUP BY
countries.continent,
countries.date
ORDER BY
countries.date DESC,
countries.continent;
| continent | date | value | countries_count |
| --------- | ---------- | ----- | --------------- |
| Europe | 2020-05-28 | 36 | 2 |
| Europe | 2020-05-27 | 66 | 3 |
| Africa | 2020-05-26 | 41 | 1 |
| Europe | 2020-05-26 | 63 | 3 |
| Africa | 2020-05-25 | 40 | 1 |
| Europe | 2020-05-25 | 60 | 3 |
I also managed to get the number of countries per continent.
SELECT
countries.continent,
COUNT(DISTINCT countries.country) as number_of_countries
FROM
countries
GROUP BY
countries.continent;
| continent | number_of_countries |
| --------- | ------------------- |
| Africa | 1 |
| Europe | 3 |
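I am stuck on how to combine the two queries so as to filter out the rows that do not cover the full number of countries of their continent (for example, keeping only the rows where countries_count is 1 for Africa and 3 for Europe).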
The final result I want is the one shown in the second table above.
Or is there perhaps a completely different way to approach this?
Answer 0 (score: 1)
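You can use NOT IN in the WHERE clause, matching on both continent and date:
SELECT
c.continent,
c.date,
SUM(c.value) AS value,
COUNT(DISTINCT c.country) AS countries_count
FROM countries c
-- Compare (continent, date) pairs, so that a NULL in one continent
-- does not exclude the same date for the other continents.
WHERE (c.continent, c.date) NOT IN
( SELECT continent, date
FROM countries
WHERE value IS NULL )
GROUP BY c.continent, c.date
ORDER BY c.date DESC, c.continent;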
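Answer 1 (score: 1)
You can compare the number of countries in the continent with the number that have data on each date, and then keep only the dates where the two match (the "complete" data).
Unfortunately, Postgres does not support count(distinct) as a window function. But you can do this: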
SELECT c.continent, c.date,
SUM(c.value) AS value,
COUNT(c.country) AS countries_count
FROM (SELECT c.*,
COUNT(*) OVER (PARTITION BY continent, date) as num_on_date
FROM countries c
WHERE value IS NOT NULL
) c JOIN
(SELECT continent, COUNT(DISTINCT country) as num_countries
FROM countries
GROUP BY continent
) cc
ON cc.continent = c.continent
WHERE num_on_date = num_countries
GROUP BY c.continent, c.date
ORDER BY c.date DESC, c.continent;
Here is a db<>fiddle.
You can also do this with a filter in the HAVING clause:
SELECT c.continent, c.date,
SUM(c.value) AS value,
COUNT(c.country) AS countries_count
FROM countries c
WHERE value IS NOT NULL
GROUP BY c.continent, c.date
HAVING COUNT(*) = (SELECT COUNT(DISTINCT c2.country)
FROM countries c2
WHERE c2.continent = c.continent
)
ORDER BY c.date DESC, c.continent;
This aggregates first, then keeps only the rows whose count matches the number of countries in the continent.
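As a sketch of another way around the missing count(distinct) window function, DENSE_RANK() can number the distinct countries within each continent, so that the highest rank equals the distinct count. The aliases country_rank, num_countries and num_on_date are illustrative names, and the query assumes at most one row per country per date:

```sql
SELECT continent, date, SUM(value) AS value
FROM (
    SELECT r.*,
           -- highest dense rank in the continent = number of distinct countries
           MAX(country_rank) OVER (PARTITION BY continent)       AS num_countries,
           -- COUNT(value) ignores NULLs, so this counts countries with data that day
           COUNT(value)      OVER (PARTITION BY continent, date) AS num_on_date
    FROM (
        SELECT c.*,
               DENSE_RANK() OVER (PARTITION BY continent ORDER BY country) AS country_rank
        FROM countries c
    ) r
) t
WHERE num_on_date = num_countries
GROUP BY continent, date
ORDER BY date DESC, continent;
```

This reads the table once instead of joining it to a second aggregate, at the cost of an extra level of nesting.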
Answer 2 (score: 0)
You can filter with a having clause to exclude any group in which some country's value is null
SELECT
continent,
date,
SUM(value) AS value
FROM countries
GROUP BY continent, date
HAVING BOOL_AND(value is not null)
ORDER BY date DESC, continent
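BOOL_AND is not available in every database, so here is a sketch of the same idea with plain counts. COUNT(value) skips NULLs while COUNT(*) does not, so the two agree only when no value in the group is NULL:

```sql
SELECT
    continent,
    date,
    SUM(value) AS value
FROM countries
GROUP BY continent, date
HAVING COUNT(*) = COUNT(value)  -- true only when no value in the group is NULL
ORDER BY date DESC, continent;
```

Like the BOOL_AND version, this only detects rows whose value is NULL. If a country could be missing a row altogether for some date, the group would still pass, and a comparison against the number of countries per continent would be needed instead.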
Answer 3 (score: 0)
With the SUM() window function:
select distinct c.continent, c.date,
sum(c.value) over (partition by c.continent, c.date) "value"
from countries c
where not exists (
select 1 from countries
where continent = c.continent and date = c.date and value is null
)
order by c.date desc, c.continent;
See the demo.
Result:
| continent | date | value |
| --------- | ------------------------ | ----- |
| Europe | 2020-05-27T00:00:00.000Z | 66 |
| Africa | 2020-05-26T00:00:00.000Z | 41 |
| Europe | 2020-05-26T00:00:00.000Z | 63 |
| Africa | 2020-05-25T00:00:00.000Z | 40 |
| Europe | 2020-05-25T00:00:00.000Z | 60 |