Question

I have a table that looks like this:

control=# select * from animals;
 age_range | weight | species
-----------+--------+---------
 0-9       |      1 | lion
 0-9       |      2 | lion
 10-19     |      2 | tiger
 10-19     |      3 | horse
 20-29     |      2 | tiger
 20-29     |      2 | zebra
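
To reproduce the example locally, the table can be recreated with something like the following. This is only a sketch: the column types are assumptions, since the post shows the data but not the table definition.

```sql
-- Assumed schema: the original post does not show the DDL,
-- so the column types below are guesses based on the sample output.
CREATE TABLE animals (
    age_range VARCHAR(10),
    weight    INTEGER,
    species   VARCHAR(20)
);

-- The six sample rows from the listing above.
INSERT INTO animals (age_range, weight, species) VALUES
    ('0-9',   1, 'lion'),
    ('0-9',   2, 'lion'),
    ('10-19', 2, 'tiger'),
    ('10-19', 3, 'horse'),
    ('20-29', 2, 'tiger'),
    ('20-29', 2, 'zebra');
```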

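I run a query that sums the weight of the animals within each age range group, and I only want to return the rows whose aggregated weight is above a certain amount.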

Summary query:

SELECT
 age_range,
 SUM(animals.weight) AS weight,
 COUNT(DISTINCT animals.species) AS distinct_species
FROM animals
GROUP BY age_range
HAVING SUM(animals.weight) > 3;

Summary result:

 age_range | weight | distinct_species
-----------+--------+------------------
 10-19     |      5 |                2
 20-29     |      4 |                2

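Now, here is the problem. In addition to this summary, I want to report the distinct number of species that were used to create the summary row set above. For simplicity, let's call this number the "Distinct Species Total". In this simple example, only 3 species (tiger, zebra, horse) were used to produce the 2 summary rows, and "lion" was not, so the "Distinct Species Total" should be 3. But I can't figure out how to query that number. Because the summary query must use a HAVING clause to filter the grouped and aggregated row set, computing the "Distinct Species Total" becomes a problem.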

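This returns the wrong number, because it is a distinct count of the per-group distinct counts rather than a count of species (both groups have a count of 2 here, so it collapses to 1):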

SELECT
 COUNT(DISTINCT distinct_species) AS distinct_species_total
FROM (
 SELECT
  age_range,
  SUM(animals.weight) AS weight,
  COUNT(DISTINCT animals.species) AS distinct_species
 FROM animals
 GROUP BY age_range
 HAVING SUM(animals.weight) > 3
) x;

And of course this returns the wrong number, 4, because it ignores the HAVING clause that filters the grouped and aggregated summary result:

SELECT
 COUNT(DISTINCT species) AS distinct_species_total
FROM animals;

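Any help steering me down the right path here is appreciated, and will hopefully help others with a similar problem, but ultimately I need a solution that works on Amazon Redshift.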

Answer 1

Join the filtered result set back to the original animals table and count the distinct species.

select count(distinct y.species) as distinct_species_total
from
(
     -- age ranges that survive the HAVING filter
     select age_range,sum(animals.weight) as weight
     from animals
     group by age_range
     having sum(animals.weight) > 3
) x
-- bring back only the original rows belonging to those age ranges
join animals y on x.age_range=y.age_range
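
The same idea can also be written without a join, by filtering the base table on the age ranges that pass the HAVING condition. This is an alternative sketch, not part of the original answer; it assumes the `animals` table and the threshold of 3 from the question.

```sql
-- Count distinct species among the rows whose age_range
-- survives the grouped-and-filtered summary.
SELECT COUNT(DISTINCT species) AS distinct_species_total
FROM animals
WHERE age_range IN (
    -- Same grouping and HAVING filter as the summary query;
    -- only the grouping key is needed here.
    SELECT age_range
    FROM animals
    GROUP BY age_range
    HAVING SUM(weight) > 3
);
```

With the sample data, the subquery keeps `10-19` and `20-29`, so the outer query sees tiger, horse, tiger and zebra and counts 3 distinct species. Keeping the HAVING condition identical to the one in the summary query is what keeps the two numbers consistent.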
