Question

I have the following table called genkeyword:

---------------------------------------------------------------------------
|  id     |      title           |   genre       | keyword      |    year |
----------------------------------------------------------------------------
| 315     |  Harry Potter        |   drama       | magic        |   2011  |
| 315     |  Harry Potter        |   mystery     | magic        |   2011  |
| 315     |  Harry Potter        |   adventure   | magic        |   2011  |
| 315     |  Harry Potter        |   fantasy     | magic        |   2011  |
| 315     |  Harry Potter        |   drama       | witch        |   2011  |
| 315     |  Harry Potter        |   mystery     | witch        |   2011  |
| 315     |  Harry Potter        |   adventure   | witch        |   2011  |
| 315     |  Harry Potter        |   fantasy     | witch        |   2011  |
| 407     |  Cinderella          |   fantasy     | prince       |   2015  |
| 407     |  Cinderella          |   drama       | prince       |   2015  |
| 407     |  Cinderella          |   fantasy     | prince       |   2015  |
| 407     |  Cinderella          |   drama       | prince       |   2015  |
| 826     |  The Shape of Water  |   horror      | scientist    |   2017  |
| 826     |  The Shape of Water  |   adventure   | scientist    |   2017  |
| 826     |  The Shape of Water  |   thriller    | scientist    |   2017  |
| 826     |  The Shape of Water  |   drama       | scientist    |   2017  |
| 826     |  The Shape of Water  |   horror      | friendship   |   2017  |
| 826     |  The Shape of Water  |   adventure   | friendship   |   2017  |
| 826     |  The Shape of Water  |   thriller    | friendship   |   2017  |
| 826     |  The Shape of Water  |   drama       | friendship   |   2017  |
---------------------------------------------------------------------------
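
To reproduce the examples, the table could be created along the following lines. This is only a sketch: the column types are assumptions (the question does not state them), and the multi-row INSERT syntax may need adjusting for your database.

```sql
-- Assumed column types; the original post only shows the data.
create table genkeyword (
    id      integer,
    title   varchar(100),
    genre   varchar(50),
    keyword varchar(50),
    year    integer
);

-- Sample rows copied from the table above, including the repeated Cinderella rows.
insert into genkeyword (id, title, genre, keyword, year) values
    (315, 'Harry Potter',       'drama',     'magic',      2011),
    (315, 'Harry Potter',       'mystery',   'magic',      2011),
    (315, 'Harry Potter',       'adventure', 'magic',      2011),
    (315, 'Harry Potter',       'fantasy',   'magic',      2011),
    (315, 'Harry Potter',       'drama',     'witch',      2011),
    (315, 'Harry Potter',       'mystery',   'witch',      2011),
    (315, 'Harry Potter',       'adventure', 'witch',      2011),
    (315, 'Harry Potter',       'fantasy',   'witch',      2011),
    (407, 'Cinderella',         'fantasy',   'prince',     2015),
    (407, 'Cinderella',         'drama',     'prince',     2015),
    (407, 'Cinderella',         'fantasy',   'prince',     2015),
    (407, 'Cinderella',         'drama',     'prince',     2015),
    (826, 'The Shape of Water', 'horror',    'scientist',  2017),
    (826, 'The Shape of Water', 'adventure', 'scientist',  2017),
    (826, 'The Shape of Water', 'thriller',  'scientist',  2017),
    (826, 'The Shape of Water', 'drama',     'scientist',  2017),
    (826, 'The Shape of Water', 'horror',    'friendship', 2017),
    (826, 'The Shape of Water', 'adventure', 'friendship', 2017),
    (826, 'The Shape of Water', 'thriller',  'friendship', 2017),
    (826, 'The Shape of Water', 'drama',     'friendship', 2017);
```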

I have the following query, which gets, for each movie in the table above, the number of genres it shares with Harry Potter:

select title, year, count(distinct genre) as genre_freq
from genkeyword
where genre in (select genre from genkeyword where title = 'Harry Potter')
  and title <> 'Harry Potter'
group by title, year
order by genre_freq desc;

The output should be:

--------------------------------------------------
| title                |    year   |    genre_freq |
---------------------------------------------------
| Cinderella           |    2015   |      2        |
| The Shape of Water   |    2017   |      2        |
----------------------------------------------------

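However, I'm having trouble understanding exactly how count(distinct genre) works in this query. I know that SELECT DISTINCT returns only distinct values and eliminates duplicate records from the result. What I'm not sure about is when count(distinct genre) actually removes the duplicates. I'd really like to understand what the query does behind the scenes.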

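What I understand so far:

For each tuple in genkeyword:

"where genre in (select genre from genkeyword where title='Harry Potter')" keeps every tuple whose genre value is one of Harry Potter's genres.
If the genre of the tuple under consideration is in the result set returned by that subquery, the tuple is counted by count(distinct genre). The title of the tuple under consideration must also not be Harry Potter, otherwise it is not counted.

But when does count(distinct genre) actually remove the duplicates? Any insight is appreciated.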

Answer 1

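In short, COUNT(DISTINCT column) applies DISTINCT to remove duplicate column values before counting. The duplicates are removed inside each group, after the WHERE clause has filtered the rows and GROUP BY has split them into groups.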

Based on your sample data and query conditions, these are the rows that remain after the WHERE clause is applied:

| title              | genre     | year |
| ------------------ | --------- | ---- |
| Cinderella         | fantasy   | 2015 |
| Cinderella         | drama     | 2015 |
| Cinderella         | fantasy   | 2015 |
| Cinderella         | drama     | 2015 |
| The Shape of Water | adventure | 2017 |
| The Shape of Water | drama     | 2017 |
| The Shape of Water | adventure | 2017 |
| The Shape of Water | drama     | 2017 |
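
If you want to check these intermediate rows yourself, one option is to run only the filter from your query, without the grouping or the aggregate. This is just an illustrative sketch against the genkeyword table from the question; the order of the rows is not guaranteed.

```sql
-- Only the WHERE clause of the original query: no GROUP BY, no COUNT.
-- On the sample data this keeps 8 rows (4 per movie), duplicates included.
select title, genre, year
from genkeyword
where genre in (select genre from genkeyword where title = 'Harry Potter')
  and title <> 'Harry Potter';
```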

When you use count(distinct genre), the duplicate genre values within each group are removed.

The count is then taken over rows like these:

| title              | year | genre     |
| ------------------ | ---- | --------- |
| Cinderella         | 2015 | fantasy   |
| Cinderella         | 2015 | drama     |
| The Shape of Water | 2017 | adventure |
| The Shape of Water | 2017 | drama     |
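
One way to make the deduplication step visible is to write it out explicitly. The sketch below is not how the database necessarily executes your query; it is a rewrite that should return the same result on this data, because the `genre in (...)` filter already excludes NULL genres. The alias `d` is just an illustrative name.

```sql
-- Inner query: filter the rows, then keep one row per (title, year, genre).
-- Outer query: count the surviving rows in each (title, year) group.
select title, year, count(*) as genre_freq
from (
    select distinct title, year, genre
    from genkeyword
    where genre in (select genre from genkeyword where title = 'Harry Potter')
      and title <> 'Harry Potter'
) as d
group by title, year
order by genre_freq desc;
```

Running the inner query on its own should give the four deduplicated rows shown above.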

So the query returns:

| title              | year | genre_freq |
| ------------------ | ---- | ---------- |
| Cinderella         | 2015 | 2          |
| The Shape of Water | 2017 | 2          |
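
To see the effect of DISTINCT directly, you could put both forms of the count side by side. In this sketch the column name genre_rows is made up for illustration.

```sql
select title, year,
       count(genre)          as genre_rows,  -- every filtered row with a non-null genre
       count(distinct genre) as genre_freq   -- each genre counted once per group
from genkeyword
where genre in (select genre from genkeyword where title = 'Harry Potter')
  and title <> 'Harry Potter'
group by title, year
order by genre_freq desc;
```

On the sample data, genre_rows is 4 and genre_freq is 2 for both movies: each movie has four matching rows, but only two different genres among them.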
