I have the following table, called genkeyword:
---------------------------------------------------------------------------
| id | title | genre | keyword | year |
----------------------------------------------------------------------------
| 315 | Harry Potter | drama | magic | 2011 |
| 315 | Harry Potter | mystery | magic | 2011 |
| 315 | Harry Potter | adventure | magic | 2011 |
| 315 | Harry Potter | fantasy | magic | 2011 |
| 315 | Harry Potter | drama | witch | 2011 |
| 315 | Harry Potter | mystery | witch | 2011 |
| 315 | Harry Potter | adventure | witch | 2011 |
| 315 | Harry Potter | fantasy | witch | 2011 |
| 407 | Cinderella | fantasy | prince | 2015 |
| 407 | Cinderella | drama | prince | 2015 |
| 407 | Cinderella | fantasy | prince | 2015 |
| 407 | Cinderella | drama | prince | 2015 |
| 826 | The Shape of Water | horror | scientist | 2017 |
| 826 | The Shape of Water | adventure | scientist | 2017 |
| 826 | The Shape of Water | thriller | scientist | 2017 |
| 826 | The Shape of Water | drama | scientist | 2017 |
| 826 | The Shape of Water | horror | friendship | 2017 |
| 826 | The Shape of Water | adventure | friendship | 2017 |
| 826 | The Shape of Water | thriller | friendship | 2017 |
| 826 | The Shape of Water | drama | friendship | 2017 |
---------------------------------------------------------------------------
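I have the following query, which, for every other movie in the table above, counts how many distinct genres it shares with Harry Potter: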
select title, year, count(distinct genre) as genre_freq from genkeyword
where genre in (select genre from genkeyword where title='Harry Potter') and
title <> 'Harry Potter' group by
title, year order by genre_freq desc;
The output should be:
--------------------------------------------------
| title | year | genre_freq |
---------------------------------------------------
| Cinderella | 2015 | 2 |
| The Shape of Water | 2017 | 2 |
----------------------------------------------------
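However, I'm having trouble understanding exactly how count(distinct genre) works in this query. I know that SELECT DISTINCT returns only distinct values and eliminates duplicate records from the result. What I'm not sure about is when count(distinct genre) actually removes the duplicates. I'd really like to understand what the query does behind the scenes.
What I know so far:
For each tuple in genkeyword:
But when does count(distinct genre) actually remove the duplicates? Any insight would be appreciated.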
Answer 0 (score: 1)
In short, COUNT(DISTINCT [column]) applies DISTINCT first, removing duplicate values of that column within each group, and only then does COUNT count what is left.
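To see the effect of DISTINCT inside the aggregate, it can help to run the query with both count(genre) and count(distinct genre) side by side. The following is a minimal sketch, assuming an in-memory SQLite database loaded with the sample rows from the question. The `rows_in_group` column and the extra `title` tie-breaker in `order by` are additions for illustration and are not part of the original query.

```python
import sqlite3

# Sample data from the question: (id, title, year, genres, keywords).
# Cinderella's duplicated rows are reproduced by listing its genres twice.
movies = [
    (315, "Harry Potter", 2011,
     ["drama", "mystery", "adventure", "fantasy"], ["magic", "witch"]),
    (407, "Cinderella", 2015,
     ["fantasy", "drama", "fantasy", "drama"], ["prince"]),
    (826, "The Shape of Water", 2017,
     ["horror", "adventure", "thriller", "drama"], ["scientist", "friendship"]),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table genkeyword "
    "(id int, title text, genre text, keyword text, year int)"
)
for movie_id, title, year, genres, keywords in movies:
    for keyword in keywords:
        for genre in genres:
            conn.execute(
                "insert into genkeyword values (?, ?, ?, ?, ?)",
                (movie_id, title, genre, keyword, year),
            )

# Same query as in the question, plus count(genre) for comparison.
# The "title" tie-breaker is an assumption added to make the order deterministic.
rows = conn.execute("""
    select title, year,
           count(genre)          as rows_in_group,
           count(distinct genre) as genre_freq
    from genkeyword
    where genre in (select genre from genkeyword where title = 'Harry Potter')
      and title <> 'Harry Potter'
    group by title, year
    order by genre_freq desc, title
""").fetchall()

for row in rows:
    print(row)
```

Each movie has four rows that survive the filter, but only two different genre values among them, so `rows_in_group` is 4 while `genre_freq` is 2.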
With your sample data and query conditions, these are the rows that pass the WHERE filter:
| title | genre | year |
| ------------------ | --------- | ---- |
| Cinderella | fantasy | 2015 |
| Cinderella | drama | 2015 |
| Cinderella | fantasy | 2015 |
| Cinderella | drama | 2015 |
| The Shape of Water | adventure | 2017 |
| The Shape of Water | drama | 2017 |
| The Shape of Water | adventure | 2017 |
| The Shape of Water | drama | 2017 |
When you use count(distinct genre), the duplicate genre values within each (title, year) group are removed, so the count is taken over a result like this:
| title | year | genre |
| ------------------ | ---- | --------- |
| Cinderella | 2015 | fantasy |
| Cinderella | 2015 | drama |
| The Shape of Water | 2017 | adventure |
| The Shape of Water | 2017 | drama |
So your query returns:
| title | year | genre_freq |
| ------------------ | ---- | ---------- |
| Cinderella | 2015 | 2 |
| The Shape of Water | 2017 | 2 |
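The logical order of the steps above (filter, group, deduplicate, count) can be sketched in plain Python. This is only a conceptual model of what the query computes, not a description of how any particular database engine executes it; a real engine may sort or hash the values instead. The variable names are hypothetical.

```python
from collections import defaultdict

# (title, genre, year) for each row of genkeyword. The id and keyword columns
# are omitted because the query never reads them; "* 2" reproduces the
# duplicated rows in the sample data.
rows = (
    [("Harry Potter", g, 2011)
     for g in ["drama", "mystery", "adventure", "fantasy"]] * 2
    + [("Cinderella", g, 2015) for g in ["fantasy", "drama"]] * 2
    + [("The Shape of Water", g, 2017)
       for g in ["horror", "adventure", "thriller", "drama"]] * 2
)

# Subquery: the genres of Harry Potter.
hp_genres = {genre for title, genre, _ in rows if title == "Harry Potter"}

# WHERE: keep rows of other movies whose genre is one of Harry Potter's.
filtered = [r for r in rows if r[1] in hp_genres and r[0] != "Harry Potter"]

# GROUP BY title, year: collect each group's genre values, duplicates included.
groups = defaultdict(list)
for title, genre, year in filtered:
    groups[(title, year)].append(genre)

# COUNT(DISTINCT genre): deduplicate inside each group, then count.
result = sorted(
    ((title, year, len(set(genres))) for (title, year), genres in groups.items()),
    key=lambda r: (-r[2], r[0]),  # genre_freq desc, then title as a tie-breaker
)
print(result)
```

In this model the duplicates are dropped at the `set(genres)` step, that is, after the WHERE filter and the grouping, and separately for each group.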