Question

If I have a table with duplicate IDs and I use GROUP BY id, I get the same result as if I used SELECT DISTINCT(id), right?

So when should I choose one over the other?

Answer 1

If you need aggregate functions such as SUM or MAX, you should use GROUP BY.

If you only need to group on the columns, the two are the same (and use the same plan).

Note that DISTINCT is not a function, so this clause:

SELECT DISTINCT(id), othercol

is the same as (apart from the column order)

SELECT DISTINCT othercol, (id)

or just

SELECT DISTINCT othercol, id

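and it can still return duplicate id values if there are rows with the same id but different othercol values, because DISTINCT applies to the whole row.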
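
To illustrate that DISTINCT applies to the entire select list rather than to id alone, here is a minimal sketch. The table name t_demo and its rows are hypothetical, made up only for this illustration:

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE t_demo (id INT, othercol VARCHAR(10));
INSERT INTO t_demo (id, othercol) VALUES (1, 'a'), (1, 'b'), (2, 'a'), (2, 'a');

-- DISTINCT de-duplicates whole (id, othercol) rows, so id = 1 still
-- appears twice: (1, 'a'), (1, 'b'), (2, 'a').
SELECT DISTINCT id, othercol
FROM t_demo;

-- To get exactly one row per id, group by id and let an aggregate
-- choose the othercol value: (1, 'a'), (2, 'a').
SELECT id, MIN(othercol) AS othercol
FROM t_demo
GROUP BY id;
```

The parentheses in DISTINCT(id) are just a parenthesized expression, so they would not change either result.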

Answer 2

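DISTINCT and GROUP BY usually produce the same query plan, so the two constructs should perform the same. Use GROUP BY when you need to apply aggregate operators to each group. If you only need to remove duplicates, use DISTINCT. If you are using subqueries, the execution plans may differ, so in that case check the execution plan before deciding which is faster.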

Example of DISTINCT:
 SELECT DISTINCT Employee, Rank
 FROM Employees

Example of GROUP BY:
 SELECT Employee, Rank
 FROM Employees
 GROUP BY Employee, Rank

Example of GROUP BY with aggregate function:
 SELECT Employee, Rank, COUNT(*) EmployeeCount
 FROM Employees
 GROUP BY Employee, Rank

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Answer 3

Just some additional information:

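If you are querying an indexed column and using LIMIT (in MySQL or MariaDB), it can be better to use GROUP BY rather than DISTINCT, because it will use the index instead of a temporary table.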

See the following links:

http://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html
http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html

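“If there is an ORDER BY clause and a different GROUP BY clause, or if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.”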

Example:

MariaDB [my_db]> EXPLAIN SELECT DISTINCT p.data_prefix FROM my_table p;
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]> EXPLAIN SELECT DISTINCT p.data_prefix FROM my_table p limit 0,40;
+------+-------------+-------+-------+---------------+-------------+---------+------+------+-------------------------------------------+
| id   | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                                     |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+-------------------------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by; Using temporary |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+-------------------------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]> EXPLAIN SELECT p.data_prefix FROM my_table p group by p.data_prefix;
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]> EXPLAIN SELECT p.data_prefix FROM my_table p group by p.data_prefix limit 0,40;
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
| id   | select_type | table | type  | possible_keys | key         | key_len | ref  | rows | Extra                    |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | p     | range | NULL          | data_prefix | 33      | NULL |   18 | Using index for group-by |
+------+-------------+-------+-------+---------------+-------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

MariaDB [my_db]>

Answer 4

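Here is an example where you might prefer GROUP BY over DISTINCT. Consider a scenario where a window function (not necessarily row_number()) needs to be applied to a de-duplicated result set. Because of the order of operations, with DISTINCT you have to de-duplicate in a subquery first: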

select id, row_number() over (order by id) as rn
from (select distinct id from my_table) t;

The same result can be achieved without a subquery by using GROUP BY:

select id, row_number() over (order by id) as rn 
from my_table
group by id;

This works because window functions are evaluated after GROUP BY but before DISTINCT.
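
To make the ordering concrete, here is a rough sketch that assumes my_table holds only the hypothetical ids 1, 1, 2 (made-up data, recreated here purely for illustration) and a database that supports window functions:

```sql
-- Hypothetical data, for illustration only: ids 1, 1, 2.
CREATE TABLE my_table (id INT);
INSERT INTO my_table (id) VALUES (1), (1), (2);

-- row_number() is computed over all three rows first, so every
-- (id, rn) pair is already unique and DISTINCT removes nothing.
-- Result has 3 rows: (1, 1), (1, 2), (2, 3).
SELECT DISTINCT id, row_number() OVER (ORDER BY id) AS rn
FROM my_table;

-- GROUP BY collapses the duplicate ids before the window function
-- runs, so the numbering covers only the distinct ids.
-- Result has 2 rows: (1, 1), (2, 2).
SELECT id, row_number() OVER (ORDER BY id) AS rn
FROM my_table
GROUP BY id;
```

The first query shows why putting DISTINCT in the same SELECT as the window function does not give the de-duplicated numbering.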

GROUP BY x vs DISTINCT(x)
