MySQL DISTINCT query returns rows with duplicate information, deduplication needed

Time: 2011-11-29 04:22:06

Tags: mysql deduplication

I have a table in a MySQL database similar to the one below:

+----------+----------+----------+----------+----------+  
| Column_A | Column_B | Column_C | Column_D | Column_E |      
+----------+----------+----------+----------+----------+      
|        1 |       11 | a        |        0 | abc      |      
|        2 |       22 | a        |        0 | abc      |      
|        3 |       33 | a        |        0 | def      |      
|        4 |       44 | b        |        0 | def      |      
|        5 |          | b        |        0 | def      |      
|        6 |       55 | c        |        0 | ghi      |      
|        7 |          | d        |        0 | jkl      |      
|        8 |          | a        |        4 | abc      |      
|        9 |          | a        |        4 | abc      |      
|       10 |          | b        |        4 | abc      |      
|       11 |       88 | f        |        4 | xyz      |      
|       12 |          | f        |        4 | xyz      |      
+----------+----------+----------+----------+----------+      

I need a result similar to the one below (i.e. only the 'a' and 'b' values, which have different Column_D and Column_E values):

+----------+----------+----------+
| Column_C | Column_D | Column_E |
+----------+----------+----------+
| a        |        0 | abc      |
| a        |        0 | def      |
| a        |        4 | abc      |
| b        |        0 | def      |
| b        |        4 | abc      |
+----------+----------+----------+

I tried this query:

SELECT DISTINCT column_c,column_d,column_e FROM trial2 ORDER BY column_c;

I get:

+------------------+------------------+------------------+
|     column_c     |     column_d     |     column_e     |
+------------------+------------------+------------------+
|     a            |            0     |     abc          |
|     a            |            0     |     def          |
|     a            |            4     |     abc          |
|     b            |            0     |     def          |
|     b            |            4     |     abc          |
|     c            |            0     |     ghi          |
|     d            |            0     |     jkl          |
|     f            |            4     |     xyz          |
+------------------+------------------+------------------+

I don't need the rows with 'c', 'd' or 'f' in column_c. I need the rows whose column_c value occurs with both 0 and 4 in column_d (i.e. column_c is 'a' or 'b').

3 answers:

Answer 0: (score: 1)

You don't need a join...

SELECT column_c,column_d,column_e FROM trial2 
GROUP by column_c, column_d, column_e 
HAVING COUNT(*) > 1 
ORDER BY column_c

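The HAVING clause runs after the aggregation is applied, so you can filter on the number of rows in each group that remains after grouping...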
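To make the post-aggregation filter concrete on the question's data, here is a minimal sketch. Note that COUNT(*) > 1 keeps (column_c, column_d, column_e) combinations that appear on more than one row, so on the sample data it would keep (f, 4, xyz) and drop (a, 0, def). The variant below is not the query above: it assumes the goal is to keep only column_c values that occur with more than one distinct column_d, and it filters the groups with HAVING COUNT(DISTINCT column_d) > 1 inside a subquery. The schema is an assumption: the column types are guessed from the sample output, and the blank Column_B cells are assumed to be NULL.

```sql
-- Assumed schema: the types are guesses based on the sample output.
CREATE TABLE trial2 (
  column_a INT PRIMARY KEY,
  column_b INT NULL,        -- blank cells in the question are assumed to be NULL
  column_c CHAR(1),
  column_d INT,
  column_e CHAR(3)
);

-- Sample rows copied from the question.
INSERT INTO trial2 (column_a, column_b, column_c, column_d, column_e) VALUES
  (1,  11,   'a', 0, 'abc'),
  (2,  22,   'a', 0, 'abc'),
  (3,  33,   'a', 0, 'def'),
  (4,  44,   'b', 0, 'def'),
  (5,  NULL, 'b', 0, 'def'),
  (6,  55,   'c', 0, 'ghi'),
  (7,  NULL, 'd', 0, 'jkl'),
  (8,  NULL, 'a', 4, 'abc'),
  (9,  NULL, 'a', 4, 'abc'),
  (10, NULL, 'b', 4, 'abc'),
  (11, 88,   'f', 4, 'xyz'),
  (12, NULL, 'f', 4, 'xyz');

-- The subquery groups by column_c and keeps only the groups that
-- contain more than one distinct column_d value ('a' and 'b' here).
-- The outer query then lists the distinct combinations for those groups.
SELECT DISTINCT column_c, column_d, column_e
FROM trial2
WHERE column_c IN (
  SELECT column_c
  FROM trial2
  GROUP BY column_c
  HAVING COUNT(DISTINCT column_d) > 1
)
ORDER BY column_c, column_d, column_e;
```

On the sample rows this should return the five 'a' and 'b' combinations listed in the question. If "different" is also meant to cover column_e, the HAVING condition would need to count distinct column_e values as well.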

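Answer 1: (score: 0)

DISTINCT only ensures that each row appears at most once in the output. It doesn't remove rows that are not exact matches of other rows.

To operate on more than one row at a time, you need an inner join: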

SELECT t.column_c, t.column_d, t.column_e
  FROM trial2 AS t
    JOIN trial2 AS tb
      ON t.column_c = tb.column_c
         AND (t.column_d != tb.column_d OR t.column_e != tb.column_e)
  GROUP BY t.column_c, t.column_d, t.column_e
  ORDER BY t.column_c;

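The inner join filters out rows that have no matching row. In the query above, matching rows are those that have the same value in column C but differ in column D or E.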
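The "keep a row only if a partner row exists" idea can also be written as an EXISTS subquery, which states the matching condition without multiplying rows the way a join does. This is a sketch of an alternative formulation, not the answerer's query; it assumes the trial2 table and the column_c, column_d, column_e names from the question.

```sql
-- Keep a row of trial2 only when some other row shares its column_c
-- but differs in column_d or column_e.
SELECT DISTINCT t.column_c, t.column_d, t.column_e
FROM trial2 AS t
WHERE EXISTS (
  SELECT 1
  FROM trial2 AS tb
  WHERE tb.column_c = t.column_c
    AND (tb.column_d <> t.column_d OR tb.column_e <> t.column_e)
)
ORDER BY t.column_c, t.column_d, t.column_e;
```

DISTINCT is still needed because trial2 itself contains repeated (column_c, column_d, column_e) combinations. Both comparisons use <>, so rows with NULL in column_d or column_e would never count as "different"; the sample data has no such NULLs.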

Answer 2: (score: 0)

Not too sure about this solution, but I think it does what you need.

mysql> select * from randdata;
+------+------+------+
| a    | b    | c    |
+------+------+------+
| a    | 0    | f    |
| a    | 2    | x    |
| b    | 2    | x    |
| c    | 0    | f    |
+------+------+------+
4 rows in set (0.00 sec)

mysql> select * from randdata GROUP BY concat(b,c);
+------+------+------+
| a    | b    | c    |
+------+------+------+
| a    | 0    | f    |
| a    | 2    | x    |
+------+------+------+
2 rows in set (0.01 sec)

Query:

select * from trial2 GROUP BY concat(column_d,column_e);
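One caveat: in MySQL 5.7.5 and later, the default ONLY_FULL_GROUP_BY SQL mode rejects SELECT * combined with a GROUP BY like this one, because the selected columns are not determined by the grouping expression. A minimal sketch of the same grouping that should run under that mode, assuming the question's trial2 table: group on the two columns directly and choose the representative column_c explicitly. Using MIN() as the representative is an illustrative choice, not something the answer specifies.

```sql
-- One row per (column_d, column_e) pair, with the smallest column_c
-- in each group as its representative.
SELECT MIN(column_c) AS column_c, column_d, column_e
FROM trial2
GROUP BY column_d, column_e
ORDER BY column_d, column_e;
```

Grouping on the columns themselves also avoids relying on the concatenated strings being unique for each pair of values.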