How to find duplicates and gaps in this scenario in MySQL

Time: 2013-03-26 07:08:09

Tags: mysql select

Hi, I have a table that looks like this:
-----------------------------------------------------------
|  id  |  group_id | source_id | target_id | sortsequence |
-----------------------------------------------------------
|  2   |    1      |    2      |   4       |     1        |   
-----------------------------------------------------------
|  4   |    1      |    20     |   2       |     1        |   
-----------------------------------------------------------
|  5   |    1      |    2      |   14      |     1        |   
-----------------------------------------------------------
|  7   |    1      |    2      |   7       |     3        |   
-----------------------------------------------------------
|  20  |    2      |    20     |   4       |     3        |   
-----------------------------------------------------------
|  21  |    2      |    20     |   4       |     1        |   
-----------------------------------------------------------

Scenario

There are two scenarios to handle.

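  1. The sortsequence column value should be unique for a given source_id and group_id. For example, all records with group_id = 1 AND source_id = 2 should have a unique sortsequence. In the sample above, the records with id = 2 and 5, which have group_id = 1 and source_id = 2, share the same sortsequence, which is 1. These are faulty records, and I need to find them.
  2. If group_id and source_id are the same, the sortsequence values should be continuous, with no gaps. For example, in the table above, the records with id = 20 and 21 have the same group_id and source_id, and their sortsequence values are 3 and 1. Even though these values are unique, there is a gap in the sortsequence values. I need to find these records as well.

My effort

I have written a query:

    SELECT source_id, `group_id`, GROUP_CONCAT(id) AS children
    FROM `table`
    GROUP BY source_id, sortsequence, `group_id`
    HAVING COUNT(*) > 1

This query only solves scenario 1. How can I handle scenario 2? Is there a way to do it in the same query, or do I have to write another one to handle the second scenario?

By the way, the query will be dealing with millions of records in the table, so performance must be very good.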

2 answers:

Answer 0: (score: 1)

Got the answer from Tere J in the comments. The following query covers both of the criteria mentioned above.

 SELECT 
     source_id, `group_id`, GROUP_CONCAT(id) AS faultyIDS
 FROM
     `table`
 GROUP BY
     source_id, group_id
 HAVING
     COUNT(DISTINCT sortsequence) <> COUNT(sortsequence)  -- duplicate values within the group
     OR COUNT(sortsequence) <> MAX(sortsequence)          -- values do not run 1..n (a gap)
     OR MIN(sortsequence) <> 1;                           -- sequence does not start at 1

Maybe it can help others.
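To see why the three HAVING conditions flag the right groups, here is a minimal sketch that runs the same query against the six sample rows from the question. It rests on a few assumptions that are not part of the original post: it uses Python's built-in sqlite3 module with an in-memory SQLite database rather than MySQL, the table is named tbl (a hypothetical name, chosen to avoid the reserved word table), an ORDER BY is added only to make the output deterministic, and find_faulty_groups is an illustrative helper name.

```python
import sqlite3

# Sample rows copied from the question:
# (id, group_id, source_id, target_id, sortsequence)
ROWS = [
    (2, 1, 2, 4, 1),
    (4, 1, 20, 2, 1),
    (5, 1, 2, 14, 1),
    (7, 1, 2, 7, 3),
    (20, 2, 20, 4, 3),
    (21, 2, 20, 4, 1),
]

# Same HAVING logic as the answer; the ORDER BY is an addition
# so that the result order is predictable.
QUERY = """
SELECT source_id, group_id, GROUP_CONCAT(id) AS faultyIDS
FROM tbl
GROUP BY source_id, group_id
HAVING COUNT(DISTINCT sortsequence) <> COUNT(sortsequence)
    OR COUNT(sortsequence) <> MAX(sortsequence)
    OR MIN(sortsequence) <> 1
ORDER BY group_id, source_id
"""


def find_faulty_groups(rows):
    """Load rows into an in-memory table and return the flagged groups."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl ("
        "id INTEGER PRIMARY KEY, group_id INTEGER, source_id INTEGER, "
        "target_id INTEGER, sortsequence INTEGER)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for source_id, group_id, ids in find_faulty_groups(ROWS):
        print(source_id, group_id, ids)
```

On the sample data this should report the group with source_id = 2 and group_id = 1 (duplicate sortsequence) and the group with source_id = 20 and group_id = 2 (gap), while the single row with source_id = 20 and group_id = 1 passes all three checks.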

Answer 1: (score: 0)

Try this query; it will solve both of the scenarios you mentioned in the question.

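SELECT 
   a.* 
FROM 
   tbl a
INNER JOIN 
   -- Number the rows within each (group_id, source_id) pair, in id order
   (select 
       @rn:=IF(@prevG = group_id AND @prevS = source_id, @rn + 1, 1) As rId,
       @prevG:=group_id AS group_id, 
       @prevS:=source_id AS source_id, 
       id, 
       sortsequence
    FROM 
       tbl 
    join 
       -- Initialise the user variables once
       (select @rn:=0, @prevS:=0, @prevG:=0) vars
    order by group_id, source_id, id) b
-- Return rows whose sortsequence differs from their expected position
ON a.id = b.id AND a.SORTSEQUENCE <> b.RID;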
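The same row-numbering idea can be sketched with ROW_NUMBER(), which MySQL supports from version 8.0 and which avoids depending on the evaluation order of user variables. The example below is only an illustration under stated assumptions: it runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions) instead of MySQL, find_mismatched_rows is a hypothetical helper name, and it keeps this answer's choice of numbering rows by id, so it assumes sortsequence is meant to follow id order within each group.

```python
import sqlite3

# Sample rows copied from the question:
# (id, group_id, source_id, target_id, sortsequence)
ROWS = [
    (2, 1, 2, 4, 1),
    (4, 1, 20, 2, 1),
    (5, 1, 2, 14, 1),
    (7, 1, 2, 7, 3),
    (20, 2, 20, 4, 3),
    (21, 2, 20, 4, 1),
]

# Number rows per (group_id, source_id) in id order, then keep the rows
# whose sortsequence does not match that position.
QUERY = """
SELECT id, group_id, source_id, sortsequence, rid
FROM (
    SELECT id, group_id, source_id, sortsequence,
           ROW_NUMBER() OVER (
               PARTITION BY group_id, source_id
               ORDER BY id
           ) AS rid
    FROM tbl
) AS numbered
WHERE sortsequence <> rid
ORDER BY id
"""


def find_mismatched_rows(rows):
    """Return rows whose sortsequence differs from their row number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl ("
        "id INTEGER PRIMARY KEY, group_id INTEGER, source_id INTEGER, "
        "target_id INTEGER, sortsequence INTEGER)"
    )
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in find_mismatched_rows(ROWS):
        print(row)
```

Unlike the GROUP BY approach, this returns individual rows rather than whole groups, so for the sample data it points at ids 5, 20 and 21 instead of listing every id in a faulty group.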
