SQL - Find missing int values in a mostly ordered sequence

Time: 2011-12-03 00:05:03

Tags: sql oracle gaps-and-islands

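I manage a message-based system in which a sequence of unique integer ids will be fully represented by the end of the day, though the ids do not necessarily arrive in order.

I am looking for help finding the missing ids in this series using SQL. If my column values look like the ones below, how can I find which ids are missing from the sequence (in this case, 6)?

The sequence begins and ends at an arbitrary point each day, so min and max differ on each run. Coming from a Perl background, I threw some regex in there.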

ids
1
2
3
5
4
7
9
8
10

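Help would be much appreciated.

Edit: We run Oracle.

Edit2: Thanks, all. I will work through your solutions next week in the office.

Edit3: For the time being I settled on something like the query below, where ORIG_ID is the original id column and MY_TABLE is the source table. Looking more closely at my data, there are a variety of cases beyond plain numeric data in a string. In some cases there is a prefix or suffix of non-numeric characters. In others, dashes or spaces are mixed into the numeric id. Beyond this, ids periodically appear more than once, so I included DISTINCT.

I would appreciate any further input, particularly on the best way to strip out the non-numeric characters.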

SELECT 
   CASE
      WHEN NUMERIC_ID + 1 = NEXT_ID - 1
         THEN TO_CHAR( NUMERIC_ID + 1 )
      ELSE TO_CHAR( NUMERIC_ID + 1 ) || '-' || TO_CHAR( NEXT_ID - 1 )
   END
   MISSING_SEQUENCES
   FROM
   (
      SELECT
         NUMERIC_ID,
         LEAD (NUMERIC_ID, 1, NULL)
            OVER 
            (
               ORDER BY
                 NUMERIC_ID
                 ASC
            )
            AS NEXT_ID
         FROM 
         (
             SELECT
                DISTINCT TO_NUMBER( REGEXP_REPLACE(ORIG_ID,'[^[:digit:]]','') ) 
                AS NUMERIC_ID
             FROM MY_TABLE
         )
    ) WHERE NEXT_ID != NUMERIC_ID + 1
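
On the question of stripping non-numeric characters, it can help to try the cleanup rule on a handful of raw values outside the database first. The following is a minimal Python sketch of the same rule the query applies with REGEXP_REPLACE and DISTINCT. It is not part of the original post: the function name `normalize_ids` and the sample strings are made up for illustration, and skipping values that contain no digits at all is an assumption of this sketch.

```python
import re

def normalize_ids(raw_ids):
    """Strip every non-digit character and return the distinct ids, sorted.

    Roughly mirrors DISTINCT TO_NUMBER(REGEXP_REPLACE(ORIG_ID, '[^[:digit:]]', '')).
    Values without any digit are skipped (an assumption of this sketch).
    """
    cleaned = set()
    for raw in raw_ids:
        digits = re.sub(r"[^0-9]", "", raw)  # keep ASCII digits only
        if digits:
            cleaned.add(int(digits))
    return sorted(cleaned)

# Hypothetical sample values: prefix, dash, space, suffix, duplicate, no digits.
sample = ["ID0007", "12-34", "56 78", "9x", "0007", "n/a"]
print(normalize_ids(sample))  # prints [7, 9, 1234, 5678]
```

One thing the sample makes visible: removing dashes and spaces turns "12-34" into 1234, so this rule is only safe if such a value can never collide with a genuine id 1234.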

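6 Answers:

Answer 0: (score: 5)

I've been there.

FOR ORACLE:

I found this very useful query on the web a while ago and noted it down, but I no longer remember the site. You can search for "GAP ANALYSIS" on Google.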

SELECT   CASE
             WHEN ids + 1 = lead_no - 1 THEN TO_CHAR (ids +1)
          ELSE TO_CHAR (ids + 1) || '-' || TO_CHAR (lead_no - 1)
         END
             Missing_track_no
   FROM   (SELECT   ids,
                    LEAD (ids, 1, NULL)
                     OVER (ORDER BY ids ASC)
                        lead_no
             FROM   YOURTABLE
             )
   WHERE   lead_no != ids + 1

The result here is:

MISSING_TRACK_NO
-----------------
       6

If there are multiple gaps, say 2, 6, 7 and 9 are missing, the result would be:

MISSING_TRACK_NO
-----------------
        2
       6-7
        9
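
To make the logic of the LEAD query easier to follow, here is a small Python sketch of the same idea: sort the ids, pair each one with its successor, and report whatever lies strictly between them. It is an illustration only, not the answer's code; the function name `missing_ranges` is hypothetical. Unlike the SQL above, the sketch removes duplicates first, which is the role DISTINCT plays in the asker's own query.

```python
def missing_ranges(ids):
    """Return the gaps between consecutive ids as 'n' or 'lo-hi' strings."""
    ordered = sorted(set(ids))
    gaps = []
    # zip pairs each id with the next one, like LEAD(ids) OVER (ORDER BY ids).
    for current, following in zip(ordered, ordered[1:]):
        if following != current + 1:
            lo, hi = current + 1, following - 1
            gaps.append(str(lo) if lo == hi else f"{lo}-{hi}")
    return gaps

print(missing_ranges([1, 2, 3, 5, 4, 7, 9, 8, 10]))  # ids from the question: ['6']
print(missing_ranges([1, 3, 4, 5, 8, 10]))           # ['2', '6-7', '9']
```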

Answer 1: (score: 4)

This is sometimes called an exclusion join. That is, attempt a join and return only the rows for which there is no match.

SELECT t1.value-1
FROM ThisTable AS t1
LEFT OUTER JOIN ThisTable AS t2
  ON t1.value = t2.value+1
WHERE t2.value IS NULL

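Note that this will always report at least one row: one less than the MIN value, since nothing precedes the minimum.

Also, if a gap spans two or more numbers, it reports only one missing value for that gap (the highest one).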
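
Both caveats are easy to see by running the join on a tiny table. The sketch below does that with Python's built-in sqlite3 module, used here purely as a convenient stand-in engine (the asker runs Oracle, where the AS keyword before a table alias has to be dropped). The helper name `exclusion_join` and the ORDER BY are additions for this illustration; the table and column names follow the answer.

```python
import sqlite3

def exclusion_join(values):
    """Load values into an in-memory table and run the exclusion join on it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ThisTable (value INTEGER)")
    con.executemany("INSERT INTO ThisTable (value) VALUES (?)",
                    [(v,) for v in values])
    rows = con.execute(
        """
        SELECT t1.value - 1
        FROM ThisTable AS t1
        LEFT OUTER JOIN ThisTable AS t2
          ON t1.value = t2.value + 1
        WHERE t2.value IS NULL
        ORDER BY 1
        """
    ).fetchall()
    con.close()
    return [row[0] for row in rows]

# ids from the question: 0 is MIN - 1, 6 is the real gap.
print(exclusion_join([1, 2, 3, 5, 4, 7, 9, 8, 10]))  # prints [0, 6]
# 3 and 4 are both missing, but only 4 is reported.
print(exclusion_join([1, 2, 5]))                      # prints [0, 4]
```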

Answer 2: (score: 1)

You didn't state your DBMS, so I'm assuming PostgreSQL:

select aid as missing_id
from generate_series( (select min(id) from message), (select max(id) from message)) as aid
  left join message m on m.id = aid
where m.id is null;  

This reports every missing value between the minimum and maximum id in the table (including gaps larger than one).

psql (9.1.1)
Type "help" for help.

postgres=> select * from message;
 id
----
  1
  2
  3
  4
  5
  7
  8
  9
 11
 14
(10 rows)


postgres=> select aid as missing_id
postgres-> from generate_series( (select min(id) from message), (select max(id) from message)) as aid
postgres->   left join message m on m.id = aid
postgres-> where m.id is null;
 missing_id
------------
          6
         10
         12
         13
(4 rows)
postgres=>
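
Conceptually, the generate_series query above builds the full range from the smallest to the largest id and keeps the numbers that have no matching row. A minimal Python sketch of that set difference, using the ids from the session above, might look like this (the function name `missing_ids` is hypothetical and not part of the answer):

```python
def missing_ids(ids):
    """Return every integer between min(ids) and max(ids) that is not in ids."""
    present = set(ids)
    # range(min, max + 1) plays the role of generate_series(min(id), max(id)).
    return [n for n in range(min(present), max(present) + 1) if n not in present]

print(missing_ids([1, 2, 3, 4, 5, 7, 8, 9, 11, 14]))  # prints [6, 10, 12, 13]
```

Because the range is bounded by the observed minimum and maximum, this approach cannot detect ids missing before the first or after the last value that arrived.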

Answer 3: (score: 0)

I applied it in MySQL and it works:

mysql> select * from sequence;
+--------+
| number |
+--------+
|      1 |
|      2 |
|      4 |
|      6 |
|      7 |
|      8 |
+--------+
6 rows in set (0.00 sec)

mysql> SELECT t1.number - 1 FROM sequence AS t1 LEFT OUTER JOIN sequence AS t2 ON t1.number = t2.number +1 WHERE t2.number IS NULL;
+---------------+
| t1.number - 1 |
+---------------+
|             0 |
|             3 |
|             5 |
+---------------+
3 rows in set (0.00 sec)

Answer 4: (score: 0)

SET search_path='tmp';

DROP table tmp.table_name CASCADE;
CREATE table tmp.table_name ( num INTEGER NOT NULL PRIMARY KEY);
-- make some data
INSERT INTO tmp.table_name(num) SELECT generate_series(1,20);
-- create some gaps
DELETE FROM tmp.table_name WHERE random() < 0.3 ;

SELECT * FROM table_name;

-- EXPLAIN ANALYZE
WITH zbot AS (
    SELECT 1+tn.num  AS num
    FROM table_name tn
    WHERE NOT EXISTS (
        SELECT * FROM table_name nx
        WHERE nx.num = tn.num+1
        )
    )
, ztop AS (
    SELECT -1+tn.num  AS num
    FROM table_name tn
    WHERE NOT EXISTS (
        SELECT * FROM table_name nx
        WHERE nx.num = tn.num-1
        )
    )
SELECT zbot.num AS bot
    ,ztop.num AS top
FROM zbot, ztop
WHERE zbot.num <= ztop.num
AND NOT EXISTS ( SELECT *
    FROM table_name nx
    WHERE nx.num >= zbot.num
    AND nx.num <= ztop.num
    )
ORDER BY bot,top
    ;

Result:

CREATE TABLE
INSERT 0 20
DELETE 9
 num 
-----
   1
   2
   6
   7
  10
  11
  13
  14
  15
  18
  19
(11 rows)

 bot | top 
-----+-----
   3 |   5
   8 |   9
  12 |  12
  16 |  17
(4 rows)
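
The query above can be read as three steps: collect candidate gap starts (a value plus one that is absent), collect candidate gap ends (a value minus one that is absent), then keep the start/end pairs with no existing value in between. The following Python sketch walks through those steps on the same data; it is only an illustration of the reasoning, and the function name `gap_bounds` is made up here.

```python
def gap_bounds(nums):
    """Return (bot, top) pairs delimiting each run of missing numbers."""
    present = set(nums)
    # zbot: num + 1 for rows that have no successor in the table.
    bots = sorted(n + 1 for n in present if n + 1 not in present)
    # ztop: num - 1 for rows that have no predecessor in the table.
    tops = sorted(n - 1 for n in present if n - 1 not in present)
    # Keep pairs with bot <= top and no existing value inside [bot, top].
    return [(b, t) for b in bots for t in tops
            if b <= t and not any(b <= n <= t for n in present)]

data = [1, 2, 6, 7, 10, 11, 13, 14, 15, 18, 19]
print(gap_bounds(data))  # prints [(3, 5), (8, 9), (12, 12), (16, 17)]
```

The candidate start 20 (one past the maximum) and the candidate end 0 (one before the minimum) never pair up with anything, which is why the result stays inside the observed range.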

Note: a recursive CTE is also possible (and probably shorter).

UPDATE: here is the recursive CTE:

WITH RECURSIVE tree AS (
    SELECT 1+num AS num
    FROM table_name t0
    UNION
    SELECT 1+num FROM tree tt
    WHERE EXISTS ( SELECT *
        FROM table_name xt
        WHERE xt.num > tt.num
        )
    )
SELECT * FROM tree
WHERE NOT EXISTS (
    SELECT *
    FROM table_name nx
    WHERE nx.num = tree.num
    )
ORDER BY num
    ;

Result: (same data)

 num 
-----
   3
   4
   5
   8
   9
  12
  16
  17
  20
 (9 rows)
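
The trailing 20 in this result is worth a second look: the seed of the recursive CTE adds 1 to every row, including the maximum (19), so max + 1 ends up in the tree and is reported as missing. The Python sketch below imitates the CTE's seed and recursive term as a simple fixed-point loop to show where that value comes from. It is a rough model for illustration, not a translation the answer's author provided, and the function name `recursive_tree_missing` is hypothetical.

```python
def recursive_tree_missing(nums):
    """Imitate the recursive CTE: grow a 'tree' of candidates, then filter."""
    present = set(nums)
    highest = max(present)
    # Seed term: 1 + num for every row in the table.
    tree = {n + 1 for n in present}
    frontier = set(tree)
    while frontier:
        # Recursive term: add 1 again while some table value is still larger.
        # Subtracting tree mimics UNION discarding rows already produced.
        new = {n + 1 for n in frontier if n < highest} - tree
        tree |= new
        frontier = new
    # Final SELECT: candidates that do not exist in the table.
    return sorted(n for n in tree if n not in present)

data = [1, 2, 6, 7, 10, 11, 13, 14, 15, 18, 19]
print(recursive_tree_missing(data))  # prints [3, 4, 5, 8, 9, 12, 16, 17, 20]
```

If the value just past the maximum should not count as missing, the final filter would also need to exclude candidates greater than the maximum.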

Answer 5: (score: 0)

select student_key, next_student_key
      from (
    select student_key, lead(student_key) over (order by student_key) next_student_key
      from student_table
           )
where student_key <> next_student_key-1;