Question

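I posted this question before, but I think the point was missed.
Here is my new explanation.

The following script should print, via DBMS_OUTPUT, only the missing sequence numbers. To test it I deleted a record, but the script still does not report that sequence number as missing.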

create or replace procedure show_missing_seqs(yy in varchar2 default '[0-9]{2}',
                                              mm in varchar2 default '[0-9]{2}',
                                              dd in varchar2 default '[0-9]{2}') as
  pattern varchar2(80);
  min_seq number(4):=1;
  max_seq number(4):=9999;
  cursor cur(pattern varchar2) is with 
  t as(
    select to_number(substr(filename, 5, 4)) as seq,
           substr(filename, 10, 2) as yy,
           substr(filename, 13, 2) as mm,
           substr(filename, 16, 2) as dd
      from test1@ra2013
     where regexp_like(filename, pattern)),
  r(yy, mm, dd, seq, max_seq) as(
    select yy, mm, dd, min_seq, max_seq
      from t
     group by yy, mm, dd
    union all
    select yy, mm, dd, seq + 1, max_seq
      from r
     where seq + 1 <= max_seq)
      select yy, mm, dd, seq as missing_seq
        from r
       where not exists (select 1
                from t
               where t.yy = r.yy
                 and t.mm = r.mm
                 and t.dd = r.dd
                 and t.seq = r.seq)
       order by yy, mm, dd, seq;


begin
  pattern := 'CDR[-][0-9]{4}[_][0-9]{2}' || yy || '[_][0-9]{2}' || mm ||
             '[_][0-9]{2}[_][0-9]{4}[_][N]["2"]';

  for rec in cur(pattern) loop
    dbms_output.put_line(rec.missing_seq);
  end loop;
  dbms_output.put_line('Done');

end show_missing_seqs;

Answer 1

If you are running the PL/SQL script from SQL*Plus, make sure you have enabled serveroutput so that you can actually see what it prints.

SQL> set serveroutput on

Answer 2

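Are you saying that the missing sequences from 510 to 4356 are not being reported?

Running your code exactly as posted, I can see output. It prints all the missing sequences up to 509. You don't see anything after 509 because of the UNION ALL operator: UNION ALL passes the last row of the previous result (table t) into table r, so only the sequences missing before 509 appear. To illustrate this, I labelled the rows from the first part of the UNION ALL as 'A' and the rows from the second part as 'B', and also printed max_seq. In the result you will see that the first two rows come from the upper part of the UNION ALL and the rest come from the second part; look at the max_seq column.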

COL YY  MM  DD  SEQ     MISSING_SEQ  MAX_SEQ
A   14  01  17  4356    4356         4356
A   14  02  07  397     509          509
B   14  02  07  398     509          509

with t as
( 
    SELECT
    to_number(substr(ename, 5,4 )) as seq,
    substr(ename, 10, 2) as yy,
    substr(ename, 13, 2) as mm,
    substr(ename, 16, 2) as dd
  from table1
  where regexp_like(ename, 'CDR[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][N]["2"]')
),
r (col, yy, mm, dd, seq, max_seq) as (
  SELECT 'A', yy, mm, dd, min(seq), max(seq)
  from t
  group by yy, mm, dd
  union all
  select 'B', yy, mm, dd,seq + 1, max_seq
  from r
  where seq + 1 <= (SELECT max(max_seq) FROM t)
)
select col, yy, mm, dd, seq, max_seq as missing_seq, max_seq
from r

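I'm not sure why you are comparing on YY-MM-DD, but here is the SQL I would use if I wanted to mimic what you are doing. The only difference is t2, where I make sure the last row is the one that uses max_seq.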

with t as
    ( 
        SELECT
        to_number(substr(ename, 5,4 )) as seq,
        substr(ename, 10, 2) as yy,
        substr(ename, 13, 2) as mm,
        substr(ename, 16, 2) as dd
      from table1
      where regexp_like(ename, 'CDR[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][N]["2"]')
    ), t2 AS (
    select max(yy) yy, max(mm) mm, max(dd) dd, min(seq) seq, max(seq) max_seq from t ORDER BY 5 ASC 
    ),
    r (yy, mm, dd, seq, max_seq) as (
      select yy, mm, dd, (seq), (max_seq) from t2
      union all
      select yy, mm, dd,seq + 1, max_seq
      from r
      where seq + 1 <= max_seq
    )
    select yy, mm, dd, seq as missing_seq, max_seq
    from r
    where not exists (
      select 1 from t2 t
      where t.yy = r.yy
      and t.mm = r.mm
      and t.dd = r.dd
      and t.seq = r.seq
    )
    order by yy, mm, dd, seq

Let me know if this helps.

Answer 3

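Based on the comments about the possibility of the sequence cycling within a day, the query gets rather more complicated, but I think this is what you are looking for: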

with t as
(
  select to_number(substr(ename, 5,4 )) as seq,
    to_date(substr(ename, 10, 2) || substr(ename, 13, 2)
      || substr(ename, 16, 2) || substr(ename, 19, 4), 'RRMMDDHH24MI')
      as cdr_time
  from table1
  where regexp_like(ename,
    'CDR[-][0-9]{4}[_][0-9]{2}[_][0-9]{2}[_][0-9]{2}[_][0-9]{4}[_][N]["2"]')
  order by substr(ename, 10, 13)
),
u as (
  select cdr_time,
    case when lag(seq) over (order by cdr_time) is null then seq
        when lag(seq) over (order by cdr_time) > seq then 0
        when trunc(lag(cdr_time) over (order by cdr_time)) != trunc(cdr_time)
          then mod(lag(seq) over (order by cdr_time) + 1, 10000)
        end as min_seq,
    case when lead(seq) over (order by cdr_time) is null then seq
        when lead(seq) over (order by cdr_time) < seq then 9999
        when trunc(lead(cdr_time) over (order by cdr_time)) != trunc(cdr_time)
          then seq
      end as max_seq
  from t
),
v as (
  select cdr_time as min_cdr_time,
    lead(cdr_time) over (order by cdr_time) as max_cdr_time,
    min_seq,
    lead(max_seq) over (order by cdr_time) as max_seq
  from u
  where min_seq is not null or max_seq is not null
),
w (min_cdr_time, max_cdr_time, seq, max_seq) as (
  select min_cdr_time, max_cdr_time, min_seq, max_seq
  from v
  where min_seq is not null
  union all
  select min_cdr_time, max_cdr_time, seq + 1, max_seq
  from w
  where seq + 1 <= max_seq
)
select to_char(min_cdr_time, 'YYYY-MM-DD') as cdr_date,
  to_char(seq, 'FM0000') as missing_seq
from w
where not exists (
    select 1
    from t
    where t.cdr_time between w.min_cdr_time and w.max_cdr_time
    and t.seq = w.seq
  )
order by min_cdr_time, seq;

All CTEs again; most of them could probably be subqueries, but w is recursive, so it is probably easier to keep them all in the same form.

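The CTE t converts the ename into a date/time value and a sequence value; this is used to generate the ranges, and later to check for missing values.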
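As a minimal sketch of what t extracts, the query below parses a single made-up file name. The value 'CDR-0397_14_02_07_1234_N2' is an assumption inferred from the regular expression and the substr offsets, not a name taken from the real table.

```sql
-- Hypothetical file name: seq at positions 5-8, then YY, MM, DD and HH24MI.
select ename,
  to_number(substr(ename, 5, 4)) as seq,
  to_char(
    to_date(substr(ename, 10, 2) || substr(ename, 13, 2)
      || substr(ename, 16, 2) || substr(ename, 19, 4), 'RRMMDDHH24MI'),
    'YYYY-MM-DD HH24:MI') as cdr_time
from (select 'CDR-0397_14_02_07_1234_N2' as ename from dual);
```

For that sample name this should give a seq of 397 and a cdr_time of 2014-02-07 12:34.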

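The CTE u, based on t, compares each record, ordered by cdr_time, with the previous and next records to work out when the date has changed or the sequence has cycled. This is a bit tricky, particularly when the two coincide, but from a little testing I think it works.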
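The boundary detection can be tried in isolation with something like the following sketch. The four rows are invented sample data, and the boundary column is only an illustrative label; the real u turns these conditions into min_seq and max_seq values instead.

```sql
-- Invented sample rows: the sequence wraps from 9999 to 0, then the date changes.
with t as (
  select 9998 as seq,
    to_date('2014-02-07 10:00', 'YYYY-MM-DD HH24:MI') as cdr_time from dual
  union all
  select 9999, to_date('2014-02-07 10:01', 'YYYY-MM-DD HH24:MI') from dual
  union all
  select 0, to_date('2014-02-07 10:02', 'YYYY-MM-DD HH24:MI') from dual
  union all
  select 1, to_date('2014-02-08 00:05', 'YYYY-MM-DD HH24:MI') from dual
)
select seq, cdr_time,
  lag(seq) over (order by cdr_time) as prev_seq,
  case
    -- previous sequence is higher than this one, so the counter wrapped
    when lag(seq) over (order by cdr_time) > seq then 'sequence cycled'
    -- previous record falls on a different calendar day
    when trunc(lag(cdr_time) over (order by cdr_time)) != trunc(cdr_time)
      then 'date changed'
  end as boundary
from t
order by cdr_time;
```

With this data the row with seq 0 should be flagged as 'sequence cycled' and the row with seq 1 as 'date changed'; the first two rows get a null boundary.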

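The CTE v just collapses the results of u into a single record per date/cycle block. (In the SQL Fiddle below, this had to convert the two date values to dates, which doesn't make sense; I'm not sure whether that is something to do with that environment, as I have no problem running it natively.)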

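The CTE w then generates all the sequence values in each date/cycle block, in much the same way as the answer to the previous question. The final query looks for missing values within those ranges, again the same as before.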
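Stripped of the date handling, the generate-then-exclude step can be sketched as below. The CTE name existing and its three values are made up for illustration; they stand in for the sequence numbers actually found in the table.

```sql
-- Invented data: sequences 397, 398 and 401 exist, so 399 and 400 are missing.
with existing as (
  select 397 as seq from dual
  union all
  select 398 from dual
  union all
  select 401 from dual
),
r (seq, max_seq) as (
  -- anchor member: start at the lowest sequence present
  select min(seq), max(seq) from existing
  union all
  -- recursive member: step by one until the highest sequence is reached
  select seq + 1, max_seq from r where seq + 1 <= max_seq
)
select seq as missing_seq
from r
where not exists (select 1 from existing e where e.seq = r.seq)
order by seq;
```

This should return 399 and 400. The full query does the same thing once per date/cycle block, with the range limits coming from v rather than from a simple min and max.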

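SQL Fiddle for a chunk of CDRs, with some deleted to create missing values that should be reported. To make life easier I created one record per minute, but it would work with larger gaps. I hit a CPU resource limit with the data in your current Fiddle, so you may have problems running this in your environment, depending on how much data and how many gaps you expect to have.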

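If you put this in a procedure and want to restrict the output to the gaps for a single day, you would probably want to add a restriction in u, but covering a day either side of that day, and also a restriction in v for just that day. Hopefully this gives you something you can adapt.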

Show missing sequences procedure does not work