Oracle: select missing dates

Date: 2012-03-06 22:28:03

Tags: sql oracle gaps-and-islands

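I have a table with (among other things) dates in one field.

I need to get a list of all dates that are later than the oldest date, earlier than the most recent date, and completely missing from the table.

So, if the table contains: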

2012-01-02
2012-01-02
2012-01-03
2012-01-05
2012-01-05
2012-01-07
2012-01-08

I want a query that returns:

2012-01-04
2012-01-06
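Stripped of SQL, the requirement is a set difference: every day strictly between the oldest and the newest date, minus the days that actually occur. A minimal Python sketch of that logic, assuming the column holds pure dates with no time component (the function name missing_dates is hypothetical):

```python
from datetime import date, timedelta


def missing_dates(dates):
    """Return every day strictly between min(dates) and max(dates) that is absent from dates."""
    present = set(dates)  # duplicates (e.g. two rows for 2012-01-02) collapse here
    if not present:
        return []
    oldest, recent = min(present), max(present)
    result = []
    day = oldest + timedelta(days=1)
    while day < recent:
        if day not in present:
            result.append(day)
        day += timedelta(days=1)
    return result


sample = [
    date(2012, 1, 2), date(2012, 1, 2), date(2012, 1, 3),
    date(2012, 1, 5), date(2012, 1, 5), date(2012, 1, 7), date(2012, 1, 8),
]
print(missing_dates(sample))  # [datetime.date(2012, 1, 4), datetime.date(2012, 1, 6)]
```

Every SQL answer below does the same two things: generate the full run of days, then remove the ones that exist.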

4 answers:

Answer 0 (score: 12)

Something like this (assuming your table is named your_table and the date column is named the_date):

with date_range as (
      select min(the_date) as oldest, 
             max(the_date) as recent, 
             max(the_date) - min(the_date) as total_days
      from your_table
),
all_dates as (
   select oldest + level - 1 as a_date
   from date_range
   connect by level <= (select total_days from date_range)
)
select ad.a_date
from all_dates ad
  left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;  
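The query has two halves: build every date from oldest to recent, then keep only those with no partner in your_table (an anti-join via LEFT JOIN ... IS NULL). The sketch below reproduces that shape in SQLite under stated assumptions: SQLite has no CONNECT BY, so the date list is generated in Python and loaded into a helper table all_dates (a hypothetical stand-in for the CTE), and dates are stored as ISO-8601 text.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (the_date TEXT)")  # ISO-8601 text stands in for DATE
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [(d,) for d in ["2012-01-02", "2012-01-02", "2012-01-03", "2012-01-05",
                                 "2012-01-05", "2012-01-07", "2012-01-08"]])

# Counterpart of the date_range CTE: oldest and most recent date.
oldest, recent = conn.execute(
    "SELECT min(the_date), max(the_date) FROM your_table").fetchone()
start, end = date.fromisoformat(oldest), date.fromisoformat(recent)

# Counterpart of the all_dates CTE: one row per day from oldest to recent.
conn.execute("CREATE TABLE all_dates (a_date TEXT)")
conn.executemany("INSERT INTO all_dates VALUES (?)",
                 [((start + timedelta(days=i)).isoformat(),)
                  for i in range((end - start).days + 1)])

# Anti-join: keep generated dates that have no matching row in your_table.
rows = conn.execute("""
    SELECT ad.a_date
    FROM all_dates ad
    LEFT JOIN your_table yt ON ad.a_date = yt.the_date
    WHERE yt.the_date IS NULL
    ORDER BY ad.a_date
""").fetchall()
missing = [r[0] for r in rows]
print(missing)  # ['2012-01-04', '2012-01-06']
```

Duplicate rows in your_table do not produce duplicate output here, because a matched date is filtered out by the IS NULL test no matter how many partners it has.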

Edit:
The WITH clause is called a "common table expression" (CTE) and is equivalent to a derived table (an "inline view").

Something like:
select * 
from ( 
     ..... 
) all_dates
join your_table ...

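The second CTE simply creates the list of dates "on the fly", using an undocumented feature of Oracle's connect by implementation.

Re-using a select (as I do with the first and last date) is easier, and IMHO more readable, with a CTE than with derived tables.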
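To illustrate the CTE/derived-table equivalence and the re-use point, here is a small SQLite sketch (an assumption: SQLite is used only because it runs in-process; the SQL is standard). The CTE names the aggregate once and references it twice, while the inline-view form has to repeat the subquery; both return the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (the_date TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [("2012-01-02",), ("2012-01-05",), ("2012-01-08",)])

# CTE version: the aggregate is written once and referenced by name twice.
cte_result = conn.execute("""
    WITH date_range AS (
        SELECT min(the_date) AS oldest, max(the_date) AS recent FROM your_table
    )
    SELECT (SELECT oldest FROM date_range) AS oldest,
           (SELECT recent FROM date_range) AS recent
""").fetchall()

# Derived-table (inline view) version: the same subquery has to be repeated.
inline_result = conn.execute("""
    SELECT (SELECT oldest FROM (SELECT min(the_date) AS oldest, max(the_date) AS recent
                                FROM your_table) date_range) AS oldest,
           (SELECT recent FROM (SELECT min(the_date) AS oldest, max(the_date) AS recent
                                FROM your_table) date_range) AS recent
""").fetchall()

print(cte_result == inline_result, cte_result)  # True [('2012-01-02', '2012-01-08')]
```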

Edit 2:

This can also be done with a recursive CTE:

with date_range as (
      select min(the_date) as oldest, 
             max(the_date) as recent, 
             max(the_date) - min(the_date) as total_days
      from your_table
),
all_dates (a_date, lvl) as (
   select oldest as a_date, 1 as lvl
   from date_range 
   union all
   select (select oldest from date_range) + lvl, lvl + 1
   from all_dates 
   where lvl < (select total_days from date_range)
)
select ad.a_date, lvl
from all_dates ad    
  left join your_table yt on ad.a_date = yt.the_date
where yt.the_date is null
order by ad.a_date;  

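This should work on any DBMS that supports recursive CTEs (PostgreSQL and Firebird, being more standard-compliant, do need the recursive keyword though).

Note the hack select (select oldest from date_range) + lvl, lvl + 1 in the recursive part. It shouldn't be necessary, but Oracle still has some bugs with DATE values in recursive CTEs. In PostgreSQL the following works without problems: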

....
all_dates (a_date, lvl) as (
   select oldest as a_date, 0 as lvl
   from date_range 
   union all
   select a_date + 1, lvl + 1
   from all_dates 
   where lvl < (select total_days from date_range)
)
....
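SQLite also supports recursive CTEs, so the PostgreSQL-style form (the recursive member adds one day to the previous row) can be run end to end there. This is a port under stated assumptions, not the answer's code: SQLite has no DATE type, so dates are ISO-8601 text, the day step is date(a_date, '+1 day'), and total_days is computed with julianday().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (the_date TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [(d,) for d in ["2012-01-02", "2012-01-02", "2012-01-03", "2012-01-05",
                                 "2012-01-05", "2012-01-07", "2012-01-08"]])

rows = conn.execute("""
    WITH RECURSIVE
    date_range AS (
        SELECT min(the_date) AS oldest,
               CAST(round(julianday(max(the_date)) - julianday(min(the_date))) AS INTEGER)
                   AS total_days
        FROM your_table
    ),
    all_dates (a_date, lvl) AS (
        SELECT oldest, 0 FROM date_range          -- anchor member: the oldest date
        UNION ALL
        SELECT date(a_date, '+1 day'), lvl + 1    -- recursive member: the next day
        FROM all_dates
        WHERE lvl < (SELECT total_days FROM date_range)
    )
    SELECT ad.a_date
    FROM all_dates ad
    LEFT JOIN your_table yt ON ad.a_date = yt.the_date
    WHERE yt.the_date IS NULL
    ORDER BY ad.a_date
""").fetchall()
missing = [r[0] for r in rows]
print(missing)  # ['2012-01-04', '2012-01-06']
```

The recursion stops once lvl reaches total_days, so the generated range runs from oldest to recent inclusive, matching the PostgreSQL variant.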

Answer 1 (score: 1)

I'd go for this variant, because it is more efficient:

with all_dates_wo_boundary_values as
( select oldest + level the_date
    from ( select min(the_date) oldest
                , max(the_date) recent
             from your_table
         )
 connect by level <= recent - oldest - 1
)
select the_date
  from all_dates_wo_boundary_values
 minus
select the_date
  from your_table

Here is some evidence. First, the setup:

SQL> create table your_table (the_date)
  2  as
  3  select date '2012-01-02' from dual union all
  4  select date '2012-01-02' from dual union all
  5  select date '2012-01-03' from dual union all
  6  select date '2012-01-05' from dual union all
  7  select date '2012-01-05' from dual union all
  8  select date '2012-01-07' from dual union all
  9  select date '2012-01-08' from dual
 10  /

Table created.

SQL> exec dbms_stats.gather_table_stats(user,'your_table')

PL/SQL procedure successfully completed.

SQL> alter session set statistics_level = all
  2  /

Session altered.

a_horse_with_no_name's query (Answer 0):

SQL> with date_range as
  2  ( select min(the_date) as oldest
  3         , max(the_date) as recent
  4         , max(the_date) - min(the_date) as total_days
  5      from your_table
  6  )
  7  , all_dates as
  8  ( select ( select oldest from date_range) + level as a_date
  9      from dual
 10   connect by level <= (select total_days from date_range)
 11  )
 12  select ad.a_date
 13    from all_dates ad
 14         left join your_table yt on ad.a_date = yt.the_date
 15   where yt.the_date is null
 16   order by ad.a_date
 17  /

A_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00

2 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  gaqx49vb9gz9k, child number 0
-------------------------------------
with date_range as ( select min(the_date) as oldest        , max(the_date) as recent        , max(the_date) - min(the_date) as total_d
ays     from your_table )

, all_dates as ( select ( select oldest from date_range) + level as a_date     from dual  connect by level <= (select total_days from
date_range) ) select

ad.a_date   from all_dates ad        left join your_table yt on ad.a_date = yt.the_date  where yt.the_date is null  order by ad.a_date

Plan hash value: 1419150012

------------------------------------------------------------------------------------------------------------------------------------------------------------------------    
| Id  | Operation                         | Name                        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   1 |  TEMP TABLE TRANSFORMATION        |                             |      1 |        |      2 |00:00:00.01 |      22 |      1 |    1 |       |       |          |
|   2 |   LOAD AS SELECT                  |                             |      1 |        |      1 |00:00:00.01 |       7 |      0 |    1 |   262K|   262K|  262K (0)|
|   3 |    SORT AGGREGATE                 |                             |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|   4 |     TABLE ACCESS FULL             | YOUR_TABLE                  |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|   5 |   SORT ORDER BY                   |                             |      1 |      1 |      2 |00:00:00.01 |      12 |      1 |    0 |  2048 |  2048 | 2048  (0)|
|*  6 |    FILTER                         |                             |      1 |        |      2 |00:00:00.01 |      12 |      1 |    0 |       |       |          |
|*  7 |     HASH JOIN OUTER               |                             |      1 |      1 |      7 |00:00:00.01 |      12 |      1 |    0 |  1048K|  1048K|  707K (0)|
|   8 |      VIEW                         |                             |      1 |      1 |      6 |00:00:00.01 |       9 |      1 |    0 |       |       |          |
|   9 |       CONNECT BY WITHOUT FILTERING|                             |      1 |        |      6 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|  10 |        FAST DUAL                  |                             |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |    0 |       |       |          |
|  11 |        VIEW                       |                             |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|  12 |         TABLE ACCESS FULL         | SYS_TEMP_0FD9D660C_81240964 |      1 |      1 |      1 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
|  13 |      TABLE ACCESS FULL            | YOUR_TABLE                  |      1 |      7 |      7 |00:00:00.01 |       3 |      0 |    0 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter("YT"."THE_DATE" IS NULL)
   7 - access("YT"."THE_DATE"=INTERNAL_FUNCTION("AD"."A_DATE"))


32 rows selected.

My suggestion:

SQL> with all_dates_wo_boundary_values as
  2  ( select oldest + level the_date
  3      from ( select min(the_date) oldest
  4                  , max(the_date) recent
  5               from your_table
  6           )
  7   connect by level <= recent - oldest - 1
  8  )
  9  select the_date
 10    from all_dates_wo_boundary_values
 11   minus
 12  select the_date
 13    from your_table
 14  /

THE_DATE
-------------------
04-01-2012 00:00:00
06-01-2012 00:00:00

2 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7aavxmzkj7zq7, child number 0
-------------------------------------
with all_dates_wo_boundary_values as ( select oldest + level the_date     from ( select min(the_date) oldest
  , max(the_date) recent              from your_table          )  connect by level <= recent - oldest - 1 ) select
the_date   from all_dates_wo_boundary_values  minus select the_date   from your_table

Plan hash value: 2293301832

-----------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------
|   1 |  MINUS                          |            |      1 |        |      2 |00:00:00.01 |       6 |       |       |          |
|   2 |   SORT UNIQUE                   |            |      1 |      1 |      5 |00:00:00.01 |       3 |  9216 |  9216 | 8192  (0)|
|   3 |    VIEW                         |            |      1 |      1 |      5 |00:00:00.01 |       3 |       |       |          |
|   4 |     CONNECT BY WITHOUT FILTERING|            |      1 |        |      5 |00:00:00.01 |       3 |       |       |          |
|   5 |      VIEW                       |            |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   6 |       SORT AGGREGATE            |            |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|   7 |        TABLE ACCESS FULL        | YOUR_TABLE |      1 |      7 |      7 |00:00:00.01 |       3 |       |       |          |
|   8 |   SORT UNIQUE                   |            |      1 |      7 |      5 |00:00:00.01 |       3 |  9216 |  9216 | 8192  (0)|
|   9 |    TABLE ACCESS FULL            | YOUR_TABLE |      1 |      7 |      7 |00:00:00.01 |       3 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------------------


22 rows selected.

Regards,
Rob.
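Answer 1's MINUS approach can be sketched the same way in SQLite, which spells MINUS as EXCEPT and has no CONNECT BY, so the interior dates (oldest + 1 through recent - 1) are generated with a recursive CTE instead. Table and column names follow the answer; storing dates as ISO-8601 text is an assumption of this sketch, and the Oracle execution plans above say nothing about SQLite performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (the_date TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?)",
                 [(d,) for d in ["2012-01-02", "2012-01-02", "2012-01-03", "2012-01-05",
                                 "2012-01-05", "2012-01-07", "2012-01-08"]])

rows = conn.execute("""
    WITH RECURSIVE all_dates_wo_boundary_values (the_date) AS (
        -- anchor: the day after the oldest date, kept only if it lies before the newest date
        SELECT first_day
        FROM (SELECT date(min(the_date), '+1 day') AS first_day,
                     max(the_date) AS last_date
              FROM your_table)
        WHERE first_day < last_date
        UNION ALL
        -- recursive step: add one day until the day before the newest date
        SELECT date(the_date, '+1 day')
        FROM all_dates_wo_boundary_values
        WHERE date(the_date, '+1 day') < (SELECT max(the_date) FROM your_table)
    )
    SELECT the_date FROM all_dates_wo_boundary_values
    EXCEPT  -- SQLite's name for MINUS; like MINUS, it also removes duplicates
    SELECT the_date FROM your_table
    ORDER BY the_date
""").fetchall()
missing = [r[0] for r in rows]
print(missing)  # ['2012-01-04', '2012-01-06']
```

The WHERE on the anchor keeps a table with a single distinct date from reporting the day after it as "missing".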

Answer 2 (score: 1)

We can use a simple hierarchical query like this (T is the table, COL1 the date column):

WITH CTE AS
(SELECT (SELECT MIN(COL1) FROM T)+LEVEL-1 AS OUT FROM DUAL
CONNECT BY (LEVEL-1) <= (SELECT MAX(COL1) - MIN(COL1) FROM T))
SELECT OUT FROM CTE
WHERE OUT NOT IN (SELECT COL1 FROM T WHERE COL1 IS NOT NULL); -- a NULL in COL1 would make NOT IN return no rows
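NOT IN has a well-known trap: if the subquery returns even one NULL, x NOT IN (...) is never TRUE (it evaluates to FALSE or UNKNOWN), so the query silently returns nothing. Oracle follows the standard three-valued logic here; the sketch below demonstrates the same behaviour in SQLite with a hypothetical two-row candidate list, and shows that filtering NULLs out of the subquery (NOT EXISTS is another common fix) restores the expected result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT)")
# Two real dates plus one NULL row.
conn.executemany("INSERT INTO t VALUES (?)",
                 [("2012-01-02",), ("2012-01-05",), (None,)])

# Two candidate dates: one absent from t, one present.
candidates = "SELECT '2012-01-03' AS d UNION ALL SELECT '2012-01-05'"

# d NOT IN (..., NULL) is never TRUE, so this returns no rows at all.
naive = conn.execute(
    f"SELECT d FROM ({candidates}) WHERE d NOT IN (SELECT col1 FROM t)").fetchall()

# Filtering out NULLs restores the intended result.
guarded = conn.execute(
    f"SELECT d FROM ({candidates}) WHERE d NOT IN "
    "(SELECT col1 FROM t WHERE col1 IS NOT NULL)").fetchall()

print(naive, guarded)  # [] [('2012-01-03',)]
```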

Answer 3 (score: 0)

You need a calendar table (permanent or created on the fly). Then you can simply do:

SELECT c.my_date
FROM 
        calendar c
    JOIN
        ( SELECT MIN(date_column) AS min_date 
               , MAX(date_column) AS max_date 
          FROM tableX
        ) mm
      ON c.my_date BETWEEN mm.min_date AND mm.max_date
WHERE
    c.my_date NOT IN
    ( SELECT date_column
      FROM tableX
      WHERE date_column IS NOT NULL   -- a NULL here would make NOT IN return no rows
    )
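A runnable sketch of the calendar-table approach in SQLite, under these assumptions: the calendar table is a hypothetical permanent table holding every day of 2012 (366 rows, since 2012 is a leap year), dates are ISO-8601 text, and the exclusion uses NOT EXISTS rather than NOT IN, a swapped-in technique that is unaffected by NULLs in date_column.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (date_column TEXT)")
conn.executemany("INSERT INTO tableX VALUES (?)",
                 [(d,) for d in ["2012-01-02", "2012-01-02", "2012-01-03", "2012-01-05",
                                 "2012-01-05", "2012-01-07", "2012-01-08"]])

# Hypothetical permanent calendar table: one row per day of 2012.
conn.execute("CREATE TABLE calendar (my_date TEXT PRIMARY KEY)")
first = date(2012, 1, 1)
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [((first + timedelta(days=i)).isoformat(),) for i in range(366)])

# Restrict the calendar to [min, max] of tableX, then drop days that occur in tableX.
rows = conn.execute("""
    SELECT c.my_date
    FROM calendar c
    JOIN (SELECT MIN(date_column) AS min_date, MAX(date_column) AS max_date
          FROM tableX) mm
      ON c.my_date BETWEEN mm.min_date AND mm.max_date
    WHERE NOT EXISTS (SELECT 1 FROM tableX x WHERE x.date_column = c.my_date)
    ORDER BY c.my_date
""").fetchall()
missing = [r[0] for r in rows]
print(missing)  # ['2012-01-04', '2012-01-06']
```

The design trade-off is that the calendar must cover the whole span of the data; in exchange, the gap query itself needs no row generator at all.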