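Query to keep partitions separate when they're physically separated

Time: 2019-05-30 16:58:49

Tags: sql database oracle oracle12c

I have a table that contains order/shipment history. A basic dummy version is: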

ORDERS
order_no | order_stat | stat_date 
 2       | Planned    |  01-Jan-2000
 2       | Picked     |  15-Jan-2000
 2       | Planned    |  17-Jan-2000
 2       | Planned    |  05-Feb-2000
 2       | Planned    |  31-Mar-2000
 2       | Picked     |  05-Apr-2000
 2       | Shipped    |  10-Apr-2000

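I need to figure out how long each order spent in each order status/stage. The only problem is that when I partition by order_no and order_stat, I get results that make sense but aren't what I'm looking for.

My SQL: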

 select
    order_no
    ,order_stat
    ,stat_date
    ,lag(stat_date, 1) over (partition by order_no order by stat_date) prev_stat_date
    ,stat_date - lag(stat_date, 1) over (partition by order_no order by stat_date) date_diff
    ,row_number() over(partition by order_no, order_stat order by stat_date) rnk
 from
    orders

gives me the following results:

order_no | order_stat | stat_date     | prev_stat_date  |    rnk     
 2       | Planned    |  01-Jan-2000  |                 |  1
 2       | Picked     |  15-Jan-2000  |  01-Jan-2000    |  1
 2       | Planned    |  17-Jan-2000  |  15-Jan-2000    |  2
 2       | Planned    |  05-Feb-2000  |  17-Jan-2000    |  3
 2       | Planned    |  31-Mar-2000  |  05-Feb-2000    |  4
 2       | Picked     |  05-Apr-2000  |  31-Mar-2000    |  2
 2       | Shipped    |  10-Apr-2000  |  05-Apr-2000    |  1  

I would like results like this (rnk starts over when the order reverts to a previous order_stat):

order_no | order_stat | stat_date     | prev_stat_date  |    rnk     
 2       | Planned    |  01-Jan-2000  |                 |  1
 2       | Picked     |  15-Jan-2000  |  01-Jan-2000    |  1
 2       | Planned    |  17-Jan-2000  |  15-Jan-2000    |  1
 2       | Planned    |  05-Feb-2000  |  17-Jan-2000    |  2
 2       | Planned    |  31-Mar-2000  |  05-Feb-2000    |  3
 2       | Picked     |  05-Apr-2000  |  31-Mar-2000    |  1
 2       | Shipped    |  10-Apr-2000  |  05-Apr-2000    |  1

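I'm trying to get a running total of how long the status has been in place. The count should start over when the status changes back to one that occurred earlier, rather than being folded into the earlier partition. I have no idea how to approach this. Any and all insight would be greatly appreciated.

2 answers:

Answer 0: (score: 1)

If I understand correctly, this is a gaps-and-islands problem.

The difference of row numbers can be used to identify the "islands", and then you can enumerate the values: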

select t.*,
       row_number() over (partition by order_no, order_stat, seqnum - seqnum_2 order by stat_date) as your_rank
from (select o.*,
             row_number() over (partition by order_no order by stat_date) as seqnum,
             row_number() over (partition by order_no, order_stat order by stat_date) as seqnum_2
      from orders o
     ) t;

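I've left out the other columns (such as the lag()) so you can see the logic. It can be hard to follow why this works. If you stare at some rows from the subquery, you will probably see how the difference of row numbers defines the groups you want.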
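To make that staring easier, here is a small sketch that prints both row numbers and their difference side by side. The inline sample-data CTE and the alias diff are additions for illustration only; against the real table you would drop the CTE.

```sql
-- Sample data from the question, inlined so the query runs on its own
with orders (order_no, order_stat, stat_date) as (
            select 2, 'Planned', date '2000-01-01' from dual
  union all select 2, 'Picked',  date '2000-01-15' from dual
  union all select 2, 'Planned', date '2000-01-17' from dual
  union all select 2, 'Planned', date '2000-02-05' from dual
  union all select 2, 'Planned', date '2000-03-31' from dual
  union all select 2, 'Picked',  date '2000-04-05' from dual
  union all select 2, 'Shipped', date '2000-04-10' from dual
)
select t.*,
       seqnum - seqnum_2 as diff   -- illustrative alias: constant within each island
from (select o.*,
             -- position of the row within the whole order
             row_number() over (partition by order_no order by stat_date) as seqnum,
             -- position of the row among rows of the same status
             row_number() over (partition by order_no, order_stat order by stat_date) as seqnum_2
      from orders o
     ) t
order by order_no, stat_date;
```

Consecutive rows with the same status share one diff value: the three Planned rows from 17-Jan to 31-Mar all get 1, while the first Planned row gets 0. The difference on its own is not unique, though. The Picked row on 15-Jan also gets 1, which is why order_stat has to stay in the partition by clause next to the difference.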

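Answer 1: (score: 0)

Following on from @Gordon's Tabibitosan approach, once you have the groupings you can get the order within each group, and the number of days elapsed for each member of the group: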

-- CTE for sample data
with orders (order_no, order_stat, stat_date) as (
            select 2, 'Planned', date '2000-01-01' from dual
  union all select 2, 'Picked',  date '2000-01-15' from dual
  union all select 2, 'Planned', date '2000-01-17' from dual
  union all select 2, 'Planned', date '2000-02-05' from dual
  union all select 2, 'Planned', date '2000-03-31' from dual
  union all select 2, 'Picked ', date '2000-04-05' from dual
  union all select 2, 'Shipped', date '2000-04-10' from dual
)
-- actual query
select order_no, order_stat, stat_date, grp,
  dense_rank() over (partition by order_no, order_stat, grp order by stat_date) as rnk,
  stat_date - min(stat_date) keep (dense_rank first order by stat_date)
                over (partition by order_no, order_stat, grp) as stat_days
from (
  select order_no, order_stat, stat_date,
    row_number() over (partition by order_no order by stat_date)
      - row_number() over (partition by order_no, order_stat order by stat_date) as grp
  from orders
)
order by order_no, stat_date;

  ORDER_NO ORDER_S STAT_DATE         GRP        RNK  STAT_DAYS
---------- ------- ---------- ---------- ---------- ----------
         2 Planned 2000-01-01          0          1          0
         2 Picked  2000-01-15          1          1          0
         2 Planned 2000-01-17          1          1          0
         2 Planned 2000-02-05          1          2         19
         2 Planned 2000-03-31          1          3         74
         2 Picked  2000-04-05          5          1          0
         2 Shipped 2000-04-10          6          1          0

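The inline view is essentially what Gordon did, except that it does the subtraction at that level. The outer query then gets the rank in the same way, but also uses an analytic function to get the earliest date for that group and subtracts it from the current row's date. You don't have to include grp or rnk in the final result, of course; they are only there to show a bit more of what's going on.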
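Since grp only identifies a group in combination with order_stat, another way to build the groups is to flag each row whose status differs from the previous row and take a running sum of those flags. This is a different technique from the row-number difference above, shown only as a rough sketch; the names is_start and run_no are made up for illustration, and the sample data is inlined so it runs on its own.

```sql
with orders (order_no, order_stat, stat_date) as (
            select 2, 'Planned', date '2000-01-01' from dual
  union all select 2, 'Picked',  date '2000-01-15' from dual
  union all select 2, 'Planned', date '2000-01-17' from dual
  union all select 2, 'Planned', date '2000-02-05' from dual
  union all select 2, 'Planned', date '2000-03-31' from dual
  union all select 2, 'Picked',  date '2000-04-05' from dual
  union all select 2, 'Shipped', date '2000-04-10' from dual
)
select order_no, order_stat, stat_date, run_no,
  -- rank restarts with every new run of the same status
  row_number() over (partition by order_no, run_no order by stat_date) as rnk
from (
  select order_no, order_stat, stat_date,
    -- running count of status changes: one number per consecutive run
    sum(is_start) over (partition by order_no order by stat_date) as run_no
  from (
    select order_no, order_stat, stat_date,
      -- 1 when the status differs from the previous row (or there is none)
      case
        when lag(order_stat) over (partition by order_no order by stat_date) = order_stat
          then 0
        else 1
      end as is_start
    from orders
  )
)
order by order_no, stat_date;
```

Here run_no is unique per run within an order, so later window functions can partition by order_no and run_no without also listing order_stat.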

It isn't clear exactly what you want, but you could extend it further, for example:

with cte1 (order_no, order_stat, stat_date, grp) as (
  select order_no, order_stat, stat_date,
    row_number() over (partition by order_no order by stat_date)
      - row_number() over (partition by order_no, order_stat order by stat_date)
  from orders
),
cte2 (order_no, order_stat, stat_date, grp, grp_date, rnk) as (
  select order_no, order_stat, stat_date, grp,
    min(stat_date) keep (dense_rank first order by stat_date)
      over (partition by order_no, order_stat, grp),
    dense_rank() over (partition by order_no, order_stat, grp order by stat_date)
  from cte1
)
select order_no, order_stat, stat_date, grp, grp_date, rnk,
  stat_date - grp_date as stat_days_so_far,
  case
    when order_stat != 'Shipped' then
      coalesce(first_value(stat_date)
                 over (partition by order_no order by grp_date
                   range between 1 following and unbounded following), trunc(sysdate))
        - min(stat_date) keep (dense_rank first order by stat_date)
            over (partition by order_no, order_stat, grp)
  end as stat_days_total,
  stat_date - min(stat_date) over (partition by order_no) as order_days_so_far,
  case
    when max(order_stat) keep (dense_rank last order by stat_date)
           over (partition by order_no) = 'Shipped' then
      max(stat_date) over (partition by order_no)
    else
      trunc(sysdate)
  end
    - min(stat_date) over (partition by order_no) as order_days_total
from cte2
order by order_no, stat_date;

which for your sample data gives:

  ORDER_NO ORDER_S STAT_DATE         GRP GRP_DATE          RNK STAT_DAYS_SO_FAR STAT_DAYS_TOTAL ORDER_DAYS_SO_FAR ORDER_DAYS_TOTAL
---------- ------- ---------- ---------- ---------- ---------- ---------------- --------------- ----------------- ----------------
         2 Planned 2000-01-01          0 2000-01-01          1                0              14                 0              100
         2 Picked  2000-01-15          1 2000-01-15          1                0               2                14              100
         2 Planned 2000-01-17          1 2000-01-17          1                0              79                16              100
         2 Planned 2000-02-05          1 2000-01-17          2               19              79                35              100
         2 Planned 2000-03-31          1 2000-01-17          3               74              79                90              100
         2 Picked  2000-04-05          5 2000-04-05          1                0               5                95              100
         2 Shipped 2000-04-10          6 2000-04-10          1                0                               100              100

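I've included some logic that assumes 'Shipped' is the final status, and that if it hasn't been reached then the last status is still ongoing, up to today. That might be wrong, and you might have other final status values (e.g. cancelled). Anyway, there are a few things for you to explore and play with...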
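If there are other final statuses, one option is to test the last status against a list rather than a single value. The sketch below is only an illustration: the 'Cancelled' value and order 3 are invented, and it aggregates per order instead of using the analytic form so the idea stays short.

```sql
-- Hypothetical data: order 2 is still open, order 3 ended as 'Cancelled'
with orders (order_no, order_stat, stat_date) as (
            select 2, 'Planned',   date '2000-01-01' from dual
  union all select 2, 'Picked',    date '2000-01-15' from dual
  union all select 3, 'Planned',   date '2000-02-01' from dual
  union all select 3, 'Cancelled', date '2000-02-10' from dual
)
select order_no,
  -- status on the latest row of the order
  max(order_stat) keep (dense_rank last order by stat_date) as last_stat,
  case
    -- treat any status in this list as final; adjust the list to your data
    when max(order_stat) keep (dense_rank last order by stat_date)
           in ('Shipped', 'Cancelled') then
      max(stat_date)
    else
      trunc(sysdate)   -- order still open: count up to today
  end
    - min(stat_date) as order_days_total
from orders
group by order_no
order by order_no;
```

The same in-list test could replace the = 'Shipped' comparison in the analytic version above, and the != 'Shipped' check would become a not in with the same list.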

You might be able to do something similar with match_recognize, but I'll leave that for someone else.
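As a starting point for anyone who wants to try it, a match_recognize version might look roughly like the sketch below. It assumes Oracle 12c or later and the same sample data (inlined here); the pattern variable names a and b are arbitrary, and grp here is simply the match number, not the row-number difference used earlier.

```sql
with orders (order_no, order_stat, stat_date) as (
            select 2, 'Planned', date '2000-01-01' from dual
  union all select 2, 'Picked',  date '2000-01-15' from dual
  union all select 2, 'Planned', date '2000-01-17' from dual
  union all select 2, 'Planned', date '2000-02-05' from dual
  union all select 2, 'Planned', date '2000-03-31' from dual
  union all select 2, 'Picked',  date '2000-04-05' from dual
  union all select 2, 'Shipped', date '2000-04-10' from dual
)
select order_no, order_stat, stat_date, grp, rnk, stat_days
from orders
match_recognize (
  partition by order_no
  order by stat_date
  measures
    match_number() as grp,                       -- one number per run of equal statuses
    count(*) as rnk,                             -- running count within the current run
    stat_date - first(stat_date) as stat_days    -- days since the run started
  all rows per match
  pattern (a b*)                                 -- a starts a run, b extends it
  define
    b as order_stat = prev(order_stat)           -- same status as the previous row
)
order by order_no, stat_date;
```

Because a has no definition it matches any row, so each match is one row followed by every subsequent row that repeats the previous row's status. With all rows per match the measures use running semantics by default, which is what makes count(*) behave like a rank within the run.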