I have a table containing order/shipment history. A basic dummy version is:
ORDERS
order_no | order_stat | stat_date
2 | Planned | 01-Jan-2000
2 | Picked | 15-Jan-2000
2 | Planned | 17-Jan-2000
2 | Planned | 05-Feb-2000
2 | Planned | 31-Mar-2000
2 | Picked | 05-Apr-2000
2 | Shipped | 10-Apr-2000
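I need to work out how long each order spent in each order status/stage. The only problem is that when I partition on order_no and order_stat, the results make sense, but they aren't what I want.
My SQL: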
select
order_no
,order_stat
,stat_date
,lag(stat_date, 1) over (partition by order_no order by stat_date) prev_stat_date
,stat_date - lag(stat_date, 1) over (partition by order_no order by stat_date) date_diff
,row_number() over(partition by order_no, order_stat order by stat_date) rnk
from
orders
gives me the following result:
order_no | order_stat | stat_date | prev_stat_date | rnk
2 | Planned | 01-Jan-2000 | | 1
2 | Picked | 15-Jan-2000 | 01-Jan-2000 | 1
2 | Planned | 17-Jan-2000 | 15-Jan-2000 | 2
2 | Planned | 05-Feb-2000 | 17-Jan-2000 | 3
2 | Planned | 31-Mar-2000 | 05-Feb-2000 | 4
2 | Picked | 05-Apr-2000 | 31-Mar-2000 | 2
2 | Shipped | 10-Apr-2000 | 05-Apr-2000 | 1
I'd like to get this instead (rnk restarts whenever the order reverts to a previous order status):
order_no | order_stat | stat_date | prev_stat_date | rnk
2 | Planned | 01-Jan-2000 | | 1
2 | Picked | 15-Jan-2000 | 01-Jan-2000 | 1
2 | Planned | 17-Jan-2000 | 15-Jan-2000 | 1
2 | Planned | 05-Feb-2000 | 17-Jan-2000 | 2
2 | Planned | 31-Mar-2000 | 05-Feb-2000 | 3
2 | Picked | 05-Apr-2000 | 31-Mar-2000 | 1
2 | Shipped | 10-Apr-2000 | 05-Apr-2000 | 1
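I'm trying to get a running total of how long a status has been in place (restarting the count when the status changes back to one that existed before, instead of carrying on from the earlier partition), but I can't figure out how to approach this. Any and all insight would be greatly appreciated.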
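Stated procedurally, the desired rnk is a counter that starts again at 1 whenever a row's status differs from the previous row's status, in stat_date order. A minimal Python sketch of that rule, assuming the rows of one order are already sorted by stat_date (run_rank is a hypothetical helper name, not part of any library):

```python
# Sample statuses for order 2 from the question, in stat_date order.
rows = [
    ("Planned", "2000-01-01"),
    ("Picked", "2000-01-15"),
    ("Planned", "2000-01-17"),
    ("Planned", "2000-02-05"),
    ("Planned", "2000-03-31"),
    ("Picked", "2000-04-05"),
    ("Shipped", "2000-04-10"),
]


def run_rank(statuses):
    """Number rows 1, 2, 3, ... within each run of consecutive equal statuses."""
    ranks = []
    prev = None
    for status in statuses:
        # Continue the run if the status repeats, otherwise start again at 1.
        ranks.append(ranks[-1] + 1 if ranks and status == prev else 1)
        prev = status
    return ranks


print(run_rank(status for status, _ in rows))  # [1, 1, 1, 2, 3, 1, 1]
```

This matches the rnk column in the expected result above; the SQL answers below get the same numbers without a row-by-row loop.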
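Answer 0 (score: 1)
If I understand correctly, this is a gaps-and-islands problem.
The difference between two row numbers can be used to identify the "islands"; you can then enumerate the rows within each one: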
select t.*,
row_number() over (partition by order_no, order_stat, seqnum - seqnum_2 order by stat_date) as your_rank
from (select o.*,
row_number() over (partition by order_no order by stat_date) as seqnum,
row_number() over (partition by order_no, order_stat order by stat_date) as seqnum_2
from orders o
) t;
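I have left out the other columns (such as lag()) so you can see the logic. It can be tricky to understand why this works. If you stare at some rows from the subquery, you may see how the difference of the row numbers defines the groups you want.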
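To make the "stare at the subquery" step concrete, here is a sketch that runs the same query from Python against an in-memory SQLite database and also selects seqnum, seqnum_2 and their difference. The assumption is that SQLite 3.25 or later (which supports ROW_NUMBER() with PARTITION BY expressions) is an acceptable stand-in for Oracle, with dates stored as ISO-8601 text so they sort correctly:

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (needs SQLite 3.25+ for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_no integer, order_stat text, stat_date text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (2, "Planned", "2000-01-01"),
        (2, "Picked", "2000-01-15"),
        (2, "Planned", "2000-01-17"),
        (2, "Planned", "2000-02-05"),
        (2, "Planned", "2000-03-31"),
        (2, "Picked", "2000-04-05"),
        (2, "Shipped", "2000-04-10"),
    ],
)

# Gordon's query, with the subquery's row numbers and their difference also selected.
query = """
select t.order_stat, t.stat_date, t.seqnum, t.seqnum_2,
       t.seqnum - t.seqnum_2 as diff,
       row_number() over (partition by t.order_no, t.order_stat, t.seqnum - t.seqnum_2
                          order by t.stat_date) as your_rank
from (select o.*,
             row_number() over (partition by order_no order by stat_date) as seqnum,
             row_number() over (partition by order_no, order_stat order by stat_date) as seqnum_2
      from orders o
     ) t
order by t.order_no, t.stat_date
"""

for row in conn.execute(query):
    print(row)
# ('Planned', '2000-01-01', 1, 1, 0, 1)
# ('Picked', '2000-01-15', 2, 1, 1, 1)
# ('Planned', '2000-01-17', 3, 2, 1, 1)
# ('Planned', '2000-02-05', 4, 3, 1, 2)
# ('Planned', '2000-03-31', 5, 4, 1, 3)
# ('Picked', '2000-04-05', 6, 2, 4, 1)
# ('Shipped', '2000-04-10', 7, 1, 6, 1)
```

Within a run of consecutive rows with the same status, both row numbers go up by one per row, so their difference stays constant. When the status changes and later comes back, seqnum has moved further ahead than seqnum_2, so the returning Planned rows get a new difference (1 instead of 0). The difference is only unique per status (the first Picked row also has 1), which is why order_stat stays in the PARTITION BY.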
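Answer 1 (score: 0)
Building on @Gordon's Tabibitosan method: once you have the groups, you can get the position of each row within its group, and the number of days elapsed for each member of the group: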
-- CTE for sample data
with orders (order_no, order_stat, stat_date) as (
select 2, 'Planned', date '2000-01-01' from dual
union all select 2, 'Picked', date '2000-01-15' from dual
union all select 2, 'Planned', date '2000-01-17' from dual
union all select 2, 'Planned', date '2000-02-05' from dual
union all select 2, 'Planned', date '2000-03-31' from dual
union all select 2, 'Picked ', date '2000-04-05' from dual
union all select 2, 'Shipped', date '2000-04-10' from dual
)
-- actual query
select order_no, order_stat, stat_date, grp,
dense_rank() over (partition by order_no, order_stat, grp order by stat_date) as rnk,
stat_date - min(stat_date) keep (dense_rank first order by stat_date)
over (partition by order_no, order_stat, grp) as stat_days
from (
select order_no, order_stat, stat_date,
row_number() over (partition by order_no order by stat_date)
- row_number() over (partition by order_no, order_stat order by stat_date) as grp
from orders
)
order by order_no, stat_date;
  ORDER_NO ORDER_S STAT_DATE         GRP        RNK  STAT_DAYS
---------- ------- ---------- ---------- ---------- ----------
         2 Planned 2000-01-01          0          1          0
         2 Picked  2000-01-15          1          1          0
         2 Planned 2000-01-17          1          1          0
         2 Planned 2000-02-05          1          2         19
         2 Planned 2000-03-31          1          3         74
         2 Picked  2000-04-05          5          1          0
         2 Shipped 2000-04-10          6          1          0
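The inline view is essentially what Gordon did, except that it also does the subtraction at that level. The outer query then gets the rank in the same way, but also uses an analytic function to get the earliest date in the group and subtracts it from the current row's date. You don't have to include grp or rnk in your final result, of course; they're just there to show more of what's going on.
It isn't clear exactly what you want, but you could extend this further, for example: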
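For anyone without an Oracle instance to hand, roughly the same query can be tried in SQLite from Python. This is a sketch under a few assumptions: SQLite 3.25+ stands in for Oracle, dates are stored as ISO-8601 text, julianday() differences replace Oracle's date subtraction, and a plain MIN(stat_date) OVER (...) replaces MIN(stat_date) KEEP (DENSE_RANK FIRST ORDER BY stat_date), which returns the same value here. Both Picked rows are spelled identically in this run; the sample CTE above has a trailing space in the second 'Picked ', which makes it a different value from 'Picked' and is why that row's GRP shows 5 rather than 4. RNK and STAT_DAYS come out the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ standing in for Oracle
conn.execute("create table orders (order_no integer, order_stat text, stat_date text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (2, "Planned", "2000-01-01"),
        (2, "Picked", "2000-01-15"),
        (2, "Planned", "2000-01-17"),
        (2, "Planned", "2000-02-05"),
        (2, "Planned", "2000-03-31"),
        (2, "Picked", "2000-04-05"),
        (2, "Shipped", "2000-04-10"),
    ],
)

query = """
select order_no, order_stat, stat_date, grp,
       dense_rank() over (partition by order_no, order_stat, grp order by stat_date) as rnk,
       -- julianday() differences replace Oracle's date subtraction
       cast(julianday(stat_date)
            - julianday(min(stat_date) over (partition by order_no, order_stat, grp))
            as integer) as stat_days
from (
  select order_no, order_stat, stat_date,
         row_number() over (partition by order_no order by stat_date)
         - row_number() over (partition by order_no, order_stat order by stat_date) as grp
  from orders
)
order by order_no, stat_date
"""

for row in conn.execute(query):
    print(row)
# (2, 'Planned', '2000-01-01', 0, 1, 0)
# (2, 'Picked', '2000-01-15', 1, 1, 0)
# (2, 'Planned', '2000-01-17', 1, 1, 0)
# (2, 'Planned', '2000-02-05', 1, 2, 19)
# (2, 'Planned', '2000-03-31', 1, 3, 74)
# (2, 'Picked', '2000-04-05', 4, 1, 0)
# (2, 'Shipped', '2000-04-10', 6, 1, 0)
```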
with cte1 (order_no, order_stat, stat_date, grp) as (
select order_no, order_stat, stat_date,
row_number() over (partition by order_no order by stat_date)
- row_number() over (partition by order_no, order_stat order by stat_date)
from orders
),
cte2 (order_no, order_stat, stat_date, grp, grp_date, rnk) as (
select order_no, order_stat, stat_date, grp,
min(stat_date) keep (dense_rank first order by stat_date)
over (partition by order_no, order_stat, grp),
dense_rank() over (partition by order_no, order_stat, grp order by stat_date)
from cte1
)
select order_no, order_stat, stat_date, grp, grp_date, rnk,
stat_date - grp_date as stat_days_so_far,
case
when order_stat != 'Shipped' then
-- start date of the next group, or today if this is the latest, still-open status;
-- grp_date is the same for every row of a group, so first_value() is deterministic
coalesce(first_value(grp_date)
over (partition by order_no order by grp_date
range between 1 following and unbounded following), trunc(sysdate))
- grp_date
end as stat_days_total,
stat_date - min(stat_date) over (partition by order_no) as order_days_so_far,
case
when max(order_stat) keep (dense_rank last order by stat_date)
over (partition by order_no) = 'Shipped' then
max(stat_date) over (partition by order_no)
else
trunc(sysdate)
end
- min(stat_date) over (partition by order_no) as order_days_total
from cte2
order by order_no, stat_date;
which for your sample data gives:
  ORDER_NO ORDER_S STAT_DATE         GRP GRP_DATE          RNK STAT_DAYS_SO_FAR STAT_DAYS_TOTAL ORDER_DAYS_SO_FAR ORDER_DAYS_TOTAL
---------- ------- ---------- ---------- ---------- ---------- ---------------- --------------- ----------------- ----------------
         2 Planned 2000-01-01          0 2000-01-01          1                0              14                 0             100
         2 Picked  2000-01-15          1 2000-01-15          1                0               2                14             100
         2 Planned 2000-01-17          1 2000-01-17          1                0              79                16             100
         2 Planned 2000-02-05          1 2000-01-17          2               19              79                35             100
         2 Planned 2000-03-31          1 2000-01-17          3               74              79                90             100
         2 Picked  2000-04-05          5 2000-04-05          1                0               5                95             100
         2 Shipped 2000-04-10          6 2000-04-10          1                0                               100             100
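I've included some logic that assumes 'Shipped' is the final status, and that if it hasn't been reached yet, the last status is still ongoing, up to today. That may be wrong, and you may have other final status values (e.g. Cancelled). Anyway, there are a few things in there for you to explore and play with...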
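The per-status totals can also be sketched by first collapsing each island to one row and then using LEAD to find where the next island starts. This swaps the FIRST_VALUE ... RANGE BETWEEN 1 FOLLOWING window above for LEAD, runs in SQLite 3.25+ via Python rather than Oracle, and uses a fixed, hypothetical TODAY value in place of trunc(sysdate) so the output is reproducible. Like the answer, it treats 'Shipped' as the only final status, and it assumes no two islands of the same order start on the same date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite 3.25+ standing in for Oracle
conn.execute("create table orders (order_no integer, order_stat text, stat_date text)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [
        (2, "Planned", "2000-01-01"),
        (2, "Picked", "2000-01-15"),
        (2, "Planned", "2000-01-17"),
        (2, "Planned", "2000-02-05"),
        (2, "Planned", "2000-03-31"),
        (2, "Picked", "2000-04-05"),
        (2, "Shipped", "2000-04-10"),
    ],
)

TODAY = "2000-04-12"  # hypothetical fixed date standing in for trunc(sysdate)

query = """
with islands as (
  -- one row per run of consecutive rows with the same status
  select order_no, order_stat, min(stat_date) as grp_date
  from (
    select order_no, order_stat, stat_date,
           row_number() over (partition by order_no order by stat_date)
           - row_number() over (partition by order_no, order_stat order by stat_date) as grp
    from orders
  )
  group by order_no, order_stat, grp
)
select order_no, order_stat, grp_date,
       -- days until the next run starts, or until today for a still-open status;
       -- Shipped is treated as final, so it gets NULL
       case when order_stat != 'Shipped' then
         cast(julianday(coalesce(lead(grp_date) over (partition by order_no order by grp_date),
                                 :today))
              - julianday(grp_date) as integer)
       end as stat_days_total
from islands
order by order_no, grp_date
"""

for row in conn.execute(query, {"today": TODAY}):
    print(row)
# (2, 'Planned', '2000-01-01', 14)
# (2, 'Picked', '2000-01-15', 2)
# (2, 'Planned', '2000-01-17', 79)
# (2, 'Picked', '2000-04-05', 5)
# (2, 'Shipped', '2000-04-10', None)
```

Collapsing to one row per island first means LEAD only ever looks at the next island, so ties within a group don't matter; the trade-off is that you get one row per island rather than one per status change, so you'd join back to the detail rows if you need both.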
You might be able to do something similar with match_recognize, but I'll leave that to someone else.