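I want to retrieve the number of days that have elapsed since the data in a particular column last changed. For example:
TABLE_X contains: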
ID PDATE DATA1 DATA2
A 10-Jan-2013 5 10
A 9-Jan-2013 5 10
A 8-Jan-2013 5 11
A 7-Jan-2013 5 11
A 6-Jan-2013 14 12
A 5-Jan-2013 14 12
B 10-Jan-2013 3 15
B 9-Jan-2013 3 15
B 8-Jan-2013 9 15
B 7-Jan-2013 9 15
B 6-Jan-2013 14 15
B 5-Jan-2013 14 8
I have simplified the table for the sake of the example.
The result should be:
ID DATA1_LASTUPDATE DATA2_LASTUPDATE
A 4 2
B 2 5
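That is:
- DATA1 for A was last updated 4 days ago,
- DATA2 for A was last updated 2 days ago,
- DATA1 for B was last updated 2 days ago,
- DATA2 for B was last updated 5 days ago.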
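To pin down the rule behind these numbers, here is a minimal Python sketch. It is not part of the original question. It assumes the "current date" is 11-Jan-2013, which is the date that makes the expected values work out, and the helper name `days_since_last_change` is made up for illustration.

```python
from datetime import date

# Sample rows for ID A from TABLE_X: (pdate, data1, data2)
rows_a = [
    (date(2013, 1, 10), 5, 10),
    (date(2013, 1, 9), 5, 10),
    (date(2013, 1, 8), 5, 11),
    (date(2013, 1, 7), 5, 11),
    (date(2013, 1, 6), 14, 12),
    (date(2013, 1, 5), 14, 12),
]

def days_since_last_change(rows, col, today):
    """Days from `today` back to the latest date on which column `col` changed.

    Hypothetical helper: `col` is a tuple index (1 for data1, 2 for data2).
    """
    ordered = sorted(rows)       # oldest row first (pdate is the first field)
    last_change = ordered[0][0]  # the earliest row counts as the initial change
    for prev, cur in zip(ordered, ordered[1:]):
        if cur[col] != prev[col]:
            last_change = cur[0]  # a later change overwrites an earlier one
    return (today - last_change).days

today = date(2013, 1, 11)  # assumed reference date for the sample data
print(days_since_last_change(rows_a, 1, today))  # DATA1 for A
print(days_since_last_change(rows_a, 2, today))  # DATA2 for A
```

With the sample rows for A, the two calls give 4 and 2, matching the expected result above.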
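The query below works, but when I apply it to the real table, which has a large number of records, and add 2 more data columns to find their last update day as well, it takes a very long time to complete. I use the LEAD function for this. Is there another way to speed up the query?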
with qdata1 as
(
  select ID, pdate from
  (
    select a.*, row_number() over (partition by ID order by pdate desc) rnum from
    (
      select a.*,
        lead(data1,1,0) over (partition by ID order by pdate desc) - data1 as data1_diff
      from table_x a
    ) a
    where data1_diff <> 0
  )
  where rnum=1
),
qdata2 as
(
  select ID, pdate from
  (
    select a.*, row_number() over (partition by ID order by pdate desc) rnum from
    (
      select a.*,
        lead(data2,1,0) over (partition by ID order by pdate desc) - data2 as data2_diff
      from table_x a
    ) a
    where data2_diff <> 0
  )
  where rnum=1
)
select a.ID,
  trunc(sysdate) - b.pdate data1_lastupdate,
  trunc(sysdate) - c.pdate data2_lastupdate
from table_master a, qdata1 b, qdata2 c
where a.ID=b.ID(+)
and a.ID=c.ID(+)
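Thanks very much.
Answer 0 (score: 0)
Your query did not return the correct results for me. Maybe I missed something, but I did get the correct results with the following query (you can check the SQLFiddle demo):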
with ranked as (
  select ID,
         data1,
         data2,
         rank() over(partition by id order by pdate desc) r
  from table_x
)
select id,
       sum(DATA1_LASTUPDATE) DATA1_LASTUPDATE,
       sum(DATA2_LASTUPDATE) DATA2_LASTUPDATE
from (
  -- here I get when data1 was updated
  select id,
         count(1) DATA1_LASTUPDATE,
         0 DATA2_LASTUPDATE
  from ranked
  start with r = 1
  CONNECT BY (PRIOR data1 = data1)
         and PRIOR r = r - 1
         and PRIOR id = id -- stay within the same id
  group by id
  union
  -- here I get when data2 was updated
  select id,
         0 DATA1_LASTUPDATE,
         count(1) DATA2_LASTUPDATE
  from ranked
  start with r = 1
  CONNECT BY (PRIOR data2 = data2)
         and PRIOR r = r - 1
         and PRIOR id = id -- stay within the same id
  group by id
)
group by id
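Each branch of the query above appears to walk back from the most recent row (r = 1) for as long as the value stays the same, and to count the rows it visits. A minimal Python sketch of that idea follows. The function name `trailing_run_length` is made up for illustration, and note that a row count only equals a number of days when there is exactly one row per day up to yesterday, as in the sample data.

```python
def trailing_run_length(values_newest_first):
    """Count how many entries at the start of the list equal the most recent one."""
    if not values_newest_first:
        return 0
    latest = values_newest_first[0]
    count = 0
    for value in values_newest_first:
        if value != latest:
            break  # the value differs here, so the unchanged run ends
        count += 1
    return count

# data1 and data2 for ID B, newest first (10-Jan-2013 back to 5-Jan-2013)
b_data1 = [3, 3, 9, 9, 14, 14]
b_data2 = [15, 15, 15, 15, 15, 8]
print(trailing_run_length(b_data1))
print(trailing_run_length(b_data2))
```

For B this gives 2 and 5, the same values as in the expected result.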
Answer 1 (score: 0)
You can avoid hitting the table multiple times, and avoid the joins, by doing the lag (or lead) calculations together:
with t as (
  select id, pdate, data1, data2,
    lag(data1) over (partition by id order by pdate) as lag_data1,
    lag(data2) over (partition by id order by pdate) as lag_data2
  from table_x
),
u as (
  select t.*,
    case when lag_data1 is null or lag_data1 != data1 then pdate end as pdate1,
    case when lag_data2 is null or lag_data2 != data2 then pdate end as pdate2
  from t
),
v as (
  select u.*,
    rank() over (partition by id order by pdate1 desc nulls last) as rn1,
    rank() over (partition by id order by pdate2 desc nulls last) as rn2
  from u
)
select v.id,
  max(trunc(sysdate) - (case when rn1 = 1 then pdate1 end))
    as data1_last_update,
  max(trunc(sysdate) - (case when rn2 = 1 then pdate2 end))
    as data2_last_update
from v
group by v.id
order by v.id;
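I'm assuming you meant your data to be for Jun-2014 rather than Jan-2013, and that you're comparing the most recent change date with the current date. With the data adjusted to use 10-Jun-2014 etc., this gives: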
ID DATA1_LAST_UPDATE DATA2_LAST_UPDATE
-- ----------------- -----------------
A 4 2
B 2 5
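The first CTE (t) gets the actual table data and adds two extra columns, one for each data column, using lag (which is the same as your lead ordered by descending date).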
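The second CTE (u) adds two date columns that are only set when the corresponding data column changed (or when it was first set, in case it has never changed). So if a row has the same data1 as the previous row, its pdate1 will be null. You could combine the first two CTEs by repeating the lag calculation, but I've split them to make it clearer.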
The third CTE (v) assigns a rank to those pdate columns, so that the most recent one is ranked first.
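The final query works out the difference between the current date and the highest-ranked (i.e. most recent) change for each data column.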
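The same steps can be sketched outside the database. The Python below only illustrates the logic and is not a replacement for the query: it assumes the rows for one ID fit in memory, and the names `last_change_dates` and `rows_b` are invented for the example. Like the query, it handles both data columns in a single pass over the rows.

```python
from datetime import date

def last_change_dates(rows):
    """rows: list of (pdate, data1, data2) tuples for a single ID.

    Returns (pdate1, pdate2), the latest date on which each column changed.
    The earliest row counts as a change, like lag() returning NULL there.
    """
    prev = None
    pdate1 = pdate2 = None
    for pdate, data1, data2 in sorted(rows):  # ascending pdate, as in the lag ordering
        if prev is None or prev[0] != data1:
            pdate1 = pdate  # later dates overwrite earlier ones, so the latest wins
        if prev is None or prev[1] != data2:
            pdate2 = pdate
        prev = (data1, data2)
    return pdate1, pdate2

# Rows for ID B, with the dates shifted to Jun-2014 as assumed in this answer
rows_b = [
    (date(2014, 6, 10), 3, 15),
    (date(2014, 6, 9), 3, 15),
    (date(2014, 6, 8), 9, 15),
    (date(2014, 6, 7), 9, 15),
    (date(2014, 6, 6), 14, 15),
    (date(2014, 6, 5), 14, 8),
]

today = date(2014, 6, 11)  # stand-in for trunc(sysdate); an assumption
pdate1, pdate2 = last_change_dates(rows_b)
print((today - pdate1).days, (today - pdate2).days)
```

Keeping only the latest change date per column plays the role of the rank in v followed by the max in the final query.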
SQL Fiddle, including all the CTEs run individually so you can see what they are doing.