I have a road_events table:
create table road_events (
event_id number(4,0),
road_id number(4,0),
year number(4,0),
from_meas number(10,2),
to_meas number(10,2),
total_road_length number(10,2)
);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (1,1,2020,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (2,1,2000,25,50,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (3,1,1980,0,25,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (4,1,1960,75,100,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (5,1,1940,1,100,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (6,2,2000,10,30,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (7,2,1975,30,60,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (8,2,1950,50,90,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (9,3,2050,40,90,100);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (10,4,2040,0,200,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (11,4,2013,0,199,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (12,4,2001,0,200,200);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (13,5,1985,50,70,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (14,5,1985,10,50,300);
insert into road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) values (15,5,1965,1,301,300);
commit;
select * from road_events;
  EVENT_ID    ROAD_ID       YEAR  FROM_MEAS    TO_MEAS TOTAL_ROAD_LENGTH
---------- ---------- ---------- ---------- ---------- -----------------
         1          1       2020         25         50               100
         2          1       2000         25         50               100
         3          1       1980          0         25               100
         4          1       1960         75        100               100
         5          1       1940          1        100               100
         6          2       2000         10         30               100
         7          2       1975         30         60               100
         8          2       1950         50         90               100
         9          3       2050         40         90               100
        10          4       2040          0        200               200
        11          4       2013          0        199               200
        12          4       2001          0        200               200
        13          5       1985         50         70               300
        14          5       1985         10         50               300
        15          5       1965          1        301               300
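I want to select the events that represent the most recent work on each road.
This is a tricky operation, because an event often covers only part of a road. That means I can't simply select the most recent event per road; I need to select only the most recent event mileage that does not overlap.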
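Possible logic (in order):
I'm reluctant to guess at how this could be solved, because it might end up doing more harm than good (similar to the XY Problem). On the other hand, it might provide insight into the nature of the problem, so here goes:
1. Select the most recent event on each road. Call it event A.
2. If event A is >= total_road_length, then that's all I need. The algorithm ends here.
3. Otherwise, get the next most recent event (event B) whose extent is not the same as event A's.
4. If event B's extent overlaps event A's extent, then get only the portion(s) of event B that do not overlap.
5. Repeat steps 3 and 4 until the total event length is = total_road_length. Or stop when there are no more events for that road.
Question:
I know this is a tall order, but what would it take to do this?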
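This is a classic linear referencing problem. It would be extremely helpful if I could do linear referencing operations as part of a query.
The result would be: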
  EVENT_ID    ROAD_ID       YEAR TOTAL_ROAD_LENGTH EVENT_LENGTH
---------- ---------- ---------- ----------------- ------------
         1          1       2020               100           25
         3          1       1980               100           25
         4          1       1960               100           25
         5          1       1940               100           25
         6          2       2000               100           20
         7          2       1975               100           30
         8          2       1950               100           30
         9          3       2050               100           50
        10          4       2040               200          200
        13          5       1985               300           20
        14          5       1985               300           40
        15          5       1965               300          240
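To make the intended result concrete, here is a minimal Python sketch of the step-by-step logic described in the question: walk each road's events from newest to oldest and keep only the part of each event that no newer event already covers. The helper names (`subtract`, `latest_event_lengths`) are hypothetical, and breaking ties on `year` by `event_id` is an assumption, since the question does not say how same-year events should be ordered.

```python
from collections import defaultdict

# (event_id, road_id, year, from_meas, to_meas) copied from the sample data
EVENTS = [
    (1, 1, 2020, 25, 50), (2, 1, 2000, 25, 50), (3, 1, 1980, 0, 25),
    (4, 1, 1960, 75, 100), (5, 1, 1940, 1, 100),
    (6, 2, 2000, 10, 30), (7, 2, 1975, 30, 60), (8, 2, 1950, 50, 90),
    (9, 3, 2050, 40, 90),
    (10, 4, 2040, 0, 200), (11, 4, 2013, 0, 199), (12, 4, 2001, 0, 200),
    (13, 5, 1985, 50, 70), (14, 5, 1985, 10, 50), (15, 5, 1965, 1, 301),
]


def subtract(segment, covered):
    """Return the pieces of `segment` not overlapped by any range in `covered`."""
    pieces = [segment]
    for c_lo, c_hi in covered:
        remaining = []
        for lo, hi in pieces:
            if c_hi <= lo or hi <= c_lo:      # no overlap, keep the piece whole
                remaining.append((lo, hi))
                continue
            if lo < c_lo:                     # part sticking out on the left
                remaining.append((lo, c_lo))
            if c_hi < hi:                     # part sticking out on the right
                remaining.append((c_hi, hi))
        pieces = remaining
    return pieces


def latest_event_lengths(events):
    """Map event_id -> length of road for which that event is the newest work."""
    by_road = defaultdict(list)
    for ev in events:
        by_road[ev[1]].append(ev)

    result = {}
    for road_events in by_road.values():
        covered = []
        # Newest first; ties on year are broken by event_id (an assumption).
        for event_id, _, year, lo, hi in sorted(
                road_events, key=lambda e: (e[2], e[0]), reverse=True):
            length = sum(b - a for a, b in subtract((lo, hi), covered))
            if length > 0:                    # fully covered events are dropped
                result[event_id] = length
            covered.append((lo, hi))
    return result


lengths = latest_event_lengths(EVENTS)
print(lengths)
```

On the sample data this yields the same EVENT_ID / EVENT_LENGTH pairs as the table above; events 2, 11 and 12 drop out because newer work covers them completely.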
Answer 0: (score: 5)
My main DBMS is Teradata, but this will work as-is in Oracle, too.
WITH all_meas AS
( -- get a distinct list of all from/to points
SELECT road_id, from_meas AS meas
FROM road_events
UNION
SELECT road_id, to_meas
FROM road_events
)
-- select * from all_meas order by 1,2
, all_ranges AS
( -- create from/to ranges
SELECT road_id, meas AS from_meas
,Lead(meas)
Over (PARTITION BY road_id
ORDER BY meas) AS to_meas
FROM all_meas
)
-- SELECT * from all_ranges order by 1,2
, all_event_ranges AS
( -- now match the ranges to the event ranges
SELECT
ar.*
,re.event_id
,re.year
,re.total_road_length
,ar.to_meas - ar.from_meas AS event_length
-- used to filter the latest event as multiple events might cover the same range
,Row_Number()
Over (PARTITION BY ar.road_id, ar.from_meas
ORDER BY year DESC) AS rn
FROM all_ranges ar
JOIN road_events re
ON ar.road_id = re.road_id
AND ar.from_meas < re.to_meas
AND ar.to_meas > re.from_meas
WHERE ar.to_meas IS NOT NULL
)
SELECT event_id, road_id, year, total_road_length, Sum(event_length)
FROM all_event_ranges
WHERE rn = 1 -- latest year only
GROUP BY event_id, road_id, year, total_road_length
ORDER BY road_id, year DESC;
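For readers who want to trace the query by hand, here is a rough Python equivalent of the three CTEs, assuming a subset of the sample rows (roads 1 and 5). It is only a cross-check of the idea, not a replacement for the SQL; the function name `event_lengths` is made up for this sketch.

```python
from collections import defaultdict

# (event_id, road_id, year, from_meas, to_meas): roads 1 and 5 of the sample data
EVENTS = [
    (1, 1, 2020, 25, 50), (2, 1, 2000, 25, 50), (3, 1, 1980, 0, 25),
    (4, 1, 1960, 75, 100), (5, 1, 1940, 1, 100),
    (13, 5, 1985, 50, 70), (14, 5, 1985, 10, 50), (15, 5, 1965, 1, 301),
]


def event_lengths(events):
    by_road = defaultdict(list)
    for ev in events:
        by_road[ev[1]].append(ev)

    totals = defaultdict(int)
    for road_events in by_road.values():
        # all_meas: distinct from/to points of the road
        points = sorted({m for ev in road_events for m in ev[3:5]})
        # all_ranges: consecutive points form elementary ranges
        for lo, hi in zip(points, points[1:]):
            # all_event_ranges: events overlapping this elementary range
            covering = [ev for ev in road_events if lo < ev[4] and hi > ev[3]]
            if not covering:
                continue                      # a gap that no event touches
            # rn = 1: keep only the event with the latest year
            winner = max(covering, key=lambda ev: ev[2])
            totals[winner[0]] += hi - lo      # Sum(event_length) per event
    return dict(totals)


print(event_lengths(EVENTS))
```

One thing the sketch makes visible: if two events of the same year cover the same elementary range, the winner is arbitrary, both here and with `ORDER BY year DESC` in the query. The sample data has no such case, but adding a tie-breaker such as `event_id` to the ordering would make the result deterministic.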
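If you need to return the from_meas/to_meas actually covered (as in your question before the edit), it gets a bit more complicated. The first part is the same, but without the aggregation the query can return adjacent rows with the same event_id (e.g. for event 3: 0-1 and 1-25):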
SELECT * FROM all_event_ranges
WHERE rn = 1
ORDER BY road_id, from_meas;
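If you want to merge adjacent rows you need two more steps (using a standard approach: flag the first row of each group, then calculate a group number):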
WITH all_meas AS
(
SELECT road_id, from_meas AS meas
FROM road_events
UNION
SELECT road_id, to_meas
FROM road_events
)
-- select * from all_meas order by 1,2
, all_ranges AS
(
SELECT road_id, meas AS from_meas
,Lead(meas)
Over (PARTITION BY road_id
ORDER BY meas) AS to_meas
FROM all_meas
)
-- SELECT * from all_ranges order by 1,2
, all_event_ranges AS
(
SELECT
ar.*
,re.event_id
,re.year
,re.total_road_length
,ar.to_meas - ar.from_meas AS event_length
,Row_Number()
Over (PARTITION BY ar.road_id, ar.from_meas
ORDER BY year DESC) AS rn
FROM all_ranges ar
JOIN road_events re
ON ar.road_id = re.road_id
AND ar.from_meas < re.to_meas
AND ar.to_meas > re.from_meas
WHERE ar.to_meas IS NOT NULL
)
-- SELECT * FROM all_event_ranges WHERE rn = 1 ORDER BY road_id, from_meas
, adjacent_events AS
( -- assign 1 to the 1st row of an event
SELECT t.*
,CASE WHEN Lag(event_id)
Over(PARTITION BY road_id
ORDER BY from_meas) = event_id
THEN 0
ELSE 1
END AS flag
FROM all_event_ranges t
WHERE rn = 1
)
-- SELECT * FROM adjacent_events ORDER BY road_id, from_meas
, grouped_events AS
( -- assign a groupnumber to adjacent rows using a Cumulative Sum over 0/1
SELECT t.*
,Sum(flag)
Over (PARTITION BY road_id
ORDER BY from_meas
ROWS Unbounded Preceding) AS grp
FROM adjacent_events t
)
-- SELECT * FROM grouped_events ORDER BY road_id, from_meas
SELECT event_id, road_id, year, Min(from_meas), Max(to_meas), total_road_length, Sum(event_length)
FROM grouped_events
GROUP BY event_id, road_id, grp, year, total_road_length
ORDER BY 2, Min(from_meas);
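The flag / running-sum step is the usual "gaps and islands" trick. A small Python sketch of just that step, assuming the winning rows (rn = 1) for road 1 are already known and sorted by from_meas; `merge_adjacent` is a hypothetical helper name.

```python
# Winning rows (rn = 1) for road 1, ordered by from_meas:
# (event_id, from_meas, to_meas)
ROWS = [(3, 0, 1), (3, 1, 25), (1, 25, 50), (5, 50, 75), (4, 75, 100)]


def merge_adjacent(rows):
    """Merge neighbouring rows of the same event into (event_id, grp, from, to)."""
    merged = []
    grp = 0
    prev_event = None
    for event_id, lo, hi in rows:
        flag = 0 if event_id == prev_event else 1   # adjacent_events.flag
        grp += flag                                  # grouped_events.grp (running sum)
        if flag:
            merged.append([event_id, grp, lo, hi])   # first row of a new group
        else:
            merged[-1][3] = max(merged[-1][3], hi)   # Max(to_meas) within the group
        prev_event = event_id
    return [tuple(m) for m in merged]


print(merge_adjacent(ROWS))
```

The group number matters because one event can win two stretches that are not next to each other; grouping by event_id alone would merge them into one row with a misleading from/to, while grouping by event_id and grp keeps them apart.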
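Edit:
Oops, I just found a blog post, Overlapping ranges with priority, that does exactly the same thing using some simplified Oracle syntax. In fact I translated my query from another simplified syntax in Teradata to Standard/Oracle SQL :-)
Answer 1: (score: 5)
There is another way to calculate this, using the from and to values: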
with
part_begin_point as (
Select distinct road_id, from_meas point
from road_events be
union
Select distinct road_id, to_meas point
from road_events ee
)
, newest_part as (
select e.event_id
, e.road_id
, e.year
, e.total_road_length
, p.point
, LAG(e.event_id) over (partition by p.road_id order by p.point) prev_event
, e.to_meas event_to_meas
from part_begin_point p
join road_events e
on p.road_id = e.road_id
and p.point >= e.from_meas and p.point < e.to_meas
and not exists(
select 1 from road_events ne
where e.road_id = ne.road_id
and p.point >= ne.from_meas and p.point < ne.to_meas
and (e.year < ne.year or e.year = ne.year and e.event_id < ne.event_id))
)
select event_id, road_id, year
, point from_meas
, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) to_meas
, total_road_length
, LEAD(point, 1, event_to_meas) over (partition by road_id order by point) - point EVENT_LENGTH
from newest_part
where event_id <> prev_event or prev_event is null
order by event_id, point;
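Answer 2: (score: 4)
I thought about this too much today, but I have something that ignores the +/- 10 meters for now.
First I made a function that takes from/to pairs as a string and returns the distance covered by the pairs in that string. For example, '10:20;35:45' returns 20.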
CREATE
OR replace FUNCTION get_distance_range_str (strRangeStr VARCHAR2)
RETURN NUMBER IS intRetNum NUMBER;
BEGIN
--split input string
WITH cte_1
AS (
SELECT regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) AS TO_FROM_STRING
FROM dual connect BY regexp_substr(strRangeStr, '[^;]+', 1, LEVEL) IS NOT NULL
)
--split From/To pairs
,cte_2
AS (
SELECT cte_1.TO_FROM_STRING
,to_number(substr(cte_1.TO_FROM_STRING, 1, instr(cte_1.TO_FROM_STRING, ':') - 1)) AS FROM_MEAS
,to_number(substr(cte_1.TO_FROM_STRING, instr(cte_1.TO_FROM_STRING, ':') + 1, length(cte_1.TO_FROM_STRING) - instr(cte_1.TO_FROM_STRING, ':'))) AS TO_MEAS
FROM cte_1
)
--merge ranges
,cte_merge_ranges
AS (
SELECT s1.FROM_MEAS
,
--t1.TO_MEAS
MIN(t1.TO_MEAS) AS TO_MEAS
FROM cte_2 s1
INNER JOIN cte_2 t1 ON s1.FROM_MEAS <= t1.TO_MEAS
AND NOT EXISTS (
SELECT *
FROM cte_2 t2
WHERE t1.TO_MEAS >= t2.FROM_MEAS
AND t1.TO_MEAS < t2.TO_MEAS
)
WHERE NOT EXISTS (
SELECT *
FROM cte_2 s2
WHERE s1.FROM_MEAS > s2.FROM_MEAS
AND s1.FROM_MEAS <= s2.TO_MEAS
)
GROUP BY s1.FROM_MEAS
)
SELECT sum(TO_MEAS - FROM_MEAS) AS DISTANCE_COVERED
INTO intRetNum
FROM cte_merge_ranges;
RETURN intRetNum;
END;
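As a quick way to check what the function is supposed to return, here is a minimal Python sketch of the same calculation, using a sort-and-sweep merge instead of the self-join. The name `get_distance` is made up for this sketch, and it returns 0 for an empty string where the PL/SQL function returns NULL.

```python
def get_distance(range_str):
    """Distance covered by 'from:to;from:to;...' pairs, counting overlaps once."""
    pairs = []
    for chunk in range_str.split(";"):
        if chunk:                                  # skip the empty piece after a trailing ';'
            lo, hi = chunk.split(":")
            pairs.append((float(lo), float(hi)))

    total = 0.0
    cur_lo = cur_hi = None
    for lo, hi in sorted(pairs):
        if cur_hi is None or lo > cur_hi:          # starts a new, disjoint range
            if cur_hi is not None:
                total += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:                                      # overlaps or touches the current range
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        total += cur_hi - cur_lo
    return total


print(get_distance("10:20;35:45"))                 # 20.0

# Distance that event 5 (1:100) adds on top of events 1 to 4 on road 1
prior = "25:50;25:50;0:25;75:100;"
added = get_distance(prior + "1:100;") - get_distance(prior)
print(added)                                       # 25.0
```

The second call shows the idea the rest of this answer relies on: the length credited to an event is the covered distance with that event included minus the covered distance of the newer events alone.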
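Then I wrote this query, which builds the string of the appropriate prior ranges for the function. You can't use windowing with list_agg, but I was able to achieve the same thing with a correlated subquery.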
--use list agg to create list of to/from pairs for rows before current row in the ordering
WITH cte_2
AS (
SELECT T1.*
,(
SELECT LISTAGG(FROM_MEAS || ':' || TO_MEAS || ';') WITHIN
GROUP (
ORDER BY YEAR DESC, EVENT_ID DESC
)
FROM road_events T2
WHERE T1.YEAR || lpad(T1.EVENT_ID, 10,'0') <
T2.YEAR || lpad(T2.EVENT_ID, 10,'0')
AND T1.ROAD_ID = T2.ROAD_ID
GROUP BY road_id
) AS PRIOR_RANGES_STR
FROM road_events T1
)
--get distance for prior range string - distance ignoring current row
--get distance including current row
,cte_3
AS (
SELECT cte_2.*
,coalesce(get_distance_range_str(PRIOR_RANGES_STR), 0) AS DIST_PRIOR
,get_distance_range_str(PRIOR_RANGES_STR || FROM_MEAS || ':' || TO_MEAS || ';') AS DIST_NOW
FROM cte_2 cte_2
)
--distance including current row less distance ignoring current row is distance added to the range this row
,cte_4
AS (
SELECT cte_3.*
,DIST_NOW - DIST_PRIOR AS DIST_ADDED_THIS_ROW
FROM cte_3
)
SELECT *
FROM cte_4
--filter out any rows with distance added as 0
WHERE DIST_ADDED_THIS_ROW > 0
ORDER BY ROAD_ID, YEAR DESC, EVENT_ID DESC
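sqlfiddle here: http://sqlfiddle.com/#!4/81331/36
I think the results match yours. I left the other columns in the final query to try to illustrate each step.
It works on the test case. It might need some work to handle all possibilities in a larger data set, but I think it is a good place to start and refine.
Credit for the overlapping range merge goes to the first answer here: Merge overlapping date intervals
Credit for list_agg with windowing goes to the first answer here: LISTAGG equivalent with windowing clause
Answer 3: (score: 4)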
I had a doubt about your "road events", because you do not describe what the first meas is, so I assumed it is the interval between 0 and 1, not including 1.
So you can calculate it with a single query:
Answer 4: (score: 4)
Solution:
SELECT RE.road_id, RE.event_id, RE.year, RE.from_meas, RE.to_meas, RE.road_length, RE.event_length, RE.used_length, RE.leftover_length
FROM
(
SELECT RE.C_road_id[road_id], RE.C_event_id[event_id], RE.C_year[year], RE.C_from_meas[from_meas], RE.C_to_meas[to_meas], RE.C_road_length[road_length],
RE.event_length, RE.used_length, (RE.event_length - (CASE WHEN RE.HasOverlap = 1 THEN RE.used_length ELSE 0 END))[leftover_length]
FROM
(
SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
(CASE WHEN MAX(RE.A_event_id) IS NOT NULL THEN 1 ELSE 0 END)[HasOverlap],
(RE.C_to_meas - RE.C_from_meas)[event_length],
SUM( (CASE WHEN RE.O_to_meas <= RE.C_to_meas THEN RE.O_to_meas ELSE RE.C_to_meas END)
- (CASE WHEN RE.O_from_meas >= RE.C_from_meas THEN RE.O_from_meas ELSE RE.C_from_meas END)
)[used_length]--This is the length that is already being counted towards later years.
FROM
(
SELECT RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
RE.A_event_id, MIN(RE.O_from_meas)[O_from_meas], MAX(RE.O_to_meas)[O_to_meas]
FROM
(
SELECT RE_C.road_id[C_road_id], RE_C.event_id[C_event_id], RE_C.year[C_year], RE_C.from_meas[C_from_meas], RE_C.to_meas[C_to_meas], RE_C.total_road_length[C_road_length],
RE_A.road_id[A_road_id], RE_A.event_id[A_event_id], RE_A.year[A_year], RE_A.from_meas[A_from_meas], RE_A.to_meas[A_to_meas], RE_A.total_road_length[A_road_length],
RE_O.road_id[O_road_id], RE_O.event_id[O_event_id], RE_O.year[O_year], RE_O.from_meas[O_from_meas], RE_O.to_meas[O_to_meas], RE_O.total_road_length[O_road_length],
(ROW_NUMBER() OVER (PARTITION BY RE_C.road_id, RE_C.event_id, RE_O.event_id ORDER BY RE_S.Overlap DESC, RE_A.event_id))[RowNum]--Use to Group Overlaps into Swaths.
FROM road_events as RE_C--Current.
LEFT JOIN road_events as RE_A--After. --Use a Left-Join to capture when there is only 1 Event (or it is the Last-Event in the list).
ON RE_A.road_id = RE_C.road_id
AND RE_A.event_id != RE_C.event_id--Not the same EventID.
AND RE_A.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_A.from_meas >= RE_C.from_meas AND RE_A.from_meas <= RE_C.to_meas)--There is Overlap.
OR (RE_A.to_meas >= RE_C.from_meas AND RE_A.to_meas <= RE_C.to_meas)--There is Overlap.
OR (RE_A.to_meas = RE_C.to_meas AND RE_A.from_meas = RE_C.from_meas)--They are Equal.
)
LEFT JOIN road_events as RE_O--Overlapped/Linked.
ON RE_O.road_id = RE_C.road_id
AND RE_O.event_id != RE_C.event_id--Not the same EventID.
AND RE_O.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas >= RE_A.from_meas AND RE_O.to_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas = RE_A.to_meas AND RE_O.from_meas = RE_A.from_meas)--They are Equal.
)
OUTER APPLY
(
SELECT COUNT(*)[Overlap]
FROM road_events as RE_O--Overlapped/Linked.
WHERE RE_O.road_id = RE_C.road_id
AND RE_O.event_id != RE_C.event_id--Not the same EventID.
AND RE_O.year >= RE_C.year--Occured on or After the Current Event.
AND ( (RE_O.from_meas >= RE_A.from_meas AND RE_O.from_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas >= RE_A.from_meas AND RE_O.to_meas <= RE_A.to_meas)--There is Overlap.
OR (RE_O.to_meas = RE_A.to_meas AND RE_O.from_meas = RE_A.from_meas)--They are Equal.
)
) AS RE_S--Swath of Overlaps.
) AS RE
WHERE RE.RowNum = 1--Remove Duplicates and Select those that are in the biggest Swaths.
GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length,
RE.A_event_id
) AS RE
GROUP BY RE.C_road_id, RE.C_event_id, RE.C_year, RE.C_from_meas, RE.C_to_meas, RE.C_road_length
) AS RE
) AS RE
WHERE RE.leftover_length > 0--Filter out Events that had their entire Segments overlapped by a Later Event(s).
ORDER BY RE.road_id, RE.year DESC, RE.event_id
SQL Fiddle:
http://sqlfiddle.com/#!18/2880b/1
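Added rules/assumptions/clarifications:
1.) Consider that event_id and road_id could be GUIDs or created out of order, so do not write the script assuming that higher or lower values give meaning to the relationship between records.
For example:
An ID of 1 and an ID of 2 do not guarantee that ID 2 is the most recent one (or vice versa).
This makes the solution more general and less "hacky".
2.) Filter out events whose entire segment is overlapped by later events.
For example:
If 2008 worked on 20-50 and 2009 worked on 10-60,
then the 2008 event is filtered out, because its entire segment was covered again in 2009.
Additional test data:
To make sure the solution isn't just tailored to the given DataSet,
I've added a road_id of 6 to the original dataset, to cover more edge cases.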
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (16,6,2012,0,100,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (17,6,2013,68,69,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (18,6,2014,65,66,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (19,6,2015,62,63,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (20,6,2016,50,60,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (21,6,2017,30,40,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (22,6,2017,20,55,100);
INSERT INTO road_events (event_id, road_id, year, from_meas, to_meas, total_road_length) VALUES (23,6,2018,0,25,100);
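Database version:
This solution is meant to be Oracle and SQL Server agnostic:
it should work in both SS2008+ and Oracle 12c+.
This question is tagged Oracle 12c, but there is no online fiddle I can use without signing up,
so I tested it in SQL Server. The same syntax should work in both.
I rely on Cross-Apply and Outer-Apply for most of my queries.
Oracle introduced these "joins" in 12c:
https://oracle-base.com/articles/12c/lateral-inline-views-cross-apply-and-outer-apply-joins-12cr1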
Concise and efficient:
It uses:
• No correlated subqueries.
• No recursion.
• No CTEs.
• No unions.
• No user functions.
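Indexes:
I read one of your comments asking about indexes.
I would add a 1-column index on each of the main fields you search and group on: road_id, event_id, and year.
You could also see whether this index helps you (I don't know how you plan to use the data):
Key fields: road_id, event_id, year
Include: from_meas, to_meas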
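Title:
You may want to consider renaming the title of this question to something more searchable, such as:
"Aggregate Overlapping Segments to Measure Effective Length".
That would make the solution easier to find for others with a similar problem.
Other thoughts:
Something like this is useful for tallying the total time spent on something
when the start and stop timestamps overlap.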
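As a small illustration of that last point, here is a Python sketch that totals the time covered by overlapping start/stop pairs. The session data and the `total_time` name are made up for the example; it is the same merge-then-sum idea applied to timestamps instead of road measures.

```python
from datetime import datetime, timedelta


def total_time(spans):
    """Total wall-clock time covered by (start, stop) pairs, overlaps counted once."""
    total = timedelta(0)
    cur_start = cur_stop = None
    for start, stop in sorted(spans):
        if cur_stop is None or start > cur_stop:   # a new, disjoint span begins
            if cur_stop is not None:
                total += cur_stop - cur_start
            cur_start, cur_stop = start, stop
        else:                                      # overlaps the span being built
            cur_stop = max(cur_stop, stop)
    if cur_stop is not None:
        total += cur_stop - cur_start
    return total


# Hypothetical work sessions on one task
sessions = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 11, 0)),
    (datetime(2024, 1, 1, 13, 0), datetime(2024, 1, 1, 13, 45)),
]
print(total_time(sessions))                        # 2:45:00
```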
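Answer 5: (score: 3)
This expands the table to produce one row for every mile of every road, and simply takes the MAX year for each mile. We can then just COUNT the rows to get the event_length.
It produces a table exactly like the one you specified above.
Note: I ran this query against SQL Server. I think you could use LEAST in Oracle in place of SELECT MIN(event_length) FROM (VALUES...).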
WITH NumberRange(result) AS
(
SELECT 0
UNION ALL
SELECT result + 1
FROM NumberRange
WHERE result < 301 --Max length of any road
),
CurrentRoadEventLength(road_id, [year], event_length) AS
(
SELECT road_id, [year], COUNT(*) AS event_length
FROM (
SELECT re.road_id, n.result, MAX(re.[year]) as [year]
FROM road_events re INNER JOIN NumberRange n
ON ( re.from_meas <= n.result
AND re.to_meas > n.result
)
GROUP BY re.road_id, n.result
) events_per_mile
GROUP BY road_id, [year]
)
SELECT re.event_id, re.road_id, re.[year], re.total_road_length,
(SELECT MIN(event_length) FROM (VALUES (re.to_meas - re.from_meas), (cre.event_length)) AS EventLengths(event_length))
FROM road_events re INNER JOIN CurrentRoadEventLength cre
ON ( re.road_id = cre.road_id
AND re.[year] = cre.[year]
)
ORDER BY re.event_id, re.road_id
OPTION (MAXRECURSION 301) --Max length of any road
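The same expand-and-count idea in a few lines of Python, assuming whole-number measures and a subset of the sample rows (roads 1 and 5). `event_lengths_by_unit` is a hypothetical name; the sketch mirrors the query above rather than improving on it.

```python
from collections import Counter

# (event_id, road_id, year, from_meas, to_meas): roads 1 and 5 of the sample data
EVENTS = [
    (1, 1, 2020, 25, 50), (2, 1, 2000, 25, 50), (3, 1, 1980, 0, 25),
    (4, 1, 1960, 75, 100), (5, 1, 1940, 1, 100),
    (13, 5, 1985, 50, 70), (14, 5, 1985, 10, 50), (15, 5, 1965, 1, 301),
]


def event_lengths_by_unit(events):
    # Newest year seen at each whole-unit position of each road
    newest_year = {}
    for _, road_id, year, lo, hi in events:
        for pos in range(lo, hi):                 # from_meas <= pos < to_meas
            key = (road_id, pos)
            newest_year[key] = max(year, newest_year.get(key, year))

    # COUNT(*) per road and year
    counts = Counter((road_id, year) for (road_id, _), year in newest_year.items())

    # Join back to the events; an event cannot claim more than its own length
    return {
        event_id: min(hi - lo, counts[(road_id, year)])
        for event_id, road_id, year, lo, hi in events
        if (road_id, year) in counts
    }


print(event_lengths_by_unit(EVENTS))
```

Two caveats that apply to the query as well: the columns are declared number(10,2), so fractional measures would need a finer step than one unit, and the MIN/LEAST step is only a heuristic for splitting a year's count between several events of the same year (it works for events 13 and 14 here because they do not overlap each other).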