This is closely related to the question "SQL Statement for Reconciliation", but with an extra twist.
Given the following schema:
create table TBL1 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp);
create table TBL2 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp);
create table TBL_RESULT (ID varchar2(100) primary key not null, TBL1_ID varchar2(100), TBL2_ID varchar2(100));
create unique index UK_TBL_RESULT_TBL1_ID on TBL_RESULT(TBL1_ID);
create unique index UK_TBL_RESULT_TBL2_ID on TBL_RESULT(TBL2_ID);
insert into TBL1 VALUES('1', to_date('01/26/2012 20:00:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into TBL1 VALUES('2', to_date('01/26/2012 20:05:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into TBL2 VALUES('3', to_date('01/26/2012 19:59:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into TBL2 VALUES('4', to_date('01/26/2012 20:04:00', 'mm/dd/yyyy hh24:mi:ss'));
Our current query:
INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
SELECT rawtohex(sys_guid()),t1.id,t2.id
FROM
(SELECT t1.match_criteria,t1.id, row_number() OVER (PARTITION BY t1.match_criteria ORDER BY t1.id) rn
FROM tbl1 t1) t1,
(SELECT t2.match_criteria,t2.id, row_number() OVER (PARTITION BY t2.match_criteria ORDER BY t2.id) rn
FROM tbl2 t2) t2
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
AND t1.rn=t2.rn
Output:
| ID | TBL1_ID | TBL2_ID |
| '1' | '1' | '3' |
| '2' | '1' | '4' |
| '3' | '2' | '3' |
| '4' | '2' | '4' |
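As you can see, the result violates the unique constraints (duplicate TBL1_ID / duplicate TBL2_ID). This is because every TBL1 row lies within 10 minutes of every TBL2 row, and because all timestamps are distinct, PARTITION BY match_criteria assigns rn = 1 to every row, so the condition t1.rn = t2.rn filters nothing.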
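To see why all four combinations survive the join, here is a minimal Python sketch that emulates the query on the sample rows. The helper names (`row_numbers`, `candidate_pairs`) are hypothetical and only mimic the SQL; this is an illustration, not a replacement for the query.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, match_criteria)
tbl1 = [("1", datetime(2012, 1, 26, 20, 0)), ("2", datetime(2012, 1, 26, 20, 5))]
tbl2 = [("3", datetime(2012, 1, 26, 19, 59)), ("4", datetime(2012, 1, 26, 20, 4))]

WINDOW = timedelta(minutes=10)  # same tolerance as 10/1440 of a day


def row_numbers(rows):
    """Mimic row_number() OVER (PARTITION BY match_criteria ORDER BY id)."""
    counters = {}
    numbered = []
    for row_id, ts in sorted(rows, key=lambda r: (r[1], r[0])):
        counters[ts] = counters.get(ts, 0) + 1  # numbering restarts per timestamp
        numbered.append((row_id, ts, counters[ts]))
    return numbered


def candidate_pairs(left, right):
    """Emulate the join: timestamps within the window and equal row numbers."""
    return [
        (l_id, r_id)
        for l_id, l_ts, l_rn in row_numbers(left)
        for r_id, r_ts, r_rn in row_numbers(right)
        if abs(l_ts - r_ts) <= WINDOW and l_rn == r_rn
    ]


pairs = candidate_pairs(tbl1, tbl2)
print(pairs)
```

Every timestamp forms its own partition, so each row number is 1 and the row-number condition never excludes a pair.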
The output we expect looks like the following table:
| ID | TBL1_ID | TBL2_ID |
| '1' | '1' | '4' |
| '2' | '2' | '3' |
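Note 1: It does not matter if '1' is matched with '3' instead, but then '2' must be matched with '4' to satisfy the constraints. Either way, each pair's T1.MATCH_CRITERIA must be within 10 minutes of its T2.MATCH_CRITERIA.
Note 2: We have one million records to insert from TBL1 and another million from TBL2. Sequential inserts using PL/SQL are therefore not acceptable unless they can run very fast (in under 15 minutes).
Note 3: Unmatched data should be discarded. The data is also expected to be unbalanced.
Note 4: We are not restricted to a single query. A finite series of queries is fine.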
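The acceptance criteria in these notes can be written down as a small check. This is a minimal Python sketch under the assumption that a result is just a list of (TBL1_ID, TBL2_ID) pairs; the function name `is_valid_result` is hypothetical.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, match_criteria)
tbl1 = [("1", datetime(2012, 1, 26, 20, 0)), ("2", datetime(2012, 1, 26, 20, 5))]
tbl2 = [("3", datetime(2012, 1, 26, 19, 59)), ("4", datetime(2012, 1, 26, 20, 4))]


def is_valid_result(pairs, left, right, window=timedelta(minutes=10)):
    """Return True if pairs respect both unique indexes and the time window."""
    left_ts = dict(left)
    right_ts = dict(right)
    left_ids = [a for a, _ in pairs]
    right_ids = [b for _, b in pairs]
    # Each TBL1_ID and each TBL2_ID may appear at most once.
    if len(set(left_ids)) != len(left_ids) or len(set(right_ids)) != len(right_ids):
        return False
    # Every pair must be within the allowed time difference (inclusive).
    return all(abs(left_ts[a] - right_ts[b]) <= window for a, b in pairs)


print(is_valid_result([("1", "4"), ("2", "3")], tbl1, tbl2))
print(is_valid_result([("1", "3"), ("2", "4")], tbl1, tbl2))
print(is_valid_result([("1", "3"), ("1", "4"), ("2", "3"), ("2", "4")], tbl1, tbl2))
```

Both one-to-one pairings pass the check, while the four-row output of the current query fails it because each ID appears twice.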
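Answer 0 (score: 1)
Your query currently produces a cross join, because your business rules provide no mechanism for linking one record in T1 to exactly one record in T2. Given that this is obviously a toy example, it is hard for us to suggest anything other than something very simple: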
(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria,t1.id) rn
....
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria,t2.id) rn
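This simply matches the first row in the T1 result set with the first row in the T2 result set, the second row in T1 with the second row in T2, and so on. Pairs that fall outside the 10-minute window are still removed by the WHERE clause.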
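The positional pairing can be sketched in Python as follows. This is an illustration under stated assumptions rather than the answer's actual code: `pair_by_position` is a hypothetical name, and the extra TBL2 row with ID '0' is invented here only to show how unbalanced data behaves.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (id, match_criteria)
tbl1 = [("1", datetime(2012, 1, 26, 20, 0)), ("2", datetime(2012, 1, 26, 20, 5))]
tbl2 = [("3", datetime(2012, 1, 26, 19, 59)), ("4", datetime(2012, 1, 26, 20, 4))]


def pair_by_position(left, right, window=timedelta(minutes=10)):
    """Pair the n-th row of each table, ordered by timestamp and then id."""
    def by_time(row):
        return (row[1], row[0])

    left_sorted = sorted(left, key=by_time)
    right_sorted = sorted(right, key=by_time)
    # zip stops at the shorter list, like the inner join on rn.
    return [
        (l_id, r_id)
        for (l_id, l_ts), (r_id, r_ts) in zip(left_sorted, right_sorted)
        if abs(l_ts - r_ts) <= window
    ]


balanced = pair_by_position(tbl1, tbl2)
print(balanced)

# Hypothetical extra TBL2 row, one hour early, to simulate unbalanced data.
tbl2_unbalanced = [("0", datetime(2012, 1, 26, 19, 0))] + tbl2
unbalanced = pair_by_position(tbl1, tbl2_unbalanced)
print(unbalanced)
```

On the balanced sample this yields two unique pairs. With the extra early row, every later position shifts by one, so only a single pair survives even though two valid pairs exist.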
SQL> INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
SELECT seq_tbl_result.nextval,t1.id,t2.id
FROM
(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria, t1.id) rn
FROM tbl1 t1) t1,
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria, t2.id) rn
FROM tbl2 t2) t2
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
AND t1.rn=t2.rn
/
2 rows created.
SQL> select * from tbl_result
2 /
ID TBL1_I TBL2_I
------ ------ ------
9 1 3
10 2 4
SQL>
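This may not be what you want. In that case you need to explain your data and rules in enough detail to decide what should be linked to what. For example, is there some pattern in the times that would allow you to derive an anchor?
Incidentally, when I rule the world, people who use VARCHAR2(100) columns to hold numeric IDs will be shot.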
Answer 1 (score: 1)
I think this would work:
INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
select seq_tbl_result.nextval,
tt1.id, tt2.id
from (select id, v, row_number() over(partition by v order by id) rn
from (select distinct t1.id,
case
when (t1.match_criteria between
t2.match_criteria - (10 / 1440) and
t2.match_criteria + (10 / 1440)) then
1
else
2
end v
from tbl1 t1, tbl2 t2
where t1.match_criteria between
t2.match_criteria - (10 / 1440) and
t2.match_criteria + (10 / 1440))) tt1,
(select id, v, row_number() over(partition by v order by id) rn
from (select distinct t2.id,
case
when (t1.match_criteria between
t2.match_criteria - (10 / 1440) and
t2.match_criteria + (10 / 1440)) then
1
else
2
end v
from tbl1 t1, tbl2 t2
where t1.match_criteria between
t2.match_criteria - (10 / 1440) and
t2.match_criteria + (10 / 1440))) tt2
where tt1.v = tt2.v
and tt1.rn = tt2.rn