Question

This is closely related to the question "SQL Statement for Reconciliation", but with a few more twists.

Given the following schema:

create table TBL1 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp);
create table TBL2 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp);
create table TBL_RESULT (ID varchar2(100) primary key not null, TBL1_ID varchar2(100), TBL2_ID varchar2(100));

create unique index UK_TBL_RESULT_TBL1_ID on TBL_RESULT(TBL1_ID);
create unique index UK_TBL_RESULT_TBL2_ID on TBL_RESULT(TBL2_ID);

insert into TBL1 VALUES('1', to_date('01/26/2012 20:00:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into TBL1 VALUES('2', to_date('01/26/2012 20:05:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into TBL2 VALUES('3', to_date('01/26/2012 19:59:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into TBL2 VALUES('4', to_date('01/26/2012 20:04:00', 'mm/dd/yyyy hh24:mi:ss'));

Our current query:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID) 
SELECT rawtohex(sys_guid()),t1.id,t2.id 
FROM
(SELECT t1.match_criteria,t1.id, row_number() OVER (PARTITION BY t1.match_criteria ORDER BY t1.id) rn 
FROM tbl1 t1) t1,  
(SELECT t2.match_criteria,t2.id, row_number() OVER (PARTITION BY t2.match_criteria ORDER BY t2.id) rn 
FROM tbl2 t2) t2
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
AND t1.rn=t2.rn

Output:

| ID  |  TBL1_ID | TBL2_ID |
| '1' |  '1'     |    '3'  |
| '2' |  '1'     |    '4'  |
| '3' |  '2'     |    '3'  |
| '4' |  '2'     |    '4'  |

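As you can see, the result violates the unique constraints (duplicate TBL1_ID / duplicate TBL2_ID). This is because:

RN is always 1 for every record (and is therefore always equal).
All of the records are within 10 minutes of one another.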
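The two causes above can be reproduced with a small sketch. It uses Python's standard sqlite3 module as a stand-in for Oracle (an assumption made only so the example is self-contained; it needs SQLite 3.25 or later for window functions), and stores the timestamps as Julian day numbers so that 10 minutes is still 10/1440 of a day.

```python
import sqlite3

# In-memory SQLite stands in for Oracle (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl1 (id TEXT PRIMARY KEY, match_criteria REAL);
    CREATE TABLE tbl2 (id TEXT PRIMARY KEY, match_criteria REAL);
    INSERT INTO tbl1 VALUES ('1', julianday('2012-01-26 20:00:00'));
    INSERT INTO tbl1 VALUES ('2', julianday('2012-01-26 20:05:00'));
    INSERT INTO tbl2 VALUES ('3', julianday('2012-01-26 19:59:00'));
    INSERT INTO tbl2 VALUES ('4', julianday('2012-01-26 20:04:00'));
""")

# Cause 1: every MATCH_CRITERIA value is distinct, so each partition holds
# exactly one row and row_number() is 1 everywhere.
rn_rows = conn.execute("""
    SELECT id, row_number() OVER (PARTITION BY match_criteria ORDER BY id) AS rn
    FROM tbl1
    ORDER BY id
""").fetchall()

# Cause 2: every TBL1 row is within 10 minutes of every TBL2 row, so with
# rn = 1 on both sides the join degenerates into a cross join.
# Note: 10.0/1440 rather than 10/1440, because SQLite would otherwise do
# integer division.
pairs = conn.execute("""
    SELECT t1.id, t2.id
    FROM (SELECT match_criteria, id,
                 row_number() OVER (PARTITION BY match_criteria ORDER BY id) AS rn
          FROM tbl1) t1,
         (SELECT match_criteria, id,
                 row_number() OVER (PARTITION BY match_criteria ORDER BY id) AS rn
          FROM tbl2) t2
    WHERE t1.match_criteria BETWEEN t2.match_criteria - 10.0/1440
                                AND t2.match_criteria + 10.0/1440
      AND t1.rn = t2.rn
    ORDER BY t1.id, t2.id
""").fetchall()

print(rn_rows)  # [('1', 1), ('2', 1)]
print(pairs)    # [('1', '3'), ('1', '4'), ('2', '3'), ('2', '4')]
```

The four pairs are the same four rows shown in the output table above.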

The output we expect looks like the table below:

| ID  |  TBL1_ID | TBL2_ID |
| '1' |  '1'     |    '4'  |
| '2' |  '2'     |    '3'  |

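Note 1: It does not matter whether '1' is matched with '3' or with '4', as long as the result complies with the constraints (for example, if '1' is matched with '3', then '2' must be matched with '4') and T1.MATCH_CRITERIA is within 10 minutes of T2.MATCH_CRITERIA.

Note 2: We have one million records to insert from TBL1 and another million from TBL2. Sequential inserts using PL/SQL are therefore not acceptable unless they run very fast (under 15 minutes).

Note 3: Unmatched data should be eliminated. The data is also expected to be unbalanced.

Note 4: We are not limited to a single query. A finite series of queries is fine.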
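To make the acceptance criteria concrete, here is a minimal sketch of a checker for a candidate result. It is written in Python purely for illustration; the function name `check_pairs` and the dictionary layout are assumptions, not part of the schema above. It only encodes what the notes state: each TBL1_ID and TBL2_ID may appear once, and each pair must be within 10 minutes.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)

def check_pairs(pairs, tbl1, tbl2, window=WINDOW):
    """Return a list of problems found in (tbl1_id, tbl2_id) pairs.

    tbl1 and tbl2 map each ID to its MATCH_CRITERIA timestamp.
    """
    problems = []
    seen1, seen2 = set(), set()
    for id1, id2 in pairs:
        if id1 in seen1:
            problems.append(f"duplicate TBL1_ID {id1}")
        if id2 in seen2:
            problems.append(f"duplicate TBL2_ID {id2}")
        seen1.add(id1)
        seen2.add(id2)
        if abs(tbl1[id1] - tbl2[id2]) > window:
            problems.append(f"{id1}/{id2} are more than 10 minutes apart")
    return problems

# Sample data from the question.
tbl1 = {"1": datetime(2012, 1, 26, 20, 0), "2": datetime(2012, 1, 26, 20, 5)}
tbl2 = {"3": datetime(2012, 1, 26, 19, 59), "4": datetime(2012, 1, 26, 20, 4)}

current = [("1", "3"), ("1", "4"), ("2", "3"), ("2", "4")]
expected = [("1", "4"), ("2", "3")]

print(check_pairs(current, tbl1, tbl2))   # four duplicate-ID problems
print(check_pairs(expected, tbl1, tbl2))  # []
```

Both `[("1", "4"), ("2", "3")]` and `[("1", "3"), ("2", "4")]` pass this check, which is the point of Note 1.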

Answer 1

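Your query produces a cross join because your business rules do not provide a mechanism for linking one record in T1 to one record in T2. Given that this is clearly a toy example, it is hard for us to suggest anything other than something very simple-minded: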

(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria,t1.id) rn 
.... 
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria,t2.id) rn

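This simply matches the first row in the T1 result set with the first row in the T2 result set, the second row in the T1 result set with the second row in the T2 result set, and so on.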

SQL> INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID) 
SELECT seq_tbl_result.nextval,t1.id,t2.id 
FROM
(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria, t1.id) rn 
FROM tbl1 t1) t1,  
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria, t2.id) rn 
FROM tbl2 t2) t2
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
AND t1.rn=t2.rn
/

2 rows created.


SQL> select * from tbl_result
  2  /

ID     TBL1_I TBL2_I
------ ------ ------
9      1      3
10     2      4

SQL>

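This may not be what you want. In that case you need to explain your data and your rules, so we can decide what links to what. For example, is there some pattern in the timings that would allow you to derive an anchor point?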
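As a rough illustration of why this may not be what you want, here is a sketch of what happens to the global row-number pairing once the data is unbalanced. It uses Python's sqlite3 module in place of Oracle (an assumption, so the example runs anywhere with SQLite 3.25 or later), and the extra TBL1 row with ID '0' at 19:00 is hypothetical.

```python
import sqlite3

# Same pairing rule as above: global row_number() on each side, then the
# 10-minute window. 10.0/1440 avoids SQLite's integer division.
PAIR_BY_GLOBAL_RN = """
    SELECT t1.id, t2.id
    FROM (SELECT match_criteria, id,
                 row_number() OVER (ORDER BY match_criteria, id) AS rn
          FROM tbl1) t1,
         (SELECT match_criteria, id,
                 row_number() OVER (ORDER BY match_criteria, id) AS rn
          FROM tbl2) t2
    WHERE t1.match_criteria BETWEEN t2.match_criteria - 10.0/1440
                                AND t2.match_criteria + 10.0/1440
      AND t1.rn = t2.rn
    ORDER BY t1.id
"""

conn = sqlite3.connect(":memory:")  # stand-in for Oracle (assumption)
conn.executescript("""
    CREATE TABLE tbl1 (id TEXT PRIMARY KEY, match_criteria REAL);
    CREATE TABLE tbl2 (id TEXT PRIMARY KEY, match_criteria REAL);
    INSERT INTO tbl1 VALUES ('1', julianday('2012-01-26 20:00:00'));
    INSERT INTO tbl1 VALUES ('2', julianday('2012-01-26 20:05:00'));
    INSERT INTO tbl2 VALUES ('3', julianday('2012-01-26 19:59:00'));
    INSERT INTO tbl2 VALUES ('4', julianday('2012-01-26 20:04:00'));
""")

# Balanced sample data: first pairs with first, second with second.
balanced = conn.execute(PAIR_BY_GLOBAL_RN).fetchall()

# Hypothetical unmatched TBL1 row an hour earlier. It takes rn = 1 and
# shifts every later TBL1 row down by one position.
conn.execute("INSERT INTO tbl1 VALUES ('0', julianday('2012-01-26 19:00:00'))")
unbalanced = conn.execute(PAIR_BY_GLOBAL_RN).fetchall()

print(balanced)    # [('1', '3'), ('2', '4')]
print(unbalanced)  # [('1', '4')]
```

With the extra row, '0' is compared with '3' (59 minutes apart, rejected), '1' pairs with '4', and '2' has no partner at all, even though '2' and '3' are only 6 minutes apart. One stray row is enough to lose a valid match, so the rule for choosing an anchor matters.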

Incidentally, when I rule the world, people who use VARCHAR2(100) columns to hold numeric IDs will be shot.

Answer 2

I think this works:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
select seq_tbl_result.nextval,
       tt1.id, tt2.id
  from (select id, v, row_number() over(partition by v order by id) rn
          from (select distinct t1.id,
                       case
                         when (t1.match_criteria between
                              t2.match_criteria - (10 / 1440) and
                              t2.match_criteria + (10 / 1440)) then
                          1
                         else
                          2
                       end v
                  from tbl1 t1, tbl2 t2
                 where t1.match_criteria between
                       t2.match_criteria - (10 / 1440) and
                       t2.match_criteria + (10 / 1440))) tt1,
       (select id, v, row_number() over(partition by v order by id) rn
          from (select distinct t2.id,
                       case
                         when (t1.match_criteria between
                              t2.match_criteria - (10 / 1440) and
                              t2.match_criteria + (10 / 1440)) then
                          1
                         else
                          2
                       end v
                  from tbl1 t1, tbl2 t2
                 where t1.match_criteria between
                       t2.match_criteria - (10 / 1440) and
                       t2.match_criteria + (10 / 1440))) tt2
 where tt1.v = tt2.v
   and tt1.rn = tt2.rn

SQL statement for reconciliation with different operators
