SQL Statement for Reconciliation with a Different Operator

Asked: 2012-01-26 12:43:47

Tags: sql oracle insert-update

This is closely related to the question "SQL Statement for Reconciliation", but with an additional twist.

Given the following schema:

create table TBL1 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp);
create table TBL2 (ID varchar2(100) primary key not null, MATCH_CRITERIA timestamp);
create table TBL_RESULT (ID varchar2(100) primary key not null, TBL1_ID varchar2(100), TBL2_ID varchar2(100));

create unique index UK_TBL_RESULT_TBL1_ID on TBL_RESULT(TBL1_ID);
create unique index UK_TBL_RESULT_TBL2_ID on TBL_RESULT(TBL2_ID);

insert into TBL1 VALUES('1', to_date('01/26/2012 20:00:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into TBL1 VALUES('2', to_date('01/26/2012 20:05:00', 'mm/dd/yyyy hh24:mi:ss'));

insert into TBL2 VALUES('3', to_date('01/26/2012 19:59:00', 'mm/dd/yyyy hh24:mi:ss'));
insert into TBL2 VALUES('4', to_date('01/26/2012 20:04:00', 'mm/dd/yyyy hh24:mi:ss'));

Our current query:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID) 
SELECT rawtohex(sys_guid()),t1.id,t2.id 
FROM
(SELECT t1.match_criteria,t1.id, row_number() OVER (PARTITION BY t1.match_criteria ORDER BY t1.id) rn 
FROM tbl1 t1) t1,  
(SELECT t2.match_criteria,t2.id, row_number() OVER (PARTITION BY t2.match_criteria ORDER BY t2.id) rn 
FROM tbl2 t2) t2
WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
AND t1.rn=t2.rn

Output:

| ID  |  TBL1_ID | TBL2_ID |
| '1' |  '1'     |    '3'  |
| '2' |  '1'     |    '4'  |
| '3' |  '2'     |    '3'  |
| '4' |  '2'     |    '4'  |

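As you can see, the result violates the unique constraints (duplicate TBL1_ID / duplicate TBL2_ID). This happens because:

  1. RN is always 1 for every record (and therefore always equal), since each MATCH_CRITERIA value forms its own partition.
  2. Every TBL1 timestamp is within 10 minutes of every TBL2 timestamp.

The output we expect looks like this:

    | ID  |  TBL1_ID | TBL2_ID |
    | '1' |  '1'     |    '4'  |
    | '2' |  '2'     |    '3'  |

Note 1: It does not matter whether '1' is matched with '3', but then '2' must be matched with '4' to satisfy the constraints, as long as T1.MATCH_CRITERIA is within 10 minutes of T2.MATCH_CRITERIA.

Note 2: We are inserting one million records from TBL1 and another million from TBL2. Sequential inserts with PL/SQL are therefore not acceptable unless they run very fast (under 15 minutes).

Note 3: Unmatched data should be dropped. The data is also expected to be unbalanced.

Note 4: We are not restricted to a single query. A finite series of queries is fine.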
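The two causes listed in the question can be reproduced outside the database. The following is a minimal Python sketch, not the Oracle query itself: it mimics `row_number() OVER (PARTITION BY match_criteria ORDER BY id)` and the inclusive 10-minute `BETWEEN` test on the four sample rows. The helper name `row_number_partitioned` is made up for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby

WINDOW = timedelta(minutes=10)

# The sample rows from the schema above: (id, match_criteria).
tbl1 = [("1", datetime(2012, 1, 26, 20, 0)), ("2", datetime(2012, 1, 26, 20, 5))]
tbl2 = [("3", datetime(2012, 1, 26, 19, 59)), ("4", datetime(2012, 1, 26, 20, 4))]


def row_number_partitioned(rows):
    """Mimic row_number() OVER (PARTITION BY match_criteria ORDER BY id)."""
    ranked = []
    ordered = sorted(rows, key=lambda row: (row[1], row[0]))
    for _, group in groupby(ordered, key=lambda row: row[1]):
        for rn, (row_id, ts) in enumerate(group, start=1):
            ranked.append((row_id, ts, rn))
    return ranked


t1 = row_number_partitioned(tbl1)
t2 = row_number_partitioned(tbl2)

# Same join condition as the query: inclusive 10-minute window and equal rn.
pairs = [
    (id1, id2)
    for id1, ts1, rn1 in t1
    for id2, ts2, rn2 in t2
    if abs(ts1 - ts2) <= WINDOW and rn1 == rn2
]

print([rn for _, _, rn in t1 + t2])  # every timestamp is its own partition
print(pairs)                         # all four combinations survive the join
```

Because every timestamp is unique, each partition holds one row, so the `rn` comparison filters nothing and the join degenerates into the full cross product.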

2 Answers:

Answer 0 (score: 1)

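Your query produces a cross join because your business rules provide no mechanism for linking one record in T1 to one record in T2. Given that this is obviously a toy example, it is hard for us to suggest anything beyond the very simple: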

(SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria,t1.id) rn 
.... 
(SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria,t2.id) rn 

This simply matches the first row in the T1 result set with the first row in the T2 result set, the second row in T1 with the second row in T2, and so on.

SQL> INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
  2  SELECT seq_tbl_result.nextval,t1.id,t2.id
  3  FROM
  4  (SELECT t1.match_criteria,t1.id, row_number() OVER (ORDER BY t1.match_criteria, t1.id) rn
  5  FROM tbl1 t1) t1,
  6  (SELECT t2.match_criteria,t2.id, row_number() OVER (ORDER BY t2.match_criteria, t2.id) rn
  7  FROM tbl2 t2) t2
  8  WHERE t1.match_criteria between t2.match_criteria - (10/1440) AND t2.match_criteria + (10/1440)
  9  AND t1.rn=t2.rn
 10  /

2 rows created.


SQL> select * from tbl_result
  2  /

ID     TBL1_I TBL2_I
------ ------ ------
9      1      3
10     2      4

SQL> 

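This may not be what you want. In that case you need to explain your data and the rules that determine what gets linked to what. For example, is there some pattern in the times that would allow you to derive an anchor point?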
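To make the limitation concrete, here is a small Python sketch of the rank-based pairing (a simulation, not the Oracle statement). It pairs rows by their position when each table is ordered by (match_criteria, id) and then applies the 10-minute window. The function name `pair_by_global_rank` and the extra TBL2 row with id '0' are assumptions added for illustration; they are not part of the question's data.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)


def pair_by_global_rank(tbl1, tbl2):
    """Mimic joining on row_number() OVER (ORDER BY match_criteria, id)."""
    def by_time(row):
        return (row[1], row[0])

    pairs = []
    # zip() lines up rows with equal rank, like "t1.rn = t2.rn".
    for (id1, ts1), (id2, ts2) in zip(sorted(tbl1, key=by_time),
                                      sorted(tbl2, key=by_time)):
        if abs(ts1 - ts2) <= WINDOW:  # inclusive, like BETWEEN
            pairs.append((id1, id2))
    return pairs


tbl1 = [("1", datetime(2012, 1, 26, 20, 0)), ("2", datetime(2012, 1, 26, 20, 5))]
tbl2 = [("3", datetime(2012, 1, 26, 19, 59)), ("4", datetime(2012, 1, 26, 20, 4))]

sample_pairs = pair_by_global_rank(tbl1, tbl2)

# Hypothetical unbalanced data: one early TBL2 row that matches nothing.
extra = [("0", datetime(2012, 1, 26, 19, 0))]
unbalanced_pairs = pair_by_global_rank(tbl1, tbl2 + extra)

print(sample_pairs)
print(unbalanced_pairs)
```

On the sample rows this reproduces the two pairs shown in the session above. With the extra unmatched row, every later TBL2 rank shifts by one, so only one pair survives even though two valid pairs still exist. This is why the approach depends on the data being balanced.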


Incidentally, when I rule the world, people who use VARCHAR2(100) columns to hold numeric IDs will be shot.

Answer 1 (score: 1)

I think this would work:

INSERT INTO TBL_RESULT (ID, TBL1_ID, TBL2_ID)
select seq_tbl_result.nextval,
       tt1.id, tt2.id
from (select id, v, row_number() over(partition by v order by id) rn
      from (select distinct t1.id,
                   case
                     when (t1.match_criteria between
                           t2.match_criteria - (10 / 1440) and
                           t2.match_criteria + (10 / 1440)) then
                       1
                     else
                       2
                   end v
            from tbl1 t1, tbl2 t2
            where t1.match_criteria between
                  t2.match_criteria - (10 / 1440) and
                  t2.match_criteria + (10 / 1440))) tt1,
     (select id, v, row_number() over(partition by v order by id) rn
      from (select distinct t2.id,
                   case
                     when (t1.match_criteria between
                           t2.match_criteria - (10 / 1440) and
                           t2.match_criteria + (10 / 1440)) then
                       1
                     else
                       2
                   end v
            from tbl1 t1, tbl2 t2
            where t1.match_criteria between
                  t2.match_criteria - (10 / 1440) and
                  t2.match_criteria + (10 / 1440))) tt2
where tt1.v = tt2.v
and tt1.rn = tt2.rn
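One way to read the statement above: the WHERE clause in each inner query applies the same predicate as the CASE, so `v` is always 1. Each side therefore reduces to the distinct ids that have at least one partner within the window, numbered by id, and the outer join pairs them by that number without re-checking the window for the pair itself. The Python sketch below simulates that reading. The function names and the second data set (ids 'a', 'b', 'x', 'y') are hypothetical and only serve to probe the behaviour.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)


def within(ts_a, ts_b):
    """Inclusive 10-minute test, like the BETWEEN predicate."""
    return abs(ts_a - ts_b) <= WINDOW


def pair_matchable_by_id(tbl1, tbl2):
    """Pair the k-th matchable TBL1 id with the k-th matchable TBL2 id."""
    # Ids are primary keys, so each list is already distinct.
    ids1 = sorted(i for i, ts1 in tbl1 if any(within(ts1, ts2) for _, ts2 in tbl2))
    ids2 = sorted(i for i, ts2 in tbl2 if any(within(ts1, ts2) for _, ts1 in tbl1))
    return list(zip(ids1, ids2))  # equal rn on both sides


tbl1 = [("1", datetime(2012, 1, 26, 20, 0)), ("2", datetime(2012, 1, 26, 20, 5))]
tbl2 = [("3", datetime(2012, 1, 26, 19, 59)), ("4", datetime(2012, 1, 26, 20, 4))]
sample_pairs = pair_matchable_by_id(tbl1, tbl2)

# Hypothetical data where id order and time order disagree.
other1 = {"a": datetime(2012, 1, 26, 20, 0), "b": datetime(2012, 1, 26, 21, 0)}
other2 = {"x": datetime(2012, 1, 26, 21, 0), "y": datetime(2012, 1, 26, 20, 0)}
other_pairs = pair_matchable_by_id(list(other1.items()), list(other2.items()))

print(sample_pairs)
print(other_pairs)
print([within(other1[i], other2[j]) for i, j in other_pairs])
```

On the question's sample rows the pairing satisfies both unique constraints. On the hypothetical second data set, 'a' is paired with 'x' and 'b' with 'y', although each pair is an hour apart, so it seems worth validating the result against the 10-minute rule before relying on this query.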