I have 4 tables in an Oracle database that have complex relationships and no useful primary keys.
TableA
+------+------+------+------+------+-----------------+
| ColA | ColX | ColY | ColZ | ColZa| A |
+------+------+------+------+------+-----------------+
| k9 | a1 | c1 | g1 | z1 | 2018-02-19 |
| k9 | a1 | c1 | g3 | z2 | 2018-02-02 |
| k10 | a2 | f3 | g1 | z3 | 2018-02-09 |
| k10 | a | b | c | d | 2018-02-03 |
| k | a | b | c1 | z2 | 2018-02-01 |
| k9 | a1 | c1 | c9 | z5 | 2018-02-04 |
| k9 | a1 | c1 | c2 | z5 | 2018-02-03 |
| k9 | a1 | c1 | g2 | z5 | 2018-02-03 |
+------+------+------+------+------+-----------------+
TableB
+------+------+------+------+------+----------------+
| ColA | ColX | ColY | ColZ | ColZa| B |
+------+------+------+------+------+----------------+
| e | a3 | f | g1 | i | 2018-02-03 |
| e3 | a1 | f1 | g3 | d2 | 2018-02-04 |
| k9 | a1 | c1 | g2 | z5 | 2018-02-08 |
| e4 | a4 | f2 | g2 | i2 | 2018-02-07 |
| e5 | a1 | f1 | g1 | d2 | 2018-02-06 |
| k9 | a1 | c1 | g1 | d2 | 2018-02-22 |
+------+------+------+------+------+----------------+
TableC
+------+------+------+----------------+
| ColA | ColX | ColY | C |
+------+------+------+----------------+
| ab | c2 | c2 | cx |
| k9 | a1 | c1 | cy |
| cd | a2 | c3 | cy |
| ef | c2 | c4 | cz |
| ef | c2 | c2 | cz |
+------+------+------+----------------+
TableD
+------+------+------+----------------+
| ColA | ColX | ColY | D |
+------+------+------+----------------+
| e | a | f | dx |
| e1 | a | a | dy |
| e2 | a1 | a1 | dz |
+------+------+------+----------------+
Some business logic requires me to select and merge data from TableA and TableB.
Question:
Fetch the records in TableA and/or TableB for cases where the pseudo-key ColA_ColX_ColY has a row with ColZ = 'g1', merged on ColA | ColX | ColY | ColZ | ColZa, with the output columns ColA, ColX, ColY, ColZ, ColZa, A, B.
I say "pseudo" here because it is not a real key; it is only a means of identifying the records of interest in TableA and TableB.
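To construct a valid key, count(ColY) must be 1 for a given ColX in TableC and TableD (this actually holds in all four tables if you consider only distinct values, but I want to use only TableC and TableD because it is more explicit).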
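The rule above could be checked with a query along these lines. This is only a sketch of the TableC half; the same shape would apply to TableD, and the alias y_count is made up for illustration.

```sql
-- ColX values that occur with exactly one ColY row in TableC.
-- count(ColY) ignores NULLs, as in the query further down.
select   ColX,
         count(ColY) as y_count   -- hypothetical alias, for readability only
from     TableC
group by ColX
having   count(ColY) = 1;
```

On the sample TableC above this keeps a1 and a2 and drops c2, which appears on three rows.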
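Process: In the result table below, I should get row 1 of TableA, because 'a1' has exactly one ColY in TableC (count(ColY) = 1), but I ignore row 1 of TableB and row 3 of TableA because count(ColY) in TableC or TableD is not equal to 1. Now that I have a value 'a1' from TableC.ColX that meets my criteria, I select all records in TableA and TableB where ColX = 'a1' and ColY = 'c1' and ColA = 'k9'.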
My desired result
+------+------+------+------+------+-----------------+----------------+
| ColA | ColX | ColY | ColZ | ColZa| A | B |
+------+------+------+------+------+-----------------+----------------+
| k9 | a1 | c1 | g1 | z1 | 2018-02-19 | [null] |
| k9 | a1 | c1 | g3 | z2 | 2018-02-02 | [null] |
| k9 | a1 | c1 | c9 | z5 | 2018-02-04 | [null] |
| k9 | a1 | c1 | c2 | z5 | 2018-02-03 | [null] |
| k9 | a1 | c1 | g2 | z5 | 2018-02-03 | 2018-02-08 |
| k9 | a1 | c1 | g1 | d2 | [null] | 2018-02-22 |
+------+------+------+------+------+-----------------+----------------+
So I wrote a query similar to this:
select a.ColX, a.ColY, a.ColZ, a.ColZa, a.A, b.B
from TableA a
FULL OUTER JOIN TableB b
  ON a.ColX = b.ColX AND a.ColY = b.ColY AND a.ColZ = b.ColZ
where (
  a.ColX IN
    (select ColX from TableA where
       ColX IN
         (select ColX from TableC group by ColX HAVING count(ColY)=1) and
       ColX in
         (select distinct ColX from TableB where ColZ = 'g1' and B > trunc(sysdate) - 365)
     group by ColX having count(distinct ColY)=1)
  OR
  b.ColX IN
    (select ColX from TableA where
       ColX IN
         (select ColX from TableC group by ColX HAVING count(ColY)=1) and
       ColX in
         (select distinct ColX from TableB where ColZ = 'g1' and B > trunc(sysdate) - 365)
     group by ColX having count(distinct ColY)=1));
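I have no control over the data model. How can I make my query work? TableA and TableB hold 100,000 records, and TableC and TableD up to a million.
SQL is not my area of expertise, and I really hope I am not asking too much here.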
Answer 0 (score: 1)
I don't understand what your query is supposed to do, but as a pure refactoring exercise I get this:
with whatever as
     ( select colx
       from   tablea
       where  colx in
              ( -- colx values with exactly one coly row in tablec ...
                select colx
                from   tablec
                group by colx having count(coly) = 1
                intersect  -- ... that also have a recent 'g1' row in tableb (the AND in the original query)
                select colx
                from   tableb
                where  colz = 'g1'
                and    b > trunc(sysdate) - 365 )
       group by colx
       having count(distinct coly) = 1 )
select a.colx, a.coly, a.colz, a.colza, a.a, b.b
from   tablea a
       full outer join tableb b
            on  a.colx = b.colx
            and a.coly = b.coly
            and a.colz = b.colz
       join whatever w
            on  w.colx in (a.colx, b.colx);
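For comparison, here is a hedged sketch that follows the flow described in the question rather than the original query. It is not a tested solution and rests on several assumptions: the pseudo-key is taken from TableC rows whose ColX has exactly one ColY, TableA and TableB are merged on all five shared columns, and both TableD and the trunc(sysdate) - 365 date filter are left out. The names valid_key and merged are made up for illustration.

```sql
with valid_key as
     ( -- Assumption: a key is valid when its ColX has exactly one ColY row in TableC.
       -- With a single row per group, max() simply returns that row's value.
       select   max(ColA) as ColA, ColX, max(ColY) as ColY
       from     TableC
       group by ColX
       having   count(ColY) = 1 ),
     merged as
     ( -- Merge TableA and TableB on all five shared columns; coalesce keeps
       -- the key columns populated for rows that exist in only one table.
       select coalesce(a.ColA,  b.ColA)  as ColA,
              coalesce(a.ColX,  b.ColX)  as ColX,
              coalesce(a.ColY,  b.ColY)  as ColY,
              coalesce(a.ColZ,  b.ColZ)  as ColZ,
              coalesce(a.ColZa, b.ColZa) as ColZa,
              a.A,
              b.B
       from   TableA a
              full outer join TableB b
                   on  a.ColA  = b.ColA
                   and a.ColX  = b.ColX
                   and a.ColY  = b.ColY
                   and a.ColZ  = b.ColZ
                   and a.ColZa = b.ColZa )
select m.ColA, m.ColX, m.ColY, m.ColZ, m.ColZa, m.A, m.B
from   merged m
       join valid_key k
            on  k.ColA = m.ColA
            and k.ColX = m.ColX
            and k.ColY = m.ColY
where  exists
       ( -- Keep a key only if at least one of its rows has ColZ = 'g1'.
         select 1
         from   merged g
         where  g.ColA = m.ColA
         and    g.ColX = m.ColX
         and    g.ColY = m.ColY
         and    g.ColZ = 'g1' );
```

On the sample data in the question this should return the six rows for k9 | a1 | c1. Selecting the coalesced columns instead of a.ColX and friends is what keeps rows that exist only in TableB from coming back with empty key columns.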