I have two tables, A and B, and I need to LEFT JOIN them conditionally. What is an efficient way to do this in BigQuery or SQL?
select * from table_A A
left join table_B B
where
[some condition OR some condition]
on
case1
A.column1 =B.column1
and A.column2= B.column2
and A.column3= B.column3
and A.column4= B.column4
and A.column5= B.column5
OR case2
A.column1 =B.column1
and A.column3= B.column3
and A.column4= B.column4
and A.column5= B.column5
OR case3
A.column1 =B.column1
and A.column2= B.column2
and A.column4= B.column4
OR case4
A.column1 =B.column1
and A.column3= B.column3
and A.column5= B.column5
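My main goal here is this: if case1 matches a row, the other cases should not be tried for that row. Likewise, if case1 does not match, it should try case2, then case3, and so on, so that it ends up with the best match. That would help get a 100% join between tables A and B. In case1 we compare all 5 columns of both tables, but if some of those columns are null, it should fall through to the other cases and still work.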
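A quick way to pin down the fall-through semantics described above is a small pure-Python reference model. This is only an illustrative sketch: rows are modelled as plain dicts, and the `row` helper, the `name` labels and the sample values are hypothetical. `None` stands in for SQL NULL and is deliberately treated as never equal to anything, because in SQL `NULL = NULL` is not true, whereas in Python `None == None` is.

```python
from typing import Dict, List, Optional, Sequence

# A row is modelled as a dict; None plays the role of SQL NULL.
Row = Dict[str, Optional[str]]

# The question's four cases, in priority order (case 1 is tried first).
CASES: List[Sequence[str]] = [
    ("column1", "column2", "column3", "column4", "column5"),  # case 1
    ("column1", "column3", "column4", "column5"),             # case 2
    ("column1", "column2", "column4"),                        # case 3
    ("column1", "column3", "column5"),                        # case 4
]


def matches(a: Row, b: Row, cols: Sequence[str]) -> bool:
    # SQL-style equality: a NULL on either side never matches
    # (plain Python would treat None == None as True).
    return all(a[c] is not None and b[c] is not None and a[c] == b[c] for c in cols)


def best_match(a: Row, table_b: List[Row]) -> Optional[Row]:
    # Try the cases in order and stop at the first case that finds a B row.
    for cols in CASES:
        for b in table_b:
            if matches(a, b, cols):
                return b
    return None  # like a LEFT JOIN: the A row is kept, with no B row


def row(name: str, *values: Optional[str]) -> Row:
    # Hypothetical helper: a label plus column1..column5 in order.
    cols = ["column1", "column2", "column3", "column4", "column5"]
    return {"name": name, **dict(zip(cols, values))}


table_b = [
    row("full", "x", "a", "b", "c", "d"),
    row("y2", "y", "q", "b", "c", "d"),
    row("n1", "y", None, "b", "c", "d"),
]
a = row("A2", "y", None, "b", "c", "d")
print(best_match(a, table_b)["name"])  # y2
```

With a naive Python comparison, `n1` (whose `column2` is also `None`) would win on case 1; with SQL-style NULL handling, `A2` falls through to case 2 and picks `y2`, which is the behaviour the question describes.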
Answer 0 (score: 1)
If I understand correctly, the general approach in SQL is multiple left joins:
select a.*, coalesce(b1.col, b2.col, b3.col, b4.col) as col
from table_A A left join
     -- case 1: all five columns
     table_B B1
     on A.column1 = B1.column1 and
        A.column2 = B1.column2 and
        A.column3 = B1.column3 and
        A.column4 = B1.column4 and
        A.column5 = B1.column5 left join
     -- case 2: only if case 1 found nothing
     table_B B2
     on B1.column1 is null and
        A.column1 = B2.column1 and
        A.column3 = B2.column3 and
        A.column4 = B2.column4 and
        A.column5 = B2.column5 left join
     -- case 3: only if cases 1 and 2 found nothing
     table_B B3
     on B1.column1 is null and
        B2.column1 is null and
        A.column1 = B3.column1 and
        A.column2 = B3.column2 and
        A.column4 = B3.column4 left join
     -- case 4: only if cases 1 to 3 found nothing
     table_B B4
     on B1.column1 is null and
        B2.column1 is null and
        B3.column1 is null and
        A.column1 = B4.column1 and
        A.column3 = B4.column3 and
        A.column5 = B4.column5
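A minimal runnable sketch of this cascading-join idea, using Python's built-in `sqlite3` module rather than BigQuery, so the dialect differs slightly. The `id` and `payload` columns and the sample rows are hypothetical. Each later join is guarded by *all* earlier aliases being NULL: guarding B3 only with `B2.column1 is null` would let it join even when B1 already matched, because B2 is always NULL in that situation.

```python
import sqlite3

# Hypothetical sample data; only column1..column5 come from the question.
SETUP = """
CREATE TABLE table_A (id INTEGER, column1, column2, column3, column4, column5);
CREATE TABLE table_B (payload TEXT, column1, column2, column3, column4, column5);
INSERT INTO table_A VALUES
  (1, 'x', 'a', 'b', 'c', 'd'),
  (2, 'y', NULL, 'b', 'c', 'd'),
  (3, 'z', 'a', NULL, 'c', NULL),
  (4, 'w', 'a', 'b', NULL, 'd'),
  (5, 'v', 'a', 'b', 'c', 'd');
INSERT INTO table_B VALUES
  ('full', 'x', 'a', 'b', 'c', 'd'),
  ('c2',   'x', 'z', 'b', 'c', 'd'),
  ('y2',   'y', 'q', 'b', 'c', 'd'),
  ('z3',   'z', 'a', 'k', 'c', 'm'),
  ('w4',   'w', 'r', 'b', 's', 'd');
"""

# Each later join runs only if every earlier alias found nothing.
CASCADE = """
SELECT A.id, COALESCE(B1.payload, B2.payload, B3.payload, B4.payload) AS payload
FROM table_A A
LEFT JOIN table_B B1                                   -- case 1
  ON A.column1 = B1.column1 AND A.column2 = B1.column2
 AND A.column3 = B1.column3 AND A.column4 = B1.column4
 AND A.column5 = B1.column5
LEFT JOIN table_B B2                                   -- case 2
  ON B1.column1 IS NULL
 AND A.column1 = B2.column1 AND A.column3 = B2.column3
 AND A.column4 = B2.column4 AND A.column5 = B2.column5
LEFT JOIN table_B B3                                   -- case 3
  ON B1.column1 IS NULL AND B2.column1 IS NULL
 AND A.column1 = B3.column1 AND A.column2 = B3.column2
 AND A.column4 = B3.column4
LEFT JOIN table_B B4                                   -- case 4
  ON B1.column1 IS NULL AND B2.column1 IS NULL AND B3.column1 IS NULL
 AND A.column1 = B4.column1 AND A.column3 = B4.column3
 AND A.column5 = B4.column5
ORDER BY A.id
"""


def best_match_cascading(conn: sqlite3.Connection):
    return conn.execute(CASCADE).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(best_match_cascading(conn))
```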
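Answer 1 (score: 1)
You want to get the "best" matching B row: if there is a row matching case 1, you want to stick with it; if not, you want to try case 2, and so on.
What you can do is combine the conditions so that you first join all possible matches, then look at those matches and discard all but the best one. The ranking can be done with RANK.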
select *
from
(
select
*,
rank() over (partition by A.id
order by
case when A.column2 = B.column2
and A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5 then 1
when A.column3 = B.column3
and A.column4 = B.column4
and A.column5 = B.column5 then 2
when A.column2 = B.column2
and A.column4 = B.column4 then 3
else 4
end) as rnk
from table_A A
left join table_B B
on A.column1 = B.column1
and
(
(A.column2 = B.column2 and A.column4 = B.column4)
or
(A.column3 = B.column3 and A.column5 = B.column5)
)
where [some condition OR some condition]
) ranked
where rnk = 1;
(My query assumes an ID column in table_A. If your table has no unique ID, use whatever column(s) uniquely identify a row in the table.)
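The same ranking idea can be tried locally with Python's built-in `sqlite3` module (window functions such as `RANK()` need SQLite 3.25 or newer). This is a sketch rather than the query above verbatim: the `id`/`payload` columns and the sample rows are hypothetical, and the placeholder WHERE clause is omitted.

```python
import sqlite3

# Hypothetical sample data; only column1..column5 come from the question.
SETUP = """
CREATE TABLE table_A (id INTEGER, column1, column2, column3, column4, column5);
CREATE TABLE table_B (payload TEXT, column1, column2, column3, column4, column5);
INSERT INTO table_A VALUES
  (1, 'x', 'a', 'b', 'c', 'd'),
  (2, 'y', NULL, 'b', 'c', 'd'),
  (3, 'z', 'a', NULL, 'c', NULL),
  (4, 'w', 'a', 'b', NULL, 'd'),
  (5, 'v', 'a', 'b', 'c', 'd');
INSERT INTO table_B VALUES
  ('full', 'x', 'a', 'b', 'c', 'd'),
  ('c2',   'x', 'z', 'b', 'c', 'd'),
  ('y2',   'y', 'q', 'b', 'c', 'd'),
  ('z3',   'z', 'a', 'k', 'c', 'm'),
  ('w4',   'w', 'r', 'b', 's', 'd');
"""

# Join every candidate that satisfies at least one case, rank the candidates
# per A row by the best case they satisfy, and keep rank 1.
RANKED = """
SELECT id, payload
FROM (
  SELECT A.id, B.payload,
         RANK() OVER (
           PARTITION BY A.id
           ORDER BY CASE
             WHEN A.column2 = B.column2 AND A.column3 = B.column3
              AND A.column4 = B.column4 AND A.column5 = B.column5 THEN 1
             WHEN A.column3 = B.column3 AND A.column4 = B.column4
              AND A.column5 = B.column5 THEN 2
             WHEN A.column2 = B.column2 AND A.column4 = B.column4 THEN 3
             ELSE 4
           END) AS rnk
  FROM table_A A
  LEFT JOIN table_B B
    ON A.column1 = B.column1
   AND ((A.column2 = B.column2 AND A.column4 = B.column4)
     OR (A.column3 = B.column3 AND A.column5 = B.column5))
) ranked
WHERE rnk = 1
ORDER BY id
"""


def best_match_ranked(conn: sqlite3.Connection):
    return conn.execute(RANKED).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(best_match_ranked(conn))
```

Because `RANK()` gives tied rows the same rank, an A row with two B rows matching the same best case would still come back twice; `ROW_NUMBER()` would keep exactly one of them instead.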
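Answer 2 (score: 0)
One solution could be to use temporary storage (temporary tables, cursors, etc.) and feed the data with a parameterized loop. The problem is that plain SQL has no loops, so you would have to use BigQuery's scripting language, documented here: https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting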
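Answer 3 (score: 0)
I see the following two options, both for BigQuery Standard SQL (thanks to @Thorsten-Kettner for helping to understand the OP's logic/requirements).
Option 1 - do a separate join for each case, UNION ALL the results, and finally pick the winner for each record in A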
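#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
  SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
  FROM (
    -- Case 1. Every branch selects the same columns so UNION ALL lines up;
    -- a branch that finds no B row gets priority 5, so it never beats a real match
    SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
      IF(B.column1 IS NULL, 5, 1) priority, FORMAT('%t', A) identity
    FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column3 = B.column3
      AND A.column4 = B.column4 AND A.column5 = B.column5
    WHERE [SOME condition OR SOME condition]
    UNION ALL
    -- Case 2
    SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
      IF(B.column1 IS NULL, 5, 2) priority, FORMAT('%t', A) identity
    FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column3 = B.column3
      AND A.column4 = B.column4 AND A.column5 = B.column5
    WHERE [SOME condition OR SOME condition]
    UNION ALL
    -- Case 3
    SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
      IF(B.column1 IS NULL, 5, 3) priority, FORMAT('%t', A) identity
    FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column4 = B.column4
    WHERE [SOME condition OR SOME condition]
    UNION ALL
    -- Case 4
    SELECT A.*, B.* EXCEPT(column1,column2,column3,column4,column5),
      IF(B.column1 IS NULL, 5, 4) priority, FORMAT('%t', A) identity
    FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 AND A.column3 = B.column3 AND A.column5 = B.column5
    WHERE [SOME condition OR SOME condition]
  ) t
  GROUP BY identity
)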
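Option 1's shape (one join per case, UNION ALL, keep the lowest priority per A row) can be reproduced in SQLite for a quick local check. SQLite has no `ARRAY_AGG ... LIMIT 1`, so this sketch swaps in `ROW_NUMBER()` (SQLite 3.25+) to pick the winner, and it uses inner joins per case plus a fallback branch with priority 5 so that A rows without any match still appear exactly once, as in a left join. The `id`/`payload` columns and the data are hypothetical.

```python
import sqlite3

# Hypothetical sample data; only column1..column5 come from the question.
SETUP = """
CREATE TABLE table_A (id INTEGER, column1, column2, column3, column4, column5);
CREATE TABLE table_B (payload TEXT, column1, column2, column3, column4, column5);
INSERT INTO table_A VALUES
  (1, 'x', 'a', 'b', 'c', 'd'),
  (2, 'y', NULL, 'b', 'c', 'd'),
  (3, 'z', 'a', NULL, 'c', NULL),
  (4, 'w', 'a', 'b', NULL, 'd'),
  (5, 'v', 'a', 'b', 'c', 'd');
INSERT INTO table_B VALUES
  ('full', 'x', 'a', 'b', 'c', 'd'),
  ('c2',   'x', 'z', 'b', 'c', 'd'),
  ('y2',   'y', 'q', 'b', 'c', 'd'),
  ('z3',   'z', 'a', 'k', 'c', 'm'),
  ('w4',   'w', 'r', 'b', 's', 'd');
"""

UNION_QUERY = """
WITH candidates AS (
  SELECT A.id, B.payload, 1 AS priority        -- case 1
  FROM table_A A JOIN table_B B
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column3 = B.column3
   AND A.column4 = B.column4 AND A.column5 = B.column5
  UNION ALL
  SELECT A.id, B.payload, 2                    -- case 2
  FROM table_A A JOIN table_B B
    ON A.column1 = B.column1 AND A.column3 = B.column3
   AND A.column4 = B.column4 AND A.column5 = B.column5
  UNION ALL
  SELECT A.id, B.payload, 3                    -- case 3
  FROM table_A A JOIN table_B B
    ON A.column1 = B.column1 AND A.column2 = B.column2 AND A.column4 = B.column4
  UNION ALL
  SELECT A.id, B.payload, 4                    -- case 4
  FROM table_A A JOIN table_B B
    ON A.column1 = B.column1 AND A.column3 = B.column3 AND A.column5 = B.column5
  UNION ALL
  SELECT A.id, NULL, 5                         -- fallback: no match at all
  FROM table_A A
),
ranked AS (
  SELECT id, payload,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY priority) AS rn
  FROM candidates
)
SELECT id, payload FROM ranked WHERE rn = 1 ORDER BY id
"""


def best_match_union(conn: sqlite3.Connection):
    return conn.execute(UNION_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(best_match_union(conn))
```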
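Option 2 - select all potential candidates in a single query, compute on the fly which case each candidate falls into, and finally pick the winner for each row in A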
#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
  SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
  FROM (
    SELECT A.*,
      B.* EXCEPT(column1,column2,column3,column4,column5),
      FORMAT('%t', A) identity,
      CASE
        WHEN (A.column1,A.column2,A.column3,A.column4,A.column5) = (B.column1,B.column2,B.column3,B.column4,B.column5) THEN 1
        WHEN (A.column1,A.column3,A.column4,A.column5) = (B.column1,B.column3,B.column4,B.column5) THEN 2
        WHEN (A.column1,A.column2,A.column4) = (B.column1,B.column2,B.column4) THEN 3
        WHEN (A.column1,A.column3,A.column5) = (B.column1,B.column3,B.column5) THEN 4
        ELSE 5  -- no B row joined for this A row
      END AS priority
    FROM table_A A LEFT JOIN table_B B
    -- join only candidates that satisfy at least one of the four cases,
    -- so A rows without any match are kept (with priority 5) instead of dropped
    ON A.column1 = B.column1
      AND ((A.column2 = B.column2 AND A.column4 = B.column4)
        OR (A.column3 = B.column3 AND A.column5 = B.column5))
    WHERE [SOME condition OR SOME condition]
  ) t
  GROUP BY identity
)
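Note: the two versions above are similar yet different, so pick whichever you prefer. Also note: they are untested and were written on the fly, so they may need some further adjustments, but most likely not :o)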
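Option 2's single-join shape can likewise be sketched in SQLite, which supports the same row-value comparisons `(a, b) = (c, d)` (SQLite 3.15+). Instead of `ARRAY_AGG`, this sketch relies on a documented SQLite-specific rule: when a query's only aggregate is `MIN()`, bare columns such as `payload` are taken from the row holding the minimum. Other engines do not guarantee this, so treat it as a local illustration only. The join condition admits only rows satisfying at least one case, so every joined row gets priority 1-4 and unmatched A rows keep priority 5 with NULL B values. The `id`/`payload` columns and the data are hypothetical.

```python
import sqlite3

# Hypothetical sample data; only column1..column5 come from the question.
SETUP = """
CREATE TABLE table_A (id INTEGER, column1, column2, column3, column4, column5);
CREATE TABLE table_B (payload TEXT, column1, column2, column3, column4, column5);
INSERT INTO table_A VALUES
  (1, 'x', 'a', 'b', 'c', 'd'),
  (2, 'y', NULL, 'b', 'c', 'd'),
  (3, 'z', 'a', NULL, 'c', NULL),
  (4, 'w', 'a', 'b', NULL, 'd'),
  (5, 'v', 'a', 'b', 'c', 'd');
INSERT INTO table_B VALUES
  ('full', 'x', 'a', 'b', 'c', 'd'),
  ('c2',   'x', 'z', 'b', 'c', 'd'),
  ('y2',   'y', 'q', 'b', 'c', 'd'),
  ('z3',   'z', 'a', 'k', 'c', 'm'),
  ('w4',   'w', 'r', 'b', 's', 'd');
"""

MIN_QUERY = """
SELECT id, payload, MIN(priority) AS priority
FROM (
  SELECT A.id, B.payload,
         CASE
           WHEN (A.column1, A.column2, A.column3, A.column4, A.column5)
              = (B.column1, B.column2, B.column3, B.column4, B.column5) THEN 1
           WHEN (A.column1, A.column3, A.column4, A.column5)
              = (B.column1, B.column3, B.column4, B.column5) THEN 2
           WHEN (A.column1, A.column2, A.column4)
              = (B.column1, B.column2, B.column4) THEN 3
           WHEN (A.column1, A.column3, A.column5)
              = (B.column1, B.column3, B.column5) THEN 4
           ELSE 5  -- no B row joined, so every comparison is NULL
         END AS priority
  FROM table_A A
  LEFT JOIN table_B B
    ON A.column1 = B.column1
   AND ((A.column2 = B.column2 AND A.column4 = B.column4)
     OR (A.column3 = B.column3 AND A.column5 = B.column5))
) t
GROUP BY id
ORDER BY id
"""


def best_match_min(conn: sqlite3.Connection):
    # SQLite-specific: payload comes from the row with the smallest priority.
    return conn.execute(MIN_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
print(best_match_min(conn))
```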