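Join two tables with cases in the ON condition

Time: 2020-09-03 15:42:24

Tags: sql google-bigquery left-join

I have two tables, A and B, and I need to left join them under a set of conditions. Is there an efficient way to do this in BigQuery or in SQL?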

select *
from table_A A
left join table_B B
  on
     -- case 1: all five columns match
     (    A.column1 = B.column1
      and A.column2 = B.column2
      and A.column3 = B.column3
      and A.column4 = B.column4
      and A.column5 = B.column5)
  or -- case 2
     (    A.column1 = B.column1
      and A.column3 = B.column3
      and A.column4 = B.column4
      and A.column5 = B.column5)
  or -- case 3
     (    A.column1 = B.column1
      and A.column2 = B.column2
      and A.column4 = B.column4)
  or -- case 4
     (    A.column1 = B.column1
      and A.column3 = B.column3
      and A.column5 = B.column5)
where
  [some condition OR some condition]

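My main goal is this: if case 1 matches a row, the other cases should not be tried for it. If case 1 does not match, case 2 should be checked, then case 3, and so on, so that each row gets its best match. This should help get the join coverage between tables A and B close to 100%. Case 1 compares all 5 columns of both tables, but if some of those columns are null it should fall through to the other cases in the same way.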

4 Answers:

Answer 0: (score: 1)

If I understand correctly, the general approach in SQL is multiple left joins:

select a.*, coalesce(b1.col, b2.col, b3.col, b4.col) as col
from table_A A left join
     table_B B1   -- case 1: all five columns
     on A.column1 = B1.column1 and
        A.column2 = B1.column2 and
        A.column3 = B1.column3 and
        A.column4 = B1.column4 and
        A.column5 = B1.column5 left join
     table_B B2   -- case 2: only tried when case 1 found nothing
     on B1.column1 is null and
        A.column1 = B2.column1 and
        A.column3 = B2.column3 and
        A.column4 = B2.column4 and
        A.column5 = B2.column5 left join
     table_B B3   -- case 3: only tried when cases 1 and 2 found nothing
     on B1.column1 is null and
        B2.column1 is null and
        A.column1 = B3.column1 and
        A.column2 = B3.column2 and
        A.column4 = B3.column4 left join
     table_B B4   -- case 4: only tried when cases 1 to 3 found nothing
     on B1.column1 is null and
        B2.column1 is null and
        B3.column1 is null and
        A.column1 = B4.column1 and
        A.column3 = B4.column3 and
        A.column5 = B4.column5
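
To try the chained left join idea on a small sample without a BigQuery project, here is a minimal sketch that runs it in an in-memory SQLite database from Python. The `id` column, the `col` payload column, and all sample values are made up for illustration, and each later join checks that every earlier alias came back empty so that a row is only matched by its highest-priority case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_A (id integer, column1, column2, column3, column4, column5);
    create table table_B (column1, column2, column3, column4, column5, col);
""")
# Hypothetical sample rows: id and col exist only for this illustration.
conn.executemany("insert into table_A values (?, ?, ?, ?, ?, ?)", [
    (1, "x", "a", "b", "c", "d"),    # matches on all five columns (case 1)
    (2, "x", None, "b", "c", "d"),   # column2 is null, so only case 2 can match
    (3, "y", "a", None, "c", None),  # only column1, column2, column4 usable (case 3)
    (4, "z", "a", "b", "c", "d"),    # no counterpart in table_B
])
conn.executemany("insert into table_B values (?, ?, ?, ?, ?, ?)", [
    ("x", "a", "b", "c", "d", "full"),
    ("y", "a", "q", "c", "r", "case3"),
])

rows = conn.execute("""
    select A.id, coalesce(B1.col, B2.col, B3.col, B4.col) as col
    from table_A A
    left join table_B B1                       -- case 1
      on A.column1 = B1.column1 and A.column2 = B1.column2
     and A.column3 = B1.column3 and A.column4 = B1.column4
     and A.column5 = B1.column5
    left join table_B B2                       -- case 2, only if case 1 failed
      on B1.column1 is null
     and A.column1 = B2.column1 and A.column3 = B2.column3
     and A.column4 = B2.column4 and A.column5 = B2.column5
    left join table_B B3                       -- case 3, only if cases 1-2 failed
      on B1.column1 is null and B2.column1 is null
     and A.column1 = B3.column1 and A.column2 = B3.column2
     and A.column4 = B3.column4
    left join table_B B4                       -- case 4, only if cases 1-3 failed
      on B1.column1 is null and B2.column1 is null and B3.column1 is null
     and A.column1 = B4.column1 and A.column3 = B4.column3
     and A.column5 = B4.column5
    order by A.id
""").fetchall()

print(rows)
```

Row 2 shows the fall-through the question asks about: a null in `column2` makes the case 1 comparison fail, so the row is picked up by case 2 instead, while row 4 stays in the result with a null `col` because every join is a left join.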

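Answer 1: (score: 1)

You want the "best" matching B row. That is, if there is a row matching case 1 you want to stick with it, but if there isn't, you want to try case 2, and so on.

What you can do is combine the conditions so that you first join all possible matches. Then look at the matches and dismiss all but the best ones. The ranking can be done with RANK.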

select *
from 
(
  select 
    *,
    rank() over (partition by A.id
                 order by
                   case when A.column2 = B.column2
                         and A.column3 = B.column3
                         and A.column4 = B.column4
                         and A.column5 = B.column5 then 1
                        when A.column3 = B.column3
                         and A.column4 = B.column4
                         and A.column5 = B.column5 then 2
                        when A.column2 = B.column2
                         and A.column4 = B.column4 then 3
                                                   else 4
                   end) as rnk
  from table_A A
  left join table_B B 
    on A.column1 = B.column1
    and
    ( 
      (A.column2 = B.column2 and A.column4 = B.column4)
     or
      (A.column3 = B.column3 and A.column5 = B.column5)
    )
  where [some condition OR some condition]
) ranked
where rnk = 1;

(My query assumes there is some ID in table_A. If your table has no unique ID, use whichever column or columns uniquely identify a row in the table.)
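
As a quick check of the join-everything-then-rank idea, the sketch below runs a trimmed version of the query in SQLite from Python. It assumes an SQLite build with window functions (3.25 or later), and the `id` and `col` columns and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table_A (id integer, column1, column2, column3, column4, column5);
    create table table_B (column1, column2, column3, column4, column5, col);
""")
# Hypothetical sample rows: id and col exist only for this illustration.
conn.executemany("insert into table_A values (?, ?, ?, ?, ?, ?)", [
    (1, "x", "a", "b", "c", "d"),    # two candidates in B: one case 1, one case 3
    (2, "x", "a", None, "c", None),  # two candidates in B, both only case 3
    (3, "z", "a", "b", "c", "d"),    # no candidate in B
])
conn.executemany("insert into table_B values (?, ?, ?, ?, ?, ?)", [
    ("x", "a", "b", "c", "d", "full"),
    ("x", "a", "q", "c", "r", "partial"),
])

rows = conn.execute("""
    select id, col
    from (
      select A.id as id, B.col as col,
             rank() over (partition by A.id
                          order by
                            case when A.column2 = B.column2
                                  and A.column3 = B.column3
                                  and A.column4 = B.column4
                                  and A.column5 = B.column5 then 1
                                 when A.column3 = B.column3
                                  and A.column4 = B.column4
                                  and A.column5 = B.column5 then 2
                                 when A.column2 = B.column2
                                  and A.column4 = B.column4 then 3
                                 else 4
                            end) as rnk
      from table_A A
      left join table_B B
        on A.column1 = B.column1
       and (   (A.column2 = B.column2 and A.column4 = B.column4)
            or (A.column3 = B.column3 and A.column5 = B.column5))
    ) ranked
    where rnk = 1
    order by id, col
""").fetchall()

print(rows)
```

For id 1 only the case 1 candidate survives. For id 2 both candidates tie at case 3, and RANK keeps ties, so both rows come back; if exactly one row per A row is required, ROW_NUMBER with an extra tie-breaking sort key is one option.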

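Answer 2: (score: 0)

One solution is to use temporary storage (a temp table, a cursor, etc.) and feed the data with a parameterized loop. The problem you are running into is that pure SQL has no loops. You would have to use BigQuery's scripting language, documented here: https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting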

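Answer 3: (score: 0)

I see the following two options, both for BigQuery Standard SQL (thanks to @Thorsten-Kettner for helping me understand the OP's logic/requirements)

Option 1 - Do a separate join for each case; then union everything, and finally pick the winner for each record in A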

#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
  SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
  FROM (
    SELECT *, 1 priority, FORMAT('%t', A) identity 
      FROM table_A A LEFT JOIN table_B B
      USING(column1,column2,column3,column4,column5) -- Case 1
      WHERE [SOME condition OR SOME condition]
    UNION ALL
    SELECT *, 2 priority, FORMAT('%t', A) identity 
      FROM table_A A LEFT JOIN table_B B
      USING(column1,column3,column4,column5) -- Case 2
      WHERE [SOME condition OR SOME condition]
    UNION ALL
    SELECT *, 3 priority, FORMAT('%t', A) identity 
      FROM table_A A LEFT JOIN table_B B
      USING(column1,column2,column4) -- Case 3
      WHERE [SOME condition OR SOME condition]
    UNION ALL
    SELECT *, 4 priority, FORMAT('%t', A) identity 
      FROM table_A A LEFT JOIN table_B B
      USING(column1,column3,column5) -- Case 4
      WHERE [SOME condition OR SOME condition]
  ) t
  GROUP BY identity
)    

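Option 2 - Select all potential candidates in a single query, compute on the fly which case each candidate falls under, and finally pick the winner for each row in A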

#standardSQL
SELECT * EXCEPT(priority, identity)
FROM (
  SELECT AS VALUE ARRAY_AGG(t ORDER BY priority LIMIT 1)[OFFSET(0)]
  FROM (
    SELECT A.*, 
      B.* EXCEPT(column1,column2,column3,column4,column5),
      FORMAT('%t', A) identity,
      CASE
        WHEN (A.column1,A.column2,A.column3,A.column4,A.column5) = (B.column1,B.column2,B.column3,B.column4,B.column5) THEN 1
        WHEN (A.column1,A.column3,A.column4,A.column5) = (B.column1,B.column3,B.column4,B.column5) THEN 2
        WHEN (A.column1,A.column2,A.column4) = (B.column1,B.column2,B.column4) THEN 3
        WHEN (A.column1,A.column3,A.column5) = (B.column1,B.column3,B.column5) THEN 4
        ELSE 5
      END AS priority, 
    FROM table_A A LEFT JOIN table_B B
    ON A.column1 = B.column1 
    OR A.column2 = B.column2 
    OR A.column3 = B.column3 
    OR A.column4 = B.column4 
    OR A.column5 = B.column5 
    WHERE [SOME condition OR SOME condition]
  ) t
  WHERE priority < 5 
  GROUP BY identity
)   

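Note: the two versions above are similar but not identical, so pick whichever you prefer. Also note that the above is untested and was written on the fly, so it may need further adjustments, though most likely not :o)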