Question

想象一下，我是一家根据客户名称销售星座运势的公司。我的桌子上有姓，姓和星座文字。由于我无法涵盖所有的单个名称组合，因此我经常将姓氏存储为NULL，作为一个包含所有内容的值。

Horoscope DB

sur | fam | horoscope
----------------------
John| Doe  | text1
Jane| Doe  | text2
NULL| Doe  | text3
Ike | Smith| text4
NULL| Smith| text5

以及客户列表

customer DB

sur | fam
---------
John| Doe
Jack| Doe
Lisa| Smith
Carl| Smith

现在，我们需要为每个客户匹配一个星座运势。如果我们在姓氏和姓氏上有完全匹配，则在这两者上都匹配，但是如果我们没有姓氏和姓氏，则仅在姓氏上匹配，因此结果为：

Customer horoscope DB

sur | fam | horoscope
----------------------
John| Doe  | text1
Jack| Doe  | text3
Lisa| Smith| text5
Carl| Smith| text5

如果我做普通的LEFT JOIN USING(sur, fam)，我只会和约翰有一场比赛。如果我使用LEFT JOIN USING(fam)，将会得到很多重复。我需要设置一些条件，但不确定如何。

如果愿意，我愿意更改我的包罗万象的值，或将其编码为单独的列。

特别是我正在使用Google Big Query。 I have set up a DB-fiddle, that you are welcome to use

Answer 1

这是一种方法：

select . . .
from (select c.*,
             h.* except (sur, fam), -- whatever columns you want
             row_number() over (partition by c.fam
                                order by (case when c.sur = h.sur then 1 else 2 end)
                               ) as seqnum
      from horoscope h join
           customer c
           on c.fam = h.fam
     ) ch
where seqnum = 1;

基本上，这是家庭成员，然后选择“最佳匹配”，即姓氏上的完全匹配。

但是，您应该小心，因为不同的家庭可以使用相同的姓氏。

Answer 2

另一种解决方案是使用条件聚合。您可以加入姓氏，然后检查给定姓氏是否存在星座运势；如果没有，则回退到NULL姓氏。

SELECT
    c.sur,
    c.fam,
    COALESCE(
        MAX(CASE WHEN c.sur = h.sur THEN h.text END),
        MAX(CASE WHEN h.sur IS NULL THEN h.text END) 
    ) horoscope_text
FROM
    customer c
    INNER JOIN horoscope h ON c.fam = h.fam
GROUP BY 
    c.sur,
    c.fam

Answer 3

根据我的理解，这是一种实现方法

select c.id customer_id, c.sur, c.fam, h.id horoscope_id, h.sur h_sur, 
h.fam h_fam, h.horoscope
FROM customer c join horoscope h
on (c.sur = h.sur and c.fam = h.fam)
or (h.sur is null and c.fam = h.fam and not exists 
      (select 1 from horoscope h1 where h1.sur = c.sur and h1.fam = c.fam)
   )

和结果

Answer 4

您可以在多种条件下加入以涵盖每种情况：

select c.sur, c.fam, h.horoscope from customer c 
inner join horoscope h
on (c.fam = h.fam and c.sur = h.sur) or 
  (c.fam = h.fam and h.sur is null and not exists(
    select 1 from horoscope 
    where fam = c.fam and sur = c.sur
  )
)

请参见demo

Answer 5

以下是用于BigQuery标准SQL

#standardSQL
SELECT c.sur, c.fam,
  ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam

您可以使用下面示例中的示例数据来测试，玩转

#standardSQL
WITH `project.dataset.horoscope` AS (
  SELECT 'John' sur,'Doe' fam, 'text1' horoscope UNION ALL
  SELECT 'Jane', 'Doe', 'text2' UNION ALL
  SELECT NULL, 'Doe', 'text3' UNION ALL
  SELECT 'Ike', 'Smith', 'text4' UNION ALL
  SELECT NULL, 'Smith', 'text5' 
), `project.dataset.customer` AS (
  SELECT 'John' sur, 'Doe' fam UNION ALL
  SELECT 'Jack', 'Doe' UNION ALL
  SELECT 'Lisa', 'Smith' UNION ALL
  SELECT 'Carl', 'Smith' 
)
SELECT c.sur, c.fam,
  ARRAY_AGG(horoscope ORDER BY h.sur DESC LIMIT 1)[OFFSET(0)] horoscope
FROM `project.dataset.customer` c
JOIN `project.dataset.horoscope` h
ON c.fam = h.fam
AND c.sur = IFNULL(h.sur, c.sur)
GROUP BY c.sur, c.fam

有结果

Row sur     fam     horoscope    
1   John    Doe     text1    
2   Jack    Doe     text3    
3   Lisa    Smith   text5    
4   Carl    Smith   text5

SQL在通配符列上联接/如果表中的col1在col1和col2上联接，否则在col2上联接

5 个答案: