LEFT JOIN on a string function?

Asked: 2017-11-29 18:24:22

Tags: sql google-bigquery

I need to join two tables:

  • one table contains email addresses
  • the other table is a domain blacklist.

I tried something like this:

SELECT
   CASE 
      WHEN b.domain IS NULL then "Invalid"
      ELSE "Valid"
   END as Validated
FROM Emails e
LEFT JOIN DomainBlacklist b
ON ENDS_WITH(LOWER(e.email), LOWER(b.domain))

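But it raises an error:

"LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join."

Does anyone know how I can solve this?

Thanks!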

2 Answers:

Answer 0: (score: 2)

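In theory it should be possible to express this as a join with an equality condition; you first need to extract the domain (the part after the @) from the email address. Note that this compares the whole domain, so unlike ENDS_WITH it will not match a subdomain such as mail.example.com against a blacklist entry of example.com.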

SELECT
   CASE 
      WHEN b.domain IS NULL then "Invalid"
      ELSE "Valid"
   END as Validated
FROM Emails e
LEFT JOIN DomainBlacklist b
ON LOWER(SPLIT(e.email, '@')[SAFE_OFFSET(1)]) = LOWER(b.domain)

With sample data:

WITH Emails AS (
  SELECT 'elliott@example.com' AS email UNION ALL
  SELECT 'a@b.com' UNION ALL
  SELECT 'invalid_email' UNION ALL
  SELECT 'foo@bar.com'
), DomainBlacklist AS (
  SELECT 'example.com' AS domain UNION ALL
  SELECT 'bar.com'
)
SELECT
   CASE 
      WHEN b.domain IS NULL then "Invalid"
      ELSE "Valid"
   END as Validated
FROM Emails e
LEFT JOIN DomainBlacklist b
ON LOWER(SPLIT(e.email, '@')[SAFE_OFFSET(1)]) = LOWER(b.domain)
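
To make it easier to see which row matched, here is a minimal sketch of the same join that also returns the email and the extracted domain. The column names domain_part and blacklist_status are made up for illustration and are not part of the original answer. It also labels rows by whether a blacklist match was found, whereas the query above keeps the question's labels, in which a row with no match is reported as "Invalid"; which wording is right depends on what the blacklist is meant to express.

```sql
#standardSQL
WITH Emails AS (
  SELECT 'elliott@example.com' AS email UNION ALL
  SELECT 'a@b.com' UNION ALL
  SELECT 'invalid_email' UNION ALL
  SELECT 'foo@bar.com'
), DomainBlacklist AS (
  SELECT 'example.com' AS domain UNION ALL
  SELECT 'bar.com'
)
SELECT
  e.email,
  -- NULL when the address has no '@', so the equality below never matches.
  SPLIT(e.email, '@')[SAFE_OFFSET(1)] AS domain_part,
  -- b.domain is NULL exactly when the LEFT JOIN found no blacklist row.
  IF(b.domain IS NULL, 'not blacklisted', 'blacklisted') AS blacklist_status
FROM Emails e
LEFT JOIN DomainBlacklist b
ON LOWER(SPLIT(e.email, '@')[SAFE_OFFSET(1)]) = LOWER(b.domain)
ORDER BY e.email
```

With this sample data the result should be:

email                 domain_part   blacklist_status
a@b.com               b.com         not blacklisted
elliott@example.com   example.com   blacklisted
foo@bar.com           bar.com       blacklisted
invalid_email         NULL          not blacklisted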

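Answer 1: (score: 2)

The following is for BigQuery Standard SQL. A CROSS JOIN needs no join condition, so ENDS_WITH can be evaluated for every email/domain pair and then aggregated per email: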

#standardSQL
SELECT email, 
  IF(MAX(ENDS_WITH(LOWER(email), LOWER(domain))), 'invalid', 'valid') AS Validated
FROM `project.dataset.Emails`
CROSS JOIN `project.dataset.DomainBlacklist`
GROUP BY email 

You can test / play with the above query using dummy data, as below:

#standardSQL
WITH `project.dataset.Emails` AS (
  SELECT email
  FROM UNNEST(['user1@abc.com','user2@abc.com','user3@uvw.com','user4@xyz.com']) AS email 
), `project.dataset.DomainBlacklist` AS (
  SELECT domain
  FROM UNNEST(['uvw.com','qwe.net']) AS domain
)
SELECT email, 
  IF(MAX(ENDS_WITH(LOWER(email), LOWER(domain))), 'invalid', 'valid') AS Validated
FROM `project.dataset.Emails`
CROSS JOIN `project.dataset.DomainBlacklist`
GROUP BY email 

The result is:

email           Validated    
user1@abc.com   valid    
user2@abc.com   valid    
user3@uvw.com   invalid  
user4@xyz.com   valid
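
One thing to watch with a plain suffix test is that it also matches domains that merely end in a blacklisted string. The sketch below is a possible refinement rather than part of the original answer: it prepends '@' to the blacklisted domain so that only a full domain match counts. The address user5@notuvw.com and the column names suffix_match and anchored_match are made up for illustration.

```sql
#standardSQL
WITH Emails AS (
  SELECT email
  FROM UNNEST(['user3@uvw.com', 'user5@notuvw.com']) AS email
), DomainBlacklist AS (
  SELECT domain
  FROM UNNEST(['uvw.com', 'qwe.net']) AS domain
)
SELECT email,
  -- Plain suffix match: also true for 'user5@notuvw.com'.
  IF(MAX(ENDS_WITH(LOWER(email), LOWER(domain))), 'invalid', 'valid') AS suffix_match,
  -- Anchored at '@': true only when the whole domain part matches.
  IF(MAX(ENDS_WITH(LOWER(email), CONCAT('@', LOWER(domain)))), 'invalid', 'valid') AS anchored_match
FROM Emails
CROSS JOIN DomainBlacklist
GROUP BY email
ORDER BY email
```

With this sample data the result should be:

email              suffix_match   anchored_match
user3@uvw.com      invalid        invalid
user5@notuvw.com   invalid        valid

Note that the anchored version no longer flags subdomains of a blacklisted domain; whether that is desirable depends on how the blacklist is meant to be applied.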