How to parse email addresses in Google BigQuery

Time: 2019-01-23 20:27:03

Tags: google-bigquery

I have a table in BigQuery like the following:

with temp as (
select "john.doe@company-Y.com" as email_id
union all 
select "hello.world@company-X.com" as email_id
)
select * from temp

I want to generate 3 new columns (firstname, lastname, company) from the email_id field, so that the output looks like this:

firstname, lastname, company
john         doe      company-Y
hello        world    company-X

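Which BigQuery function can be used for this?

1 answer:

Answer 0: (score: 3)

Below is for BigQuery Standard SQL.

There are many ways to do this; below is a quick one (the first that came to mind):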

#standardSQL
WITH temp AS (
  SELECT "john.doe@company-Y.com" AS email_id UNION ALL 
  SELECT "hello.world@company-X.com" AS email_id
)
SELECT 
  SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(0)] firstname,
  SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(1)] lastname,
  SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(1)], '.')[SAFE_OFFSET(0)] company
FROM temp    

with the result

Row firstname   lastname    company  
1   john        doe         company-Y    
2   hello       world       company-X      

But the right solution will depend on the nature and patterns of your data and, obviously, on personal preference.
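
To illustrate how the shape of the data matters, here is a minimal sketch of the SPLIT approach applied to an address with no last name. The second input row and the `local_part` alias are assumptions added for illustration and are not part of the original question. `SAFE_OFFSET` returns NULL when the index is out of range, whereas `OFFSET` would raise an error for the same row.

```sql
#standardSQL
WITH temp AS (
  SELECT "john.doe@company-Y.com" AS email_id UNION ALL
  -- Hypothetical row: the local part has no '.' separator
  SELECT "hello@company-X.com" AS email_id
)
SELECT
  email_id,
  local_part[SAFE_OFFSET(0)] AS firstname,
  -- NULL for "hello@company-X.com" because the array has only one element
  local_part[SAFE_OFFSET(1)] AS lastname
FROM (
  -- Split the part before '@' on '.' once and reuse the resulting array
  SELECT email_id, SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.') AS local_part
  FROM temp
)
```

Computing the array once in a subquery is only a readability choice; it should return the same values as repeating the nested SPLIT in every column.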

Another quick option is

#standardSQL
WITH temp AS (
  SELECT "john.doe@company-Y.com" AS email_id UNION ALL 
  SELECT "hello.world@company-X.com" AS email_id 
)
SELECT 
  REGEXP_EXTRACT(email_id, r'^(.*?)[.@]') firstname,
  REGEXP_EXTRACT(email_id, r'\.(.*?)@') lastname,
  REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp

with the same result

Just to extend this a bit, so you can see the direction for improvement: for example, if the names can be separated by either . or -

#standardSQL
WITH temp AS (
  SELECT "john.doe@company-Y.com" AS email_id UNION ALL 
  SELECT "hello.world@company-X.com" AS email_id UNION ALL
  SELECT "hello-world@company-X.com" AS email_id UNION ALL
  SELECT "hello@company-X.com" AS email_id
)
SELECT email_id,
  REGEXP_EXTRACT(email_id, r'^(.*?)[-.@]') firstname,
  REGEXP_EXTRACT(email_id, r'[-.](.*?)@') lastname,
  REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp

with the result

Row email_id                    firstname   lastname    company  
1   john.doe@company-Y.com      john        doe         company-Y    
2   hello.world@company-X.com   hello       world       company-X    
3   hello-world@company-X.com   hello       world       company-X    
4   hello@company-X.com         hello       null        company-X
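
As a rough check of where these patterns stop working, the sketch below runs the same three expressions on two hypothetical addresses (neither is from the original question): one with a middle initial and one with a subdomain. Because each pattern matches at the leftmost possible position and `.*?` is non-greedy, the lastname pattern captures everything between the first separator and the `@`, and the company pattern captures only the first label after the `@`.

```sql
#standardSQL
WITH temp AS (
  -- Hypothetical inputs chosen to probe edge cases
  SELECT "john.q.doe@company-Y.com" AS email_id UNION ALL
  SELECT "hello.world@mail.company-X.com" AS email_id
)
SELECT email_id,
  -- john / hello
  REGEXP_EXTRACT(email_id, r'^(.*?)[-.@]') firstname,
  -- q.doe / world: the middle initial ends up inside lastname
  REGEXP_EXTRACT(email_id, r'[-.](.*?)@') lastname,
  -- company-Y / mail: the subdomain is returned instead of the company
  REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp
```

Whether these outcomes are acceptable depends on the data; if such addresses occur, the patterns would need to be adjusted.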