I have a table like the following in BigQuery:
with temp as (
select "john.doe@company-Y.com" as email_id
union all
select "hello.world@company-X.com" as email_id
)
select * from temp
I want to generate 3 new columns (firstname, lastname, company) from the email_id field, so that the output is:
firstname, lastname, company
john doe company-Y
hello world company-X
Which BigQuery function can be used for this?
Answer (score: 3)
Below is for BigQuery Standard SQL.
There are really many ways of doing this; below is one quick option (the first that came to mind):
#standardSQL
WITH temp AS (
SELECT "john.doe@company-Y.com" AS email_id UNION ALL
SELECT "hello.world@company-X.com" AS email_id
)
SELECT
SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(0)] firstname,
SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.')[SAFE_OFFSET(1)] lastname,
SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(1)], '.')[SAFE_OFFSET(0)] company
FROM temp
with the result:
Row firstname lastname company
1 john doe company-Y
2 hello world company-X
But the right solution will depend on the nature and patterns of your data, as well as on personal preference.
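If the repeated nested SPLIT calls bother you, one option is to split each half of the address once in a second CTE and then index into the resulting arrays. This is a minimal sketch assuming the same sample data plus an extra single-name row; the CTE name `parts` and the columns `name_parts` and `domain_parts` are illustrative names, not anything BigQuery requires. Because `SAFE_OFFSET` returns NULL instead of raising an error when the index is out of range, the single-name row should come back with a NULL lastname.

```sql
#standardSQL
-- Sketch: split the local part and the domain once, then index the arrays.
-- "hello@company-X.com" is an extra row with no last name (illustrative).
WITH temp AS (
  SELECT "john.doe@company-Y.com" AS email_id UNION ALL
  SELECT "hello.world@company-X.com" AS email_id UNION ALL
  SELECT "hello@company-X.com" AS email_id
),
parts AS (
  SELECT
    email_id,
    SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(0)], '.') AS name_parts,   -- e.g. ['john', 'doe']
    SPLIT(SPLIT(email_id, '@')[SAFE_OFFSET(1)], '.') AS domain_parts  -- e.g. ['company-Y', 'com']
  FROM temp
)
SELECT
  email_id,
  name_parts[SAFE_OFFSET(0)] AS firstname,
  name_parts[SAFE_OFFSET(1)] AS lastname,  -- NULL when the local part has no '.'
  domain_parts[SAFE_OFFSET(0)] AS company
FROM parts
```

Using `OFFSET` instead of `SAFE_OFFSET` here would make the query fail on the single-name row rather than return NULL.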
Another quick option is:
#standardSQL
WITH temp AS (
SELECT "john.doe@company-Y.com" AS email_id UNION ALL
SELECT "hello.world@company-X.com" AS email_id
)
SELECT
REGEXP_EXTRACT(email_id, r'^(.*?)[.@]') firstname,
REGEXP_EXTRACT(email_id, r'\.(.*?)@') lastname,
REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp
with the same result.
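The `?` in `(.*?)` makes each capture lazy, so it stops at the first delimiter rather than the last one. This matters as soon as a domain contains more than one dot. A minimal sketch, using a hypothetical address `jane.roe@company-Z.co.uk` that is not part of the original data:

```sql
#standardSQL
-- Compare a lazy and a greedy capture on a hypothetical multi-dot domain.
SELECT
  REGEXP_EXTRACT("jane.roe@company-Z.co.uk", r'@(.*?)\.') AS lazy_company,   -- 'company-Z'
  REGEXP_EXTRACT("jane.roe@company-Z.co.uk", r'@(.*)\.') AS greedy_company   -- 'company-Z.co'
```

Dropping the `?` would make the company column capture everything up to the last dot of the domain.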
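Just to extend this a little, so you can see the direction for improvements: for example, if names are separated with either a dot (.) or a hyphen (-):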
#standardSQL
WITH temp AS (
SELECT "john.doe@company-Y.com" AS email_id UNION ALL
SELECT "hello.world@company-X.com" AS email_id UNION ALL
SELECT "hello-world@company-X.com" AS email_id UNION ALL
SELECT "hello@company-X.com" AS email_id
)
SELECT email_id,
REGEXP_EXTRACT(email_id, r'^(.*?)[-.@]') firstname,
REGEXP_EXTRACT(email_id, r'[-.](.*?)@') lastname,
REGEXP_EXTRACT(email_id, r'@(.*?)\.') company
FROM temp
with the result:
Row email_id firstname lastname company
1 john.doe@company-Y.com john doe company-Y
2 hello.world@company-X.com hello world company-X
3 hello-world@company-X.com hello world company-X
4 hello@company-X.com hello null company-X
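Following the same direction, more separators can be added to the character classes. As an illustration, assuming some addresses use an underscore (the `hello_world` row below is hypothetical), `_` can be added to both classes; inside `[...]` a leading `-` and a `.` are both treated literally.

```sql
#standardSQL
-- Sketch: also treat '_' as a name separator (hypothetical extra row).
WITH temp AS (
  SELECT "john.doe@company-Y.com" AS email_id UNION ALL
  SELECT "hello_world@company-X.com" AS email_id UNION ALL
  SELECT "hello@company-X.com" AS email_id
)
SELECT email_id,
  REGEXP_EXTRACT(email_id, r'^(.*?)[-._@]') firstname,  -- text before the first separator or '@'
  REGEXP_EXTRACT(email_id, r'[-._](.*?)@') lastname,    -- NULL if no separator precedes '@'
  REGEXP_EXTRACT(email_id, r'@(.*?)\.') company         -- domain text up to the first dot
FROM temp
```

For the underscore row this should yield hello / world / company-X, while `hello@company-X.com` still gets a NULL lastname as before.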