Get the string after the last slash in BigQuery Standard SQL

Posted: 2018-10-30 21:50:10

Tags: sql regex google-bigquery

Suppose I have a column named "Youtube" and I want to extract the string after the last slash in each URL. How can I do this in BigQuery Standard SQL?

Example:

https://youtube.com/user/HaraldSchmidtShow

https://youtube.com/user/applesofficial

https://youtube.com/user/GrahamColton

Basically, I want:

HaraldSchmidtShow

applesofficial

GrahamColton

3 answers:

Answer 0 (score: 2)

This may already solve the problem for you:

WITH data AS(
  SELECT 'https://youtube.com/user/HaraldSchmidtShow' AS url UNION ALL
  SELECT 'https://youtube.com/user/applesofficial' UNION ALL
  SELECT 'https://youtube.com/user/GrahamColton'
)

SELECT
  SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '/')) - 1)] AS name
FROM `data`

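It simply splits the string on '/' and returns the last element. Note that if a URL ends with '/', SPLIT produces an empty string as the last element, so this query returns '' for such rows.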
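A minimal variation on the same split idea, offered as a sketch rather than the answerer's code: reversing the array with ARRAY_REVERSE lets you read element 0 instead of calling SPLIT twice to compute the length, and trimming trailing slashes with RTRIM first avoids the empty last element. The trailing-slash URL in the sample data is an illustrative addition.

```sql
#standardSQL
WITH data AS (
  SELECT 'https://youtube.com/user/HaraldSchmidtShow' AS url UNION ALL
  SELECT 'https://youtube.com/user/applesofficial' UNION ALL
  -- Illustrative row with a trailing slash (not in the question's data)
  SELECT 'https://youtube.com/user/GrahamColton/'
)
SELECT
  url,
  -- RTRIM(url, '/') strips any trailing slashes so SPLIT does not end in '';
  -- ARRAY_REVERSE puts the last segment first, and SAFE_OFFSET(0) reads it
  -- (returning NULL instead of an error if the array were ever empty).
  ARRAY_REVERSE(SPLIT(RTRIM(url, '/'), '/'))[SAFE_OFFSET(0)] AS name
FROM data
```

For these three rows the query should return HaraldSchmidtShow, applesofficial and GrahamColton.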

Answer 1 (score: 1)

An alternative to the previous answer that also works when the URL ends with a "/":

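A regex-based sketch of such an alternative, assuming BigQuery's REGEXP_EXTRACT; the pattern is an illustrative assumption and not necessarily the exact query this answer posted. The capture group ([^/]+) takes a run of non-slash characters, /? allows one optional trailing slash, and $ anchors the match to the end of the string.

```sql
#standardSQL
WITH data AS (
  SELECT 'https://youtube.com/user/HaraldSchmidtShow' AS url UNION ALL
  SELECT 'https://youtube.com/user/applesofficial' UNION ALL
  SELECT 'https://youtube.com/user/GrahamColton/'
)
SELECT
  url,
  -- Assumed pattern: the last non-slash segment, optionally followed by '/',
  -- at the end of the string; REGEXP_EXTRACT returns the capture group.
  REGEXP_EXTRACT(url, r'([^/]+)/?$') AS name
FROM data
```

Only one trailing slash is tolerated by this pattern; a URL ending in "//" would return NULL.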

Answer 2 (score: 1)

Below is for BigQuery Standard SQL:

#standardSQL
SELECT url, 
  (SELECT v FROM UNNEST(SPLIT(url, '/')) v WITH OFFSET o 
    WHERE v != '' ORDER BY o DESC LIMIT 1
  ) last_string
FROM `data`  

You can test it with dummy data as below:

#standardSQL
WITH data AS(
  SELECT 'https://youtube.com/user/HaraldSchmidtShow' AS url UNION ALL
  SELECT 'https://youtube.com/user/applesofficial' UNION ALL
  SELECT 'https://youtube.com/user/GrahamColton/' UNION ALL
  SELECT 'youtube.com/channel/UCEDBbJXgUqRQXCOsluJJ0FQ'
)
SELECT url, 
  (SELECT v FROM UNNEST(SPLIT(url, '/')) v WITH OFFSET o 
    WHERE v != '' ORDER BY o DESC LIMIT 1
  ) last_string
FROM `data`

with the result:

Row url                                             last_string  
1   https://youtube.com/user/HaraldSchmidtShow      HaraldSchmidtShow    
2   https://youtube.com/user/applesofficial         applesofficial   
3   https://youtube.com/user/GrahamColton/          GrahamColton     
4   youtube.com/channel/UCEDBbJXgUqRQXCOsluJJ0FQ    UCEDBbJXgUqRQXCOsluJJ0FQ     

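Clearly, using a regex function as in Felipe's answer is more elegant and easier to read.
But the approach above can still be practical in some cases, so I wanted to add it to this post.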