Need a regex for a URL check?

Time: 2019-05-26 18:43:10

Tags: sql regex google-bigquery gcloud

I need a regular expression for a string.

My URL strings look like this:

https://string.string/string (with no spaces in between)
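A minimal Python sketch of the shape described above, handy for trying a pattern locally before moving it into BigQuery. It assumes that each "string" is one or more non-whitespace characters, that the host may contain extra dots (as in `www.example.com`), and that only `https` is accepted. The names `URL_SHAPE` and `looks_like_url` are illustrative, not from the question.

```python
import re

# Hypothetical pattern for https://<string>.<string>/<string> with no whitespace.
# Assumption: host labels contain no whitespace, "/" or "."; the host needs at
# least one dot; the path after the first "/" is any non-empty run of non-whitespace.
URL_SHAPE = re.compile(r"https://[^\s/.]+(?:\.[^\s/.]+)+/\S+")


def looks_like_url(s: str) -> bool:
    """Return True only if the whole string has the https://a.b/c shape."""
    return URL_SHAPE.fullmatch(s) is not None


for candidate in [
    "https://example.com/page",
    "https://www.example.com/a/b?c=1",
    "https://example.com/has space",  # rejected: contains a space
    "http://example.com/page",        # rejected: not https
    "https://example/page",           # rejected: host has no dot
]:
    print(candidate, looks_like_url(candidate))
```

In BigQuery the same expression, anchored with `^` and `$`, could be passed to `REGEXP_CONTAINS`, since it only uses constructs that RE2 supports.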

2 answers:

Answer 0 (score: 0)

I found this one at https://gist.github.com/jacksonfdam/3000275:

^http(s)?:\/\/((\d+\.\d+\.\d+\.\d+)|(([\w-]+\.)+([a-z,A-Z][\w-]*)))(:[1-9][0-9]*)?(\/([\w-.\/:%+@&=]+[\w- .\/?:%+@&=]*)?)?(#(.*))?$/i
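The gist's pattern is written as a JavaScript regex literal (note the trailing `/i`). Below is a sketch of porting it to Python while keeping its behavior, assuming Python's `re` as the target: `re` rejects ranges such as `[\w-.]` with "bad character range", so each literal hyphen is moved to the end of its character class, `\/` is written as plain `/`, and `/i` becomes `re.IGNORECASE`. The name `GIST_URL_RE` is illustrative.

```python
import re

# Port of the gist's JavaScript pattern. Only two changes were made:
# literal "-" moved to the end of each character class, and "\/" written as "/".
# The JavaScript /i flag becomes re.IGNORECASE.
GIST_URL_RE = re.compile(
    r"^http(s)?://((\d+\.\d+\.\d+\.\d+)|(([\w-]+\.)+([a-z,A-Z][\w-]*)))"
    r"(:[1-9][0-9]*)?(/([\w./:%+@&=-]+[\w ./?:%+@&=-]*)?)?(#(.*))?$",
    re.IGNORECASE,
)

for url in [
    "https://www.example.com/products?id=1&page=2",
    "https://192.168.0.1:8080/index.html",
    "https://example.com/a b",   # accepted: the second path class contains a space
    "ftp://example.com/page",    # rejected: scheme must be http or https
]:
    print(url, bool(GIST_URL_RE.match(url)))
```

The second path character class contains a literal space, so this pattern accepts `https://example.com/a b`; on its own it does not enforce the question's no-spaces requirement. The comma in `[a-z,A-Z]` is also matched literally.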

Answer 1 (score: 0)

Below is an example in BigQuery Standard SQL:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'check this link http://www.example.com/products?id=1&page=2' tweet UNION ALL
  SELECT 'http://www.example.com/products?id=1&page=2 this link is awesome' tweet UNION ALL
  SELECT 'the link http://www.example.com/products?id=1&page=2 is awesome' tweet 

)
SELECT REGEXP_REPLACE(tweet, r"(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+", '') clean_tweet
FROM `project.dataset.table`  

with the result:

Row clean_tweet  
1   check this link  
2   this link is awesome     
3   the link is awesome
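To check the answer's pattern outside BigQuery, the same expression can be run with Python's `re.sub`, which here stands in for `REGEXP_REPLACE(tweet, pattern, '')`. This is a local sketch that assumes the sample rows are plain ASCII: Python's `\w` is Unicode-aware, while RE2's `\w` in BigQuery matches only `[0-9A-Za-z_]`, so non-ASCII input could behave differently.

```python
import re

# Pattern copied verbatim from the BigQuery query above.
URL_PATTERN = re.compile(
    r"(?:http(s)?:\/\/)?[\w.-]+(?:\.[\w\.-]+)+[\w\-\._~:/?#[\]@!\$&'\(\)\*\+,;=.]+"
)

tweets = [
    "check this link http://www.example.com/products?id=1&page=2",
    "http://www.example.com/products?id=1&page=2 this link is awesome",
    "the link http://www.example.com/products?id=1&page=2 is awesome",
]

# re.sub(pattern, "", s) plays the role of REGEXP_REPLACE(tweet, pattern, '').
clean_tweets = [URL_PATTERN.sub("", tweet) for tweet in tweets]

for row, clean in enumerate(clean_tweets, start=1):
    # repr() makes the leftover spaces visible.
    print(row, repr(clean))
```

The replacement removes only the URL itself, so the surrounding spaces stay (note the double space left in the third string); in BigQuery, wrapping the expression in `TRIM()` would strip the leading and trailing ones.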