Regex to extract URLs from text - Ruby

Date: 2019-04-24 16:28:41

Tags: ruby-on-rails regex ruby ruby-on-rails-4

I am trying to detect URLs in a text and replace them with the same URL wrapped in quotes, like this:

original text: Hey, it is a url here www.example.com
required text: Hey, it is a url here "www.example.com"

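original text shows my input value, and required text shows the output I need. I have searched the web extensively but could not find a workable solution. I have already tried URI.extract, but it does not seem to detect URLs that lack an http or https scheme. Below are some examples of the URLs I want to handle. Please let me know if you know of a solution.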

ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.

https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/

www.jstor.org/stable/24084454

www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/

insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so

www.cerege.fr/spip.php?page=pageperso&id_user=94

1 answer:

Answer 0 (score: 0)

Find the words that look like a URL:

str = "ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.\n\nhttps://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/\n\nwww.jstor.org/stable/24084454\n\nwww.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/\n\ninsu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so\n\nwww.cerege.fr/spip.php?page=pageperso&id_user=94"

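# Keep the whitespace-delimited words that have a "." between two word characters.
str.split.select { |w| w[/\w\.\w/] }

This gives you an array of whitespace-delimited words that each contain a . between two word characters, which may be enough for your use case. Because the text is split on whitespace only, punctuation attached to a URL (such as the ",Les" in the first match) stays part of the word.

puts str.split.select { |w| w[/\w\.\w/] }
www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les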
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
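
If the attached punctuation is a problem, one possible alternative is to scan for the URL itself instead of selecting whole words. The sketch below is not part of the original answer: the constant URL_LIKE, the helper extract_urls, and the shortened sample string are all hypothetical, and the pattern assumes that a URL has a dotted host ending in letters and that its path never contains a comma.

```ruby
# Hypothetical pattern (an assumption, not from the original answer):
# optional http/https scheme, a dotted host ending in at least two letters,
# and an optional path that stops at whitespace or a comma.
URL_LIKE = %r{(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}(?:/[^\s,]*)?}i

# Returns every URL-like substring of +text+ as an array of strings.
def extract_urls(text)
  text.scan(URL_LIKE)
end

sample = "KIEFFER Jean-Luc, www.hominides.net/html/page.php,Les Belles lettres, 2001. " \
         "See https://www.jstor.org/stable/24084454"

p extract_urls(sample)
```

Here "2001." is skipped because no letters follow the dot, and the first URL ends before ",Les". URLs that legitimately contain commas would be cut short, so the pattern would need adjusting for such input.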

Update

A complete solution that modifies your string:

str_with_quote = str.dup # work on a copy so `gsub!` leaves `str` untouched

# Wrap every URL-like word in double quotes (literal string replacement).
str.split.select { |w| w[/\w\.\w/] }
   .each { |url| str_with_quote.gsub!(url, '"' + url + '"') }

Now the copy has the URLs wrapped in double quotes:

puts str_with_quote

which gives you this output:

ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, "www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les" Belles lettres, 2001.

"https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/"

"www.jstor.org/stable/24084454"

"www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/"

"insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so"

"www.cerege.fr/spip.php?page=pageperso&id_user=94"
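
Note that in the first line of the output the quotes also enclose ",Les", because the word was split on whitespace only. A single-pass variant that quotes just the URL could look like the sketch below. It is an illustration under assumptions rather than the answer's method: URL_PATTERN and quote_urls are hypothetical names, and the pattern assumes a dotted host ending in letters and a path that contains no commas.

```ruby
# Hypothetical pattern (an assumption, not from the original answer):
# optional http/https scheme, dotted host ending in letters,
# optional path that stops at whitespace or a comma.
URL_PATTERN = %r{(?:https?://)?(?:[\w-]+\.)+[a-z]{2,}(?:/[^\s,]*)?}i

# Returns a new string with every URL-like match wrapped in double quotes.
# The block form of gsub replaces each match in one pass over the text.
def quote_urls(text)
  text.gsub(URL_PATTERN) { |url| %("#{url}") }
end

puts quote_urls("Hey, it is a url here www.example.com")
# prints: Hey, it is a url here "www.example.com"
```

Because gsub with a block visits each match once, a URL that happens to be a substring of another URL is not quoted twice, which can happen when replacing extracted words one by one.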