How to extract a URL from an HTML body in Ruby

Time: 2018-04-06 11:15:46

Tags: ruby-on-rails ruby ruby-on-rails-4

I need to extract a URL from an HTML body. This is my HTML body:

"<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721%26fc%3D1%26a%3Dy00704Hj50h1zF05XC0HZEp0Kpefss.%7Bpubid%7D%26pubid%3D28&urlauth=ab27ecac97f1760d912ad169b4af1e4b></head></html>"

If I call URI.extract(str) on this string, I get an empty array. How can I solve this?

2 answers:

Answer 0 (score: 1)

html = "<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=..."
# The lookbehind (?<=URL=) and lookahead (?=>) keep only the text between "URL=" and the next ">"
html[/(?<=URL=).*?(?=>)/]
#⇒ "/ref.php?offer_id=....."
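
The lookaround pattern above is case-sensitive and assumes the attribute value is unquoted. As a rough sketch of a slightly more tolerant variant, the helper below anchors on the `content=` attribute of the meta refresh tag, ignores case, and stops at a quote, whitespace, or `>`. The method name `meta_refresh_url` and the shortened sample string are assumptions made for illustration, not part of the original answer.

```ruby
# Hypothetical helper (name chosen for illustration only).
def meta_refresh_url(html)
  # Match content=<delay>;URL=<target>, ignoring case and optional quotes,
  # and capture the target up to the first quote, whitespace, or ">".
  match = html.match(/content=["']?\d+\s*;\s*url=["']?([^"'\s>]+)/i)
  match && match[1]
end

# Shortened version of the HTML from the question.
html = "<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=350&aff_id=28></head></html>"

p meta_refresh_url(html)            # "/ref.php?offer_id=350&aff_id=28"
p meta_refresh_url("<html></html>") # nil when no meta refresh is present
```

Returning nil instead of raising lets the caller decide what to do when the page has no refresh target. For arbitrary real-world HTML a proper parser is likely to be more robust than any regex.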

Answer 1 (score: 0)

a="<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721%26fc%3D1%26a%3Dy00704Hj50h1zF05XC0HZEp0Kpefss.%7Bpubid%7D%26pubid%3D28&urlauth=ab27ecac97f1760d912ad169b4af1e4b></head></html>"

# Capture group 1 holds everything after "URL=" up to (but not including) the next ">"
p a[/URL=([^>]*)/,1]

#=>"/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721%26fc%3D1%26a%3Dy00704Hj50h1zF05XC0HZEp0Kpefss.%7Bpubid%7D%26pubid%3D28&urlauth=ab27ecac97f1760d912ad169b4af1e4b"
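
The extracted value is a relative path whose `url` query parameter carries a percent-encoded absolute URL. If that nested URL is what is actually needed, one possible follow-up step is to decode the query string with the standard library's `URI.decode_www_form`. The sketch below uses a shortened copy of the extracted path; the variable names are assumptions for illustration.

```ruby
require "uri"

# Shortened stand-in for the path extracted from the meta refresh tag.
path = "/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721"

# Take everything after the first "?" and decode it into a Hash of parameters.
query  = path.split("?", 2)[1].to_s
params = URI.decode_www_form(query).to_h

p params["offer_id"] # "350"
p params["url"]      # "https://play.leadzuaf.com/?m=0ENYJG473721&offer_key=473721"
```

Splitting on the first `?` only keeps any later question marks inside the query intact, and `decode_www_form` handles the percent-decoding of each value.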