Question

I need to extract a URL from an HTML body. This is my HTML body:

"<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721%26fc%3D1%26a%3Dy00704Hj50h1zF05XC0HZEp0Kpefss.%7Bpubid%7D%26pubid%3D28&urlauth=ab27ecac97f1760d912ad169b4af1e4b></head></html>"

If I run URI.extract(str) on this string, I get an empty array. Please help me solve this problem.

Answer 1

html = "<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=..."
# Lookbehind for "URL=", then match lazily up to (not including) the next ">".
html[/(?<=URL=).*?(?=>)/]
#⇒ "/ref.php?offer_id=....."

Answer 2

a="<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721%26fc%3D1%26a%3Dy00704Hj50h1zF05XC0HZEp0Kpefss.%7Bpubid%7D%26pubid%3D28&urlauth=ab27ecac97f1760d912ad169b4af1e4b></head></html>"

# Capture everything after "URL=" up to the next ">" and return capture group 1.
p a[/URL=([^>]*)/, 1]

#=>"/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721%26fc%3D1%26a%3Dy00704Hj50h1zF05XC0HZEp0Kpefss.%7Bpubid%7D%26pubid%3D28&urlauth=ab27ecac97f1760d912ad169b4af1e4b"
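A likely reason URI.extract comes back empty is that it looks for absolute URIs with a literal scheme such as "https:", while this string only holds a relative path plus a percent-encoded URL. If the goal is the absolute URL carried in the url parameter, a minimal sketch could decode the query string after the regex capture. The variable names target, params and embedded_url are illustrative, and the sketch assumes the redirect target always contains a "?" followed by a well-formed query string.

```ruby
require "uri"

html = "<html><head><meta http-equiv=refresh content=0;URL=/ref.php?offer_id=350&aff_id=28&url=https%3A%2F%2Fplay.leadzuaf.com%2F%3Fm%3D0ENYJG473721%26offer_key%3D473721%26fc%3D1%26a%3Dy00704Hj50h1zF05XC0HZEp0Kpefss.%7Bpubid%7D%26pubid%3D28&urlauth=ab27ecac97f1760d912ad169b4af1e4b></head></html>"

# Relative redirect target, captured the same way as in the answer above.
target = html[/URL=([^>]*)/, 1]

# Everything after the first "?" is the query string (assumed to be present).
query = target.split("?", 2)[1]

# decode_www_form splits on "&" and "=" and percent-decodes each key and value.
params = URI.decode_www_form(query).to_h

# The absolute URL that was percent-encoded inside the "url" parameter.
embedded_url = params["url"]
puts embedded_url
```

Decoding the pairs one by one keeps the ampersands inside the embedded URL intact, which unescaping the whole string before splitting would not.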

How to extract a URL from an HTML body in Ruby