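I am using a regular expression to find URLs.
The goal is to convert each URL contained in the body text into HTML.
The match picks up an extra character, and I don't want that character to be included.
My inputs and my regular expression are shown below.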
body
=> "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
url
=>"https://www.yahoo.com/"
text
=> "<!-- BEGIN app/views/topics/_link_thumbnail_description.html.slim -->\n\n<a class=\"c-grid__quotation--link\" target=\"_blank\" href=\"https://www.yahoo.com/\"><div class=\"c-grid__quotation text--s-md p-topic__quotation__border c-border-r-5\">\n <div class=\"c-flex\">\n <div class=\"c-grid__quotation--main\">\n <img src=\"https://s.yimg.com/dh/ap/default/130909/y_200_a.png\" alt=\"Y 200 a\" />\n </div>\n <div class=\"c-grid__quotation--side\">\n <div class=\"c-grid__quotation--side-title text--b\">\n Yahoo\n </div>\n <div class=\"c-grid__quotation--side-description\">\n News, email and search are just the beginning. Discover more every day. Find your yodel.\n </div>\n <div class=\"c-grid__quotation--side-url\">\n www.yahoo.com\n </div>\n </div>\n </div>\n</div></a><!-- END app/views/topics/_link_thumbnail_description.html.slim -->"
def convert_url_to_text(body, url, text)
  reg_url = Regexp.escape("#{url}")
  body.gsub!(/(#{reg_url}$|#{reg_url}[\W\/])/){ |s| "#{text}"}
end
This produces the following regular expression:
/(https:\/\/www\.yahoo\.com\/$|https:\/\/www\.yahoo\.com\/[\W\/])/
But the URL match also picks up the < that follows it in body.
How can I keep the < from being included in the match?
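For illustration, here is a minimal reproduction of the behavior. It assumes a short placeholder string "[LINK]" in place of the rendered partial (the placeholder is hypothetical, chosen only to keep the output readable) and uses the non-mutating gsub so the input stays untouched.

```ruby
body = "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
url  = "https://www.yahoo.com/"
text = "[LINK]" # hypothetical stand-in for the rendered HTML

reg_url = Regexp.escape(url)

# Same pattern as in the question: the second alternative ends with the
# character class [\W\/], which consumes one character after the URL.
result = body.gsub(/(#{reg_url}$|#{reg_url}[\W\/])/) { text }

puts result
# The "<" of the first "<br />" is part of the match, so it is replaced too.
```

The first occurrence matches through the second alternative and takes the following < with it; the last occurrence matches through the first alternative because it sits at the end of the line.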
Answer 0 (score: 2)
Don't parse it by hand. Use URI.extract:
require "uri"
URI.extract "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
#⇒ ["https://www.yahoo.com/", "https://www.yahoo.com/"]
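If the substitution itself still has to be done with gsub, one possible alternative is to swap the consuming character class for a lookahead, so the character after the URL is checked but not replaced. The following is only a sketch under that assumption, not the asker's final code; "[LINK]" is again a hypothetical placeholder for the rendered HTML, and the method returns a new string instead of mutating body.

```ruby
def convert_url_to_text(body, url, text)
  reg_url = Regexp.escape(url)
  # (?=\W|\z) requires a non-word character or the end of the string
  # after the URL, but leaves that character out of the match.
  # The block form inserts text literally, so backslashes in the
  # replacement HTML are not treated as backreferences.
  body.gsub(/#{reg_url}(?=\W|\z)/) { text }
end

body = "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
converted = convert_url_to_text(body, "https://www.yahoo.com/", "[LINK]")
puts converted
```

With this pattern the < after the first URL stays in the output, and the URL at the very end of the string is still replaced.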