Question

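I'm using a regular expression to detect URLs.

URLs contained in the body text are converted into HTML.

However, unwanted characters end up in the match, and I don't want them included.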

Here are my inputs and my regex-based code:

body
=> "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
url
=>"https://www.yahoo.com/"
text
=> "<!-- BEGIN app/views/topics/_link_thumbnail_description.html.slim -->\n\n<a class=\"c-grid__quotation--link\" target=\"_blank\" href=\"https://www.yahoo.com/\"><div class=\"c-grid__quotation text--s-md p-topic__quotation__border c-border-r-5\">\n  <div class=\"c-flex\">\n    <div class=\"c-grid__quotation--main\">\n      <img src=\"https://s.yimg.com/dh/ap/default/130909/y_200_a.png\" alt=\"Y 200 a\" />\n    </div>\n    <div class=\"c-grid__quotation--side\">\n      <div class=\"c-grid__quotation--side-title text--b\">\n        Yahoo\n      </div>\n      <div class=\"c-grid__quotation--side-description\">\n        News, email and search are just the beginning. Discover more every day. Find your yodel.\n      </div>\n      <div class=\"c-grid__quotation--side-url\">\n        www.yahoo.com\n      </div>\n    </div>\n  </div>\n</div></a><!-- END app/views/topics/_link_thumbnail_description.html.slim -->"


  def convert_url_to_text(body, url, text)
    reg_url = Regexp.escape("#{url}")
    body.gsub!(/(#{reg_url}$|#{reg_url}[\W\/])/){ |s| "#{text}"}
  end

With these inputs, the interpolated regular expression becomes:

/(https:\/\/www\.yahoo\.com\/$|https:\/\/www\.yahoo\.com\/[\W\/])/

But the match for the URL also picks up the < (&lt;) that follows it in body.

http://rubular.com/

How can I keep the < (&lt;) out of the match?
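For reference, the behaviour can be reproduced outside Rubular with the body and url values above. The character class `[\W\/]` matches any non-word character, including the `<` that starts `<br />`, and since it is part of the match it gets consumed (and therefore replaced by `gsub!`) together with the URL:

```ruby
# Rebuild the pattern from the question to see exactly what it matches.
body = "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
url = "https://www.yahoo.com/"
reg_url = Regexp.escape(url)
pattern = /(#{reg_url}$|#{reg_url}[\W\/])/

# scan returns one array per match (one element per capture group); flatten them.
matches = body.scan(pattern).flatten
p matches
# => ["https://www.yahoo.com/<", "https://www.yahoo.com/"]
```

The first URL is not at the end of a line, so only the second alternative can match it, and that alternative includes the trailing `<`.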

Answer 1

Don't parse it by hand. Use URI.extract:

require "uri"

URI.extract "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
#⇒ ["https://www.yahoo.com/", "https://www.yahoo.com/"]
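If you would rather keep the gsub approach from the question, one possible sketch (this is not part of the answer above) is to replace the consuming `[\W\/]` with a zero-width negative lookahead `(?!\w)`. A lookahead checks the next character without including it in the match, so the `<` stays in the body. `"LINK"` is just a placeholder standing in for the rendered HTML text:

```ruby
# Sketch: same idea as the question's method, but the check on the character
# after the URL is a negative lookahead, which asserts without consuming anything.
def convert_url_to_text(body, url, text)
  reg_url = Regexp.escape(url)
  # (?!\w): the URL must not be followed by a word character.
  # This also succeeds at the end of a line or string, covering the `$` alternative.
  # The block form inserts `text` literally (no \1 or \\ interpretation).
  body.gsub(/#{reg_url}(?!\w)/) { text }
end

body = "https://www.yahoo.com/<br /><br />sample<br /><br/>https://www.yahoo.com/"
puts convert_url_to_text(body, "https://www.yahoo.com/", "LINK")
# prints: LINK<br /><br />sample<br /><br/>LINK
```

This uses gsub instead of gsub!, because gsub! returns nil when nothing is replaced, whereas gsub always returns the resulting string.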

Unnecessary characters included in regex URL conversion
