Question

我想检测并过滤从表单发送的html文本，包含url或url。

例如，我从表格发送此html：

RESOURCES<br></u></b><a target="_blank" rel="nofollow" href="http://stackoverflow.com/users/778094/hyperrjas">http://stackoverflow.com/users/778094/hyperrjas</a>&nbsp;<br><a target="_blank" rel="nofollow" href="https://github.com/hyperrjas">https://github.com/hyperrjas</a>&nbsp;<br><a target="_blank" rel="nofollow" href="http://www.linkedin.com/pub/juan-ardila-serrano/11/2a7/62">http://www.linkedin.com/pub/juan-ardila-serrano/11/2a7/62</a>&nbsp;<br>

我不希望在html文本中允许或使用各种网址/网址。它可能是这样的：

validate :no_urls

def no_urls
  if text_contains_url
   errors.add(:url, "#{I18n.t("mongoid.errors.models.profile.attributes.url.urls_are_not_allowed_in_this_text", url: url)}")
  end
end

我想知道，如果html文字包含一个或多个网址，我该如何过滤？

Answer 1

您可以使用Ruby的内置URI模块，该模块可能已经从文本中提取URI。

require "uri"

links = URI.extract("your text goes here http://example.com mailto:test@example.com foo bar and more...")
links => ["http://example.com", "mailto:test@example.com"]

因此您可以修改您的验证，如下所示：

validate :no_html

def no_html(text)
  links = URI.extract(text)
  unless links.empty?
    errors.add(:url, "#{I18n.t("mongoid.errors.models.profile.attributes.url.urls_are_not_allowed_in_this_text", url: url)}")
  end
end

Answer 2

您可以使用正则表达式来解析看起来像网址的字符串，例如像这样：/^http:\/\/.*/

但是如果你想检测像a这样的html标签，你应该查看用于解析html的库。

Nokogiri就是这样一个图书馆。

Answer 3

只有当字符串不包含冒号“：”时，Mattherick的答案才有效。

使用 Ruby 1.9.3 ，正确的做法是添加第二个参数来解决此问题。

此外，如果您将电子邮件地址添加为纯文本，则此代码不会过滤此电子邮件地址。解决这个问题的方法是：

html_text  = "html text with email address e.g. info@test.com"
email_address = html_text.match(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i)[0]

所以，这是我能正常使用的代码：

def no_urls
  whitelist = %w(attr1, attr2, attr3, attr4)
  attributes.select{|el| whitelist.include?(el)}.each do |key, value|
    links = URI.extract(value, /http(s)?|mailto/)
    email_address = "#{value.match(/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i)}"
    unless links.empty? and email_address.empty?
      logger.info links.first.inspect
      errors.add(key, "#{I18n.t("mongoid.errors.models.cv.attributes.no_urls")}")
    end
  end
end

问候！

在html文本中验证url的存在

3 个答案: