Question

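Using Rails 3.2. I want to remove the <b> tags together with all the text inside them, but so far I have only found a way to strip the tags themselves: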

string = "
  <p>
    <b>Section 1</b>
    Everything is good.<br>
    <b>Section 2</b>
    All is well.
  </p>"
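ActionController::Base.helpers.strip_tags(string)   # strip_tags is a view helper, not a String method
# => "Section 1 Everything is good. Section 2 All is well." (whitespace collapsed)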

I want to get this instead:

"Everything is good. All is well."

Should I add a regex match?

Answer 1

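The "correct" way is to use an HTML parser like Nokogiri, but for a simple task like this a regular expression will do:
search for (?m)<b\s*>.*?<\/b\s*> and replace it with an empty string. After that, apply strip_tags.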

Regex explanation:

(?m)    # turn on the m modifier so the dot . also matches newlines
<b      # match <b
\s*     # match zero or more whitespace characters
>       # match >
.*?     # match anything, lazily (ungreedy), up to the first </b
<\/b    # match </b
\s*     # match zero or more whitespace characters
>       # match >
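A minimal plain-Ruby sketch of this answer's two steps, runnable without Rails. Assumption: the tag stripper in step 2 is a naive regex used only as a stand-in for Rails' strip_tags (fine for this small, trusted snippet, not for arbitrary HTML), and the whitespace collapsing in step 3 is added just to produce the single-line result the question asks for.

```ruby
# Sample input from the question.
html = "
  <p>
    <b>Section 1</b>
    Everything is good.<br>
    <b>Section 2</b>
    All is well.
  </p>"

# Step 1: the regex from this answer. (?m) lets `.` match newlines,
# and the lazy `.*?` stops at the first closing </b>.
without_bold = html.gsub(/(?m)<b\s*>.*?<\/b\s*>/, '')

# Step 2: naive tag stripper standing in for Rails' strip_tags
# (assumption: good enough only for simple, trusted markup).
plain = without_bold.gsub(/<[^>]+>/, '')

# Step 3: collapse runs of whitespace so the result reads as one line.
result = plain.gsub(/\s+/, ' ').strip

puts result # prints "Everything is good. All is well."
```

In a Rails app you would replace step 2 with the real strip_tags helper.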


Answer 2

Using an HTML/XML parser for this task is much better. Ruby has no built-in HTML parser, but Nokogiri is a good one; it wraps libxml2 and libxslt.

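require 'nokogiri'

# Parse as an HTML fragment: the input is not well-formed XML (<br> is never closed)
doc = Nokogiri::HTML.fragment(string)
doc.css("b").remove   # remove every <b> element together with its text
result = doc.text     # or doc.inner_html to keep the <p> markup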

Answer 3

You can use string.gsub(/<b>.*<\/b>/, '')

http://rubular.com/r/hhmpY6Q6fX
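A short sketch of why this pattern works on the question's input, and where it can go wrong. Without the m flag, Ruby's `.` does not match newlines, so each `<b>...</b>` on its own line is matched separately. But `.*` is greedy: the one-line input `line` below is a hypothetical example where two `<b>` elements share a line, and there the greedy version also deletes the text between them, while the lazy `.*?` does not.

```ruby
# Input from the question: each <b> element sits on its own line.
html = "
  <p>
    <b>Section 1</b>
    Everything is good.<br>
    <b>Section 2</b>
    All is well.
  </p>"

# `.` stops at newlines (no m flag), so each match stays within one line.
per_line = html.gsub(/<b>.*<\/b>/, '')

# Hypothetical input with two <b> elements on the same line.
line = "<b>A</b> keep <b>B</b> end"

greedy = line.gsub(/<b>.*<\/b>/, '')   # .* runs to the LAST </b> on the line
lazy   = line.gsub(/<b>.*?<\/b>/, '')  # .*? stops at the FIRST </b>

p per_line.include?("Section") # => false
p greedy                       # => " end"
p lazy                         # => " keep  end"
```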

Answer 4

If you want to strip tags, you can try this:

ActionController::Base.helpers.sanitize("test<br>test<br>test<br> test")

If you want to remove all tags, you need to use:

ActionView::Base.full_sanitizer.sanitize("test<br>test<br>test<br> test")

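The two differ slightly. The first is aimed at removing dangerous markup such as script tags to prevent XSS attacks, but it does not remove allowed tags such as <br>. The second removes every HTML tag from the text.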

Remove content inside specific tags
