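How to replace multiple substrings, including special characters, in a string

Asked: 2019-07-26 20:39:28

Tags: regex ruby

My question is almost identical to "Ruby gsub multiple characters in string".

However, my string contains special characters: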

a = "<p>text</p> <strong>bold</strong> and <em>italic</em>"

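Using /\w+/ does not work for me. I have tried many different combinations with no luck. Which regular expression should I pass as the pattern below to make this work? I want to replace those matches wherever they occur in the string.

By the way, I am using Rails.

The replacements I want are: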

a.gsub({{WHAT REGEX EXP?}},
  "\r\n" => "",
  "<p>" => "",
  "</p>" => "\n\n",
  "<br />" => "\n",
  "<strong>" => "*",
  "</strong>" => "*",
  "<em>" => "_",
  "</em>" => "_",
  "<s>" => "~",
  "</s>" => "~",
  "<blockquote>" => ">",
  "</blockquote>" => ">",
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;"
)

3 answers:

Answer 0 (score: 2)

This works with #gsub!, applying one pair at a time:

replacements = {
  "\r\n" => "",
  "<p>" => "",
  "</p>" => "\n\n",
  "<br />" => "\n",
  "<strong>" => "*",
  "</strong>" => "*",
  "<em>" => "_",
  "</em>" => "_",
  "<s>" => "~",
  "</s>" => "~",
  "<blockquote>" => ">",
  "</blockquote>" => ">",
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;"
}

a = "<p>text</p> <strong>bold</strong> and <em>italic</em>"

replacements.each do |find, replace|
  a.gsub!(find, replace)
end

a # => "text\n\n *bold* and _italic_"
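
One thing to watch with the loop above: each pass sees the output of the previous passes, so text produced by an earlier rule can be rewritten by a later one. The sketch below uses a hypothetical, trimmed-down table and an illustrative input (neither is from the question) to contrast the loop with a single-pass gsub over the same pairs.

```ruby
# Hypothetical, trimmed-down table: tag rules first, escaping rules last.
replacements = {
  "<blockquote>"  => ">",
  "</blockquote>" => ">",
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;"
}

quoted = "<blockquote>hi</blockquote>" # illustrative input

# Sequential: the ">" produced by the blockquote rules is later escaped.
sequential = quoted.dup
replacements.each { |find, replace| sequential.gsub!(find, replace) }
sequential # => "&gt;hi&gt;"

# Single pass: every match is looked up once, so replacements are not re-scanned.
single_pass = quoted.gsub(Regexp.union(replacements.keys), replacements)
single_pass # => ">hi>"
```

For the example string in the question the two approaches give the same result, because no rule's output is matched by a later rule.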

Answer 1 (score: 1)

You can do this in a single call, using the regular expression /<[^>]+>|[<>&]/

# `replacements` is a hash holding the pairs listed in the question
a = "<p>text</p> <strong>bold</strong> and <em>italic</em> & <>"
a.gsub(/(<[^>]+>|[<>&])/, replacements)
# => "text\n\n *bold* and _italic_ &amp; &lt;&gt;"

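From the Ruby documentation for String#gsub(pattern, hash) → new_str: if the second argument is a Hash, and the matched text is one of its keys, the corresponding value is the replacement string.

Explanation of the regular expression:

  • <[^>]+> matches an HTML tag: first a literal <, then one or more characters that are not > (that is the [^>]+ part), then the closing >.
  • [<>&] matches a single occurrence of one of the special characters <, > or &.

That said, regular expressions are not the best tool for processing HTML; an HTML parser such as Nokogiri is the better choice.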
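
A side effect worth knowing: because <[^>]+> matches any tag, a tag that is not a key in the hash is replaced with an empty string. The sketch below uses a shortened, illustrative table and input (assumptions, not from the question) and shows the block form of gsub with Hash#fetch as one way to leave unknown tags untouched.

```ruby
# Illustrative subset of the replacement table; "<div>" is deliberately absent.
replacements = {
  "<em>"  => "_",
  "</em>" => "_",
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;"
}
pattern = /<[^>]+>|[<>&]/
html = "<div><em>hi</em></div> & <>" # illustrative input

# Hash form: matches with no key in the hash are replaced by "".
dropped = html.gsub(pattern, replacements)
dropped # => "_hi_ &amp; &lt;&gt;"

# Block form: fall back to the matched text itself when there is no key.
kept = html.gsub(pattern) { |match| replacements.fetch(match, match) }
kept # => "<div>_hi_</div> &amp; &lt;&gt;"
```

Whether unknown tags should be dropped, kept, or escaped depends on the use case; the block form just makes that choice explicit.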

Answer 2 (score: 1)

It can be done in one go:

replacements = {
  "\r\n" => "",
  "<p>" => "",
  "</p>" => "\n\n",
  "<br />" => "\n",
  "<strong>" => "*",
  "</strong>" => "*",
  "<em>" => "_",
  "</em>" => "_",
  "<s>" => "~",
  "</s>" => "~",
  "<blockquote>" => ">",
  "</blockquote>" => ">",
  "&" => "&amp;",
  "<" => "&lt;",
  ">" => "&gt;"
}

keys = Regexp.union(replacements.keys)
a    = "<p>text</p> <strong>bold</strong> and <em>italic</em>"

p a.gsub(keys, replacements) # => "text\n\n *bold* and _italic_"

This works easily because Regexp.union does all the hard work for you (it escapes the special characters).
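
Two details of Regexp.union are worth checking when adapting this: string arguments are escaped, and the alternatives are tried in the order given. The hash above lists the long tags before the bare "<" and ">", which is why the tags win. The sketch below uses made-up keys (not from the question) to show what happens when a short key comes first, and sorting by length as one possible safeguard.

```ruby
# Illustrative keys only: a short key listed before a longer key it prefixes.
rules = { "<" => "&lt;", "<p>" => "" }

# Alternatives are tried left to right, so "<" wins at position 0
# and the rest of the tag is left behind.
naive = Regexp.union(rules.keys)
naive_result = "<p>x".gsub(naive, rules)
naive_result # => "&lt;p>x"

# Putting longer keys first lets "<p>" match as a whole.
longest_first = Regexp.union(rules.keys.sort_by { |key| -key.length })
sorted_result = "<p>x".gsub(longest_first, rules)
sorted_result # => "x"

# String arguments are escaped, so metacharacters are matched literally.
escaped_source = Regexp.union("a.b", "c+").source
escaped_source # => "a\\.b|c\\+"
```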