Question

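I'm trying to parse an HTML string with Ruby. The string contains multiple <pre></pre> tags, and I need to find and encode every < and > bracket between each pair of tags.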

Example: 

string_1_pre = "<pre><h1>Welcome</h1></pre>"

string_2_pre = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"

def clean_pre_code(html_string)
 matched = html_string.match(/(?<=<pre>).*(?=<\/pre>)/)
 cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
 html_string.gsub(/(?<=<pre>).*(?=<\/pre>)/, cleaned)
end

clean_pre_code(string_1_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre>"
clean_pre_code(string_2_pre) #=> "<pre>&lt;h1&gt;Welcome&lt;/h1&gt;&lt;/pre&gt;&lt;pre&gt;&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>"

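This method works as long as html_string contains only one <pre></pre> element, but it fails when there are several.

I'm open to a solution that uses Nokogiri or something similar, but I couldn't figure out how to make it do what I want.

Please let me know if you need any additional context.

Update: This can be done using only Nokogiri; see the accepted answer.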

Answer 1

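@zstrad44 Yes, you can do this with Nokogiri. Here is my version of the code, developed from yours; it gives you the result you need when the string contains multiple pre tags.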

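require 'nokogiri'

def clean_pre_code(html_string)
  doc = Nokogiri::HTML(html_string)
  all_pre = doc.xpath('//pre') # every <pre> element in the parsed document
  res = ""
  all_pre.each do |pre|
    pre = pre.to_html # serialize this single <pre> element back to a string
    # Each string now holds exactly one <pre>, so the greedy match is safe here.
    # The /m flag lets . match newlines inside multi-line <pre> content.
    matched = pre.match(/(?<=<pre>).*(?=<\/pre>)/m)
    cleaned = matched.to_s.gsub(/[<]/, "&lt;").gsub(/[>]/, "&gt;")
    # Block form keeps backslashes in cleaned from being read as backreferences.
    res += pre.gsub(/(?<=<pre>).*(?=<\/pre>)/m) { cleaned }
  end
  res # only the escaped <pre> elements; content outside them is not included
end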

I recommend reading the Nokogiri Cheatsheet to better understand the methods used in the code. Happy coding! Hope this helps.
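
For comparison, the version in the question fails on several <pre> elements for two reasons: the greedy .* runs from the first <pre> to the last </pre>, and the replacement is computed once instead of per match. Below is a minimal regex-only sketch that addresses both, assuming the <pre> tags carry no attributes and are never nested. The method name escape_pre_contents is hypothetical and not part of the original answer; for messier real-world HTML, the Nokogiri version above is probably the safer choice.

```ruby
# Hypothetical helper (name chosen for illustration only).
# Assumes plain <pre>...</pre> tags with no attributes and no nesting.
def escape_pre_contents(html_string)
  # The lazy .*? stops at the first closing </pre>, so each <pre> block is
  # matched on its own; /m lets . match newlines inside the block.
  html_string.gsub(%r{(?<=<pre>).*?(?=</pre>)}m) do |inner|
    # The block runs once per match, so each block gets its own replacement.
    inner.gsub("<", "&lt;").gsub(">", "&gt;")
  end
end

string_2_pre = "<pre><h1>Welcome</h1></pre><pre><h1>Goodbye</h1></pre>"
puts escape_pre_contents(string_2_pre)
# prints: <pre>&lt;h1&gt;Welcome&lt;/h1&gt;</pre><pre>&lt;h1&gt;Goodbye&lt;/h1&gt;</pre>
```

Unlike the Nokogiri version, this sketch leaves any markup outside the <pre> elements in place, because gsub only rewrites the matched spans.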
