How do I unescape HTML tags from XML?

Asked: 2012-02-04 07:04:35

Tags: html ruby xml api parsing

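I am using Ruby 1.8.7 and receive XML content as a string in an API response. I want to parse this response so that I can unescape the HTML tags it contains: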

<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<response>\n  <data>\n    <publisher_share_percent>0.0</publisher_share_percent>\n    <detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>\n   <title>Only &#163;5.00. food (Regular &#163;50.00 / 90% discount)</title>\n  </data>\n  <request_id>ed96dd50-3127-012f-3e93-042b2b8686e6</request_id>\n  <message>The resource has been created successfully.</message>\n  <status>201</status>\n</response>\n

2 answers:

Answer 0 (score: 2)

You can use CGI::unescapeHTML, which turns entities such as &lt; and &quot; back into the characters they stand for:

require 'cgi'
CGI::unescapeHTML("Usage: foo &quot;bar&quot; &lt;baz&gt;")
# => "Usage: foo \"bar\" <baz>"
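Applied to the values from the question, this could look like the sketch below. The helper name `unescape_field` is made up for illustration, and the two strings are copied by hand from the response rather than extracted from it. Running CGI.unescapeHTML over the whole XML string is probably not what you want, because the escaped `<b>` would become real markup inside the document. How numeric references such as `&#163;` come out depends on the Ruby version and the string encoding, so treat the second line as something to check in your environment.

```ruby
require 'cgi'

# Hypothetical helper: unescape a single field value that has already been
# pulled out of the response (not the whole XML document).
def unescape_field(value)
  CGI.unescapeHTML(value)
end

# Escaped values copied from the response in the question.
escaped_description = "&lt;b&gt;this is the testing detailed&lt;/b&gt; "
escaped_title = "Only &#163;5.00. food (Regular &#163;50.00 / 90% discount)"

puts unescape_field(escaped_description)
puts unescape_field(escaped_title)
```

On a current Ruby with UTF-8 strings the first line prints the `<b>...</b>` markup and the second prints the title with pound signs.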

Answer 1 (score: 0)

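If you treat the XML as XML and parse it with an XML parser, the task becomes much easier, because the parser decodes the entities for you when you read an element's text: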

require 'nokogiri'

xml = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <data>
    <publisher_share_percent>0.0</publisher_share_percent>
    <detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>
   <title>Only &#163;5.00. food (Regular &#163;50.00 / 90% discount)</title>
  </data>
  <request_id>ed96dd50-3127-012f-3e93-042b2b8686e6</request_id>
  <message>The resource has been created successfully.</message>
  <status>201</status>
</response>
EOT

doc = Nokogiri::XML(xml)
puts doc.at('detailed_description').text
puts doc.at('title').text

Saving the file and running it outputs:

ruby ~/Desktop/test2.rb 
<b>this is the testing detailed</b> 
Only £5.00. food (Regular £50.00 / 90% discount)
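
If installing Nokogiri is not an option, the same idea can be sketched with REXML, which shipped in the standard library of Ruby 1.8.7 (on Ruby 3.0 and later it is a bundled gem). The constant `RESPONSE_XML` and the helper `extract_text` are names invented for this example, and the XML is a shortened copy of the response from the question.

```ruby
require 'rexml/document'

# Shortened copy of the API response from the question.
RESPONSE_XML = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <data>
    <detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>
    <title>Only &#163;5.00. food (Regular &#163;50.00 / 90% discount)</title>
  </data>
  <status>201</status>
</response>
EOT

# Hypothetical helper: parse the XML and return the decoded text of the
# element at the given path, or nil if no such element exists.
def extract_text(xml, path)
  doc = REXML::Document.new(xml)
  element = doc.elements[path]
  element && element.text
end

puts extract_text(RESPONSE_XML, 'response/data/detailed_description')
puts extract_text(RESPONSE_XML, 'response/data/title')
```

As with Nokogiri, the entities are decoded by the parser, so no separate call to CGI::unescapeHTML is needed.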