Question

I have a string:

<products type="array">
<product><brand>Rho2</brand>
<created-at type="datetime">2011-11-03T21:29:46Z</created-at><id type="integer">78013</id><name>Test2</name>
<price nil="true"/>
<quantity nil="true"/>
<sku nil="true"/>
<updated-at type="datetime">2011-11-03T21:29:46Z</updated-at>
</product>
<product>
<brand>Apple</brand>
<created-at type="datetime">2011-10-26T21:26:59Z</created-at>
<id type="integer">77678</id>
<name>iPhone</name>
<price>$199.99</price>
<quantity>5</quantity>
<sku>1234</sku>
<updated-at type="datetime">2011-10-26T21:27:00Z</updated-at>
</product>

I want to get the text between <brand> and </brand>.

I am trying to parse this XML and collect the data between the tags.

Answer 1

This should be easy with XmlSimple.

 require 'xmlsimple'
 products = XmlSimple.xml_in('<YOUR WHOLE XML>', { 'KeyAttr' => 'product' })

Answer 2

You should use whatever XML parser is available on your platform. Then you can use a simple XPath expression:

//brand

This selects all brand elements in the document.
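
As a rough illustration of the XPath approach in Ruby, here is a minimal sketch using REXML, which ships with most Ruby installations (as a bundled gem since Ruby 3.0). The XML below is a shortened copy of the string from the question, and it assumes the document is closed with </products> so that it is well-formed. The variable names are only for illustration.

```ruby
require 'rexml/document'

# Shortened copy of the XML from the question, closed with </products>
# (an assumption) so that the parser sees a well-formed document.
xml = <<~XML
  <products type="array">
    <product>
      <brand>Rho2</brand>
      <name>Test2</name>
    </product>
    <product>
      <brand>Apple</brand>
      <name>iPhone</name>
    </product>
  </products>
XML

doc = REXML::Document.new(xml)

# Evaluate the XPath expression //brand and collect the text of each match.
brands = REXML::XPath.match(doc, '//brand').map(&:text)

puts brands
```

Any other parser with XPath support should work the same way; only the call that evaluates the expression changes.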

Answer 3

Currently, the de facto standard for parsing XML and HTML in Ruby is Nokogiri:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<products type="array">
  <product>
    <brand>Rho2</brand>
    <created-at type="datetime">2011-11-03T21:29:46Z</created-at>
  </product>
  <product>
    <brand>Apple</brand>
    <created-at type="datetime">2011-10-26T21:26:59Z</created-at>
  </product>  
</products>
EOT

puts doc.search('brand').map(&:text)

Which outputs:

Rho2
Apple
