Question

I'm using Ruby 1.9.3p385, Nokogiri, and XPath 1.0.

With help from the great people on Stack Overflow, I came up with this XPath expression:

products = xml_file.xpath("/root_tag/middle_tag/item_tag")

to split this XML file:

<root_tag>
  <middle_tag>
    <item_tag>
      <headline_1>
        <tag_1>Product title 1</tag_1>
      </headline_1>
      <headline_2>
        <tag_2>Product attribute 1</tag_2>
      </headline_2>
    </item_tag>
    <item_tag>
      <headline_1>
        <tag_1>Product title 2</tag_1>
      </headline_1>
      <headline_2>
        <tag_2>Product attribute 2</tag_2>
      </headline_2>
    </item_tag>
  </middle_tag>
</root_tag>

into 2 products.

I now want to go through each product and extract all of its information (by extracting its leaf nodes). For that I use this code:

products.each do |product|
  puts product #=> <item_tag><headline_1><tag_1>Product title 1</tag_1></headline_1><headline_2><tag_2>Product attribute 1</tag_2></headline_2></item_tag>
  product_data = product.xpath("//*[not(*)]")
  puts product_data #=> <tag_1>Product title 1</tag_1><tag_2>Product attribute 1</tag_2><tag_1>Product title 2</tag_1><tag_2>Product attribute 2</tag_2>
end

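As you can see, this does exactly what I want, except for one thing: it reads through all products instead of only the current product.

How can I limit the search to the current product only? When answering, please note that the example has been simplified. I would prefer a solution that "erases" the knowledge of the other products (if possible), because then it would probably work in all cases.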

Answer 1

Instead of:

//*[not(*)]

Use:

(//product)[1]//*[not(*)]

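This selects only the "leaf nodes" under the first product element in the XML document.

Repeat this for every product element in the document. You can get their count with: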

count(//product)

Answer 2

You probably just want:

product_data = product.xpath("*")

which will find all child elements of product.

Answer 3

The answer is to add a . before //*[not(*)]:

product_data = product.xpath(".//*[not(*)]")

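This tells the XPath expression to start from the current node rather than from the root node.

Mr. Novatchev's answer, while technically correct, does not lead to parsing code that is idiomatic Ruby.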

How to make an XPath expression read only part of the document (Ruby / Nokogiri / XPath)
