Question

I have an XML document like this:

<doc>
  <header>
    <group>
      <note>group note</note>
    </group>
    <note>header note</note>
  </header>
</doc>

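I want to retrieve the note elements that belong to the header, but not any note elements that belong to a group.

I thought this should work, but it also selects the note inside the group: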

 doc.css('header note')

What is the syntax for grabbing only the note elements that are direct children of header?

Answer 1

You can use > in a CSS selector to find child elements. This contrasts with the space, which finds descendant elements at any depth.

In your case:

puts doc.css('header > note')
# prints: <note>header note</note>

Answer 2

The simplest thing is to let Nokogiri find all the header note tags, then use only the last one:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<doc>
  <header>
    <group>
      <note>group note</note>
    <group>
    <note>header note</note>
  </header>
</doc>
EOT

doc.css('header note').last.text # => "header note"

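Remember that css, like its XPath counterpart xpath and the more generic search, returns a NodeSet. A NodeSet is similar to an array: you can slice it or use first or last.

But note that you could just as easily use: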

doc.css('note').last.text # => "header note"

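Note that your XML is malformed: the <group> tag is not closed. Nokogiri repairs the XML as it parses, which can give you odd results. Check for this situation by looking at doc.errors: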

# => [#<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: group line 5 and header>,
#     #<Nokogiri::XML::SyntaxError: Opening and ending tag mismatch: group line 3 and doc>,
#     #<Nokogiri::XML::SyntaxError: Premature end of data in tag header line 2>,
#     #<Nokogiri::XML::SyntaxError: Premature end of data in tag doc line 1>]

Nokogiri: ignoring child nodes
