Can't find any nodes with Nokogiri

Date: 2017-07-19 09:08:54

Tags: ruby xpath nokogiri

I have a [Content_Types].xml file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="jpeg" ContentType="image/jpeg"/>
  <Default Extension="png" ContentType="image/png"/>
  <Default Extension="jpg" ContentType="image/jpeg"/>
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
  <Override PartName="/customXml/itemProps1.xml" ContentType="application/vnd.openxmlformats-officedocument.customXmlProperties+xml"/>
  <Override PartName="/word/numbering.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.numbering+xml"/>
  <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
  <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
  <Override PartName="/word/webSettings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml"/>
  <Override PartName="/word/footnotes.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footnotes+xml"/>
  <Override PartName="/word/endnotes.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.endnotes+xml"/>
  <Override PartName="/word/header1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml"/>
  <Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
  <Override PartName="/word/header2.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml"/>
  <Override PartName="/word/footer2.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>
  <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
  <Override PartName="/word/theme/theme1.xml" ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
  <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
  <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
</Types>

I loaded it with Nokogiri:

doc = File.open("[Content_Types].xml") { |f| Nokogiri::XML(f) }

I want to find the <Default Extension="png" ContentType="image/png"/> node, but none of my queries return anything:

irb(main):048:0> doc.xpath('//Default')
=> []
irb(main):049:0> doc.xpath('//Override')
=> []
irb(main):050:0> doc.xpath('//Types')
=> []
irb(main):051:0> doc.xpath('Types')
=> []

Why?

The XML itself is loaded correctly:

irb(main):003:0> doc
=>
#<Nokogiri::XML::Document:0x3fcddd413ad0 name="document" children=[#<Nokogiri::XML::Element:0x3fcddd41347c name="Types" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> children=[#<Nokogiri::XML::Text:0x3fcddd412f18 "\n\n  ">, #<Nokogiri::XML::Element:0x3fcddd412e14 name="Default" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd412d9c name="Extension" value="xml">, #<Nokogiri::XML::Attr:0x3fcddd412d88 name="ContentType" value="application/xml">]>, #<Nokogiri::XML::Text:0x3fcddd4126bc "\n  ">, #<Nokogiri::XML::Element:0x3fcddd413558 name="Default" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd40ffac name="Extension" value="rels">, #<Nokogiri::XML::Attr:0x3fcddd40ff84 name="ContentType" value="application/vnd.openxmlformats-package.relationships+xml">]>, #<Nokogiri::XML::Text:0x3fcddd40ef08 "\n  ">, #<Nokogiri::XML::Element:0x3fcddd40eddc name="Default" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd40ecec name="Extension" value="jpeg">, #<Nokogiri::XML::Attr:0x3fcddd40ecc4 name="ContentType" value="image/jpeg">]>, #<Nokogiri::XML::Text:0x3fcddd40bca4 "\n  ">, #<Nokogiri::XML::Element:0x3fcddd40bb78 name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd40bac4 name="PartName" value="/word/document.xml">, #<Nokogiri::XML::Attr:0x3fcddd40ba88 name="ContentType" value="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">]>, #<Nokogiri::XML::Text:0x3fcddd40a8cc "\n  ">, 
#<Nokogiri::XML::Element:0x3fcddd40a7f0 name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd40a778 name="PartName" value="/word/styles.xml">, #<Nokogiri::XML::Attr:0x3fcddd40a764 name="ContentType" value="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml">]>, #<Nokogiri::XML::Text:0x3fcddd099d00 "\n  ">, #<Nokogiri::XML::Element:0x3fcddd099bc0 name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd099b34 name="PartName" value="/word/settings.xml">, #<Nokogiri::XML::Attr:0x3fcddd099b20 name="ContentType" value="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml">]>, #<Nokogiri::XML::Text:0x3fcddd098e8c "\n  ">, #<Nokogiri::XML::Element:0x3fcddd098d60 name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd098cc0 name="PartName" value="/word/webSettings.xml">, #<Nokogiri::XML::Attr:0x3fcddd098cac name="ContentType" value="application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml">]>, #<Nokogiri::XML::Text:0x3fcddd08ded8 "\n  ">, #<Nokogiri::XML::Element:0x3fcddd08dd98 name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd08dce4 name="PartName" value="/word/fontTable.xml">, #<Nokogiri::XML::Attr:0x3fcddd08dca8 name="ContentType" value="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml">]>, #<Nokogiri::XML::Text:0x3fcddd08cdf8 "\n  ">, #<Nokogiri::XML::Element:0x3fcddd08cd1c name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 
href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd08cc54 name="PartName" value="/word/theme/theme1.xml">, #<Nokogiri::XML::Attr:0x3fcddd08cc40 name="ContentType" value="application/vnd.openxmlformats-officedocument.theme+xml">]>, #<Nokogiri::XML::Text:0x3fcddd08c0d8 "\n  ">, #<Nokogiri::XML::Element:0x3fcddd08c010 name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd089fa4 name="PartName" value="/docProps/core.xml">, #<Nokogiri::XML::Attr:0x3fcddd089f90 name="ContentType" value="application/vnd.openxmlformats-package.core-properties+xml">]>, #<Nokogiri::XML::Text:0x3fcddd089388 "\n  ">, #<Nokogiri::XML::Element:0x3fcddd089248 name="Override" namespace=#<Nokogiri::XML::Namespace:0x3fcddd4133b4 href="http://schemas.openxmlformats.org/package/2006/content-types"> attributes=[#<Nokogiri::XML::Attr:0x3fcddd089144 name="PartName" value="/docProps/app.xml">, #<Nokogiri::XML::Attr:0x3fcddd089130 name="ContentType" value="application/vnd.openxmlformats-officedocument.extended-properties+xml">]>]>]>

1 answer:

Answer 0 (score: 2)

The "Searching a XML/HTML document" page on the Nokogiri website has an Atom example:

Let's take an Atom feed as an example:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <link href="http://example.org/"/>
  <updated>2003-12-13T18:30:02Z</updated>
  <author>
    <name>John Doe</name>
  </author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>Atom-Powered Robots Run Amok</title>
    <link href="http://example.org/2003/12/13/atom03"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <summary>Some text.</summary>
  </entry>
</feed>
     

If we stick to the convention, we can grab all the title tags like so:

@doc.xpath('//xmlns:title') # => ["<title>Example Feed</title>", "<title>Atom-Powered Robots Run Amok</title>"]

Your sample input declares a default namespace on its root element:

<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">

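In XPath 1.0 an unprefixed name such as //Default only matches elements that are in no namespace, which is why all of your queries return []. Because Nokogiri registers the root's default namespace under the prefix xmlns, you should be able to do: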

puts doc.xpath("//xmlns:Default[@Extension='png']")
# <Default Extension="png" ContentType="image/png"/>

Alternatively, you can use CSS instead:

puts doc.css("Types Default[Extension='png']")
# <Default Extension="png" ContentType="image/png"/>

If you're interested, the page also has a section on not dealing with namespaces.