Parsing a converted Word document - Nokogiri::XML::XPath::SyntaxError

Posted: 2013-12-23 14:46:04

Tags: ruby-on-rails ruby xml xslt nokogiri

I have the following XML file and am having trouble parsing it. I just want to parse each tag separately.

<pkg:xmlData>
....
</pkg:xmlData>
</pkg:part>
<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml">
<pkg:xmlData>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
<w:body>
<w:p w:rsidR="00D506C1" w:rsidRDefault="00D506C1">
<w:bookmarkStart w:id="0" w:name="_GoBack"/>
<w:bookmarkEnd w:id="0"/>
<w:r>
<w:t>Max Mara</w:t>
</w:r>
<w:r w:rsidR="00625187">
<w:t>s</w:t>
</w:r>
<w:r>
<w:t xml:space="preserve">Frühjahr/Sommer</w:t>
</w:r>
....
</w:p>
...
</w:body>
...
</pkg:part>

Here is what I tried:

require 'nokogiri'

doc = Nokogiri::XML(File.open(@file), nil, "UTF-8")
root = doc.root
title = doc.xpath("//pkg:xmlData//w:body")

This is the error I get:

Nokogiri::XML::XPath::SyntaxError: Undefined namespace prefix: //pkg:xmlData//w:body

Any help?

1 Answer:

Answer 0 (score: 2)

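When you work with a namespaced XML document, you also need to pass the xpath call a hash that maps each prefix used in the expression to its namespace URI, like this: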

title = doc.xpath("//pkg:xmlData//w:body", 
                  "pkg" => "http://example.com/package", 
                  "w" => "http://example.com/w")

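In the code above, replace http://example.com/package with the URI that your file @file declares for the pkg prefix (its xmlns:pkg attribute). Do the same for http://example.com/w; in the snippet from the question, the xmlns:w declaration on <w:document> gives http://schemas.openxmlformats.org/wordprocessingml/2006/main.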