Parsing an XML file with Nokogiri?

Asked: 2012-07-17 18:10:29

Tags: ruby xml xml-parsing nokogiri

<DataSet xmlns="http://www.atcomp.cz/webservices">
  <xs:schema xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" id="file_mame">...</xs:schema>
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <alldata xmlns="">
      <category diffgr:id="category1" msdata:rowOrder="0">
        <category_code>P...</category_code>
        <category_name>...</category_name>
        <subcategory diffgr:id="subcategory1" msdata:rowOrder="0">
          <category_code>...</category_code>
          <subcategory_code>...</subcategory_code>
          <subcategory_name>...</subcategory_name>
        </subcategory>
....

How can I get the data for all categories and subcategories?

I'm trying something like this:

reader.xpath('//DataSet/diffgr:diffgram/alldata').each do |node|

But this gives me:

undefined method `xpath' for #<Nokogiri::XML::Reader:0x000001021d1750>

1 Answer:

Answer 0 (score: 4)

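Nokogiri's Reader parser does not support XPath. Try using Nokogiri's in-memory Document parser instead.

Separately, to query namespaced elements with XPath, you need to supply a namespace mapping, like this: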

doc = Nokogiri::XML(my_document_string_or_io)

namespaces = { 
  'default' => 'http://www.atcomp.cz/webservices', 
  'diffgr' => 'urn:schemas-microsoft-com:xml-diffgram-v1' 
}
doc.xpath('//default:DataSet/diffgr:diffgram/alldata', namespaces).each do |node|
  # ...
end

Or you can remove the namespaces:

doc.remove_namespaces!
doc.xpath('//DataSet/diffgram/alldata').each { |node|  }