How do I parse an XML file using metadata key names?

Time: 2012-03-19 02:35:25

Tags: ruby-on-rails ruby xml nokogiri

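I recently started using Nokogiri to parse data into a Rails 3 application. My problem is that I don't fully understand how to do this, because the XML I am parsing seems to be "non-standard". Take a look at the following document: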

<?xml version="1.0" encoding="utf-8"?>
<dataset  xmlns="http://.com/schemas/xmldata/1/"  xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
    xmlns="http://.com/schemas/xmldata/1/"
    xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
    xs:schemaLocation="http://.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
    <metadata>
          <item name="Problem ID" type="xs:string" length="32"/>
          <item name="Account Title" type="xs:string" length="162"/>
          <item name="Account Name" type="xs:string" length="162"/>
          <item name="Reassignment" type="xs:int" precision="1"/>
          <item name="Initial Severity" type="xs:int" precision="1"/>
          <item name="Resolution Desc" type="xs:string" length="510"/>
          <item name="Resolver Name" type="xs:string" length="82"/>
          <item name="Problem Code" type="xs:string" length="32"/>
          <item name="Status" type="xs:string" length="32"/>
    </metadata>
    <data>
        <row>
            <value>AP-06684768    </value>
            <value>ESA</value>
            <value>1</value>
            <value>8</value>
            <value>8</value>
            <value xs:nil="true" />
            <value xs:nil="true" />
            <value>ADDITION TO EXISTING FIREWALL</value>
            <value></value>
            <value>ESA BRIDGE                              </value>
            <value>CLOSED         </value>
            <value>CLOSED         </value>
        </row>
        <row>
            <value>AP-06720564    </value>
            <value>ESA</value>
            <value>2011-01-19T12:02:47</value>
            <value>2011-01-19T12:02:49</value>
            <value>0</value>
            <value>776</value>
            <value>SCP UESCADADEV -&gt; UESCADAPW/BW</value>
            <value>NETAU_NETMGTS  </value>
            <value>N/A</value>
            <value>ESA BRIDGE                              </value>
            <value>CLOSED         </value>
            <value>CLOSED         </value>
        </row>
    </data>
</dataset>

Instead of named nodes and attributes, there is a 'metadata' section followed by rows, much like a table. How do I parse all of this data?

2 Answers:

Answer 0: (score: 4)

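require 'rubygems'
require 'nokogiri'
require 'pp'

# Namespace URI bound to the xs: prefix in the question's document.
XSI = 'http://www.w3.org/2001/XMLSchema-instance'

# DATA reads whatever follows an __END__ line in this script,
# so the XML from the question has to be pasted there.
doc = Nokogiri::XML(DATA)
column_names = doc.css('dataset > metadata > item').map { |item| item['name'] }

result = doc.css('dataset > data > row').map do |row|
  values = row.css('value').map do |value|
    # xs:nil is a namespaced attribute; attribute_with_ns is used here on the
    # assumption that Node#[] may not find it on newer Nokogiri versions.
    nil_attr = value.attribute_with_ns('nil', XSI)
    nil_attr && nil_attr.value == 'true' ? nil : value.content
  end
  # Pair each column name with the value at the same position.
  Hash[column_names.zip(values)]
end

pp result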

Result

[{"Problem Code"=>"ADDITION TO EXISTING FIREWALL",
  "Resolution Desc"=>nil,
  "Reassignment"=>"8",
  "Resolver Name"=>nil,
  "Status"=>"",
  "Problem ID"=>"AP-06684768    ",
  "Account Name"=>"1",
  "Initial Severity"=>"8",
  "Account Title"=>"ESA"},
 {"Problem Code"=>"NETAU_NETMGTS  ",
  "Resolution Desc"=>"776",
  "Reassignment"=>"2011-01-19T12:02:49",
  "Resolver Name"=>"SCP UESCADADEV -> UESCADAPW/BW",
  "Status"=>"N/A",
  "Problem ID"=>"AP-06720564    ",
  "Account Name"=>"2011-01-19T12:02:47",
  "Initial Severity"=>"0",
  "Account Title"=>"ESA"}]
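The result above still carries the fixed-width padding ("AP-06684768    ") and keeps every value as a string. It also looks misaligned ("Account Name"=>"1") because the sample lists 9 metadata items but 12 values per row, and `zip` silently drops values that have no matching name. The following is a minimal sketch of a clean-up step, assuming the names and types have already been read from the metadata items. `COLUMNS`, `cast`, and `build_row` are hypothetical names for illustration, not part of the answer or of Nokogiri.

```ruby
# Hypothetical column list: [name, type] pairs copied from three of the
# question's <item> elements.
COLUMNS = [
  ['Problem ID',    'xs:string'],
  ['Account Title', 'xs:string'],
  ['Reassignment',  'xs:int']
]

# Convert one raw string according to the declared column type.
def cast(raw, type)
  return nil if raw.nil?             # values flagged xs:nil stay nil
  text = raw.strip                   # drop the fixed-width padding
  return text unless type == 'xs:int'
  Integer(text, 10)
rescue ArgumentError
  text                               # not a valid integer: keep the string
end

# Build one row hash; values beyond the last column are dropped by zip.
def build_row(columns, raw_values)
  names  = columns.map(&:first)
  casted = columns.zip(raw_values).map { |(_name, type), raw| cast(raw, type) }
  names.zip(casted).to_h
end

raw = ['AP-06684768    ', 'ESA', '8', 'extra value with no column']
row = build_row(COLUMNS, raw)
p row
```

Comparing `raw_values.size` with `columns.size` before building the hash would be one way to detect a mismatch like the one in the sample instead of dropping values silently.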

Answer 1: (score: 1)

Here is working code that I hacked together and tested:

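require 'rubygems'
require 'nokogiri'

# Minimal value object holding one metadata column name.
class Item
  attr_accessor :name
  def initialize(name)
    @name = name
  end
end

# The block form closes the file even if parsing raises.
document = File.open("data.xml") { |file| Nokogiri::XML(file) }

# children[3] is <metadata> in this file: indexes 0-2 are a whitespace text
# node, the comment, and another whitespace text node.
metadata = document.root.children[3]
# Text nodes have no 'name' attribute, so reject them and keep the <item> elements.
items = metadata.children.reject { |child| child.attribute('name').nil? }.map do |child|
  Item.new(child.attribute('name').value)
end

puts "#{items.size} items"
puts items.inspect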

Result:

[~/stackoverflow/graphML] ruby parse.rb
9 items
[#<Item:0x007fc01c0fbd90 @name="Problem ID">, #<Item:0x007fc01c0fbca0 @name="Account Title">, #<Item:0x007fc01c0fbc28 @name="Account Name">, #<Item:0x007fc01c0fbbb0 @name="Reassignment">, #<Item:0x007fc01c0fbb38 @name="Initial Severity">, #<Item:0x007fc01c0fbac0 @name="Resolution Desc">, #<Item:0x007fc01c0fba48 @name="Resolver Name">, #<Item:0x007fc01c0fb9d0 @name="Problem Code">, #<Item:0x007fc01c0fb868 @name="Status">]

Here is the complete project on GitHub: https://github.com/endymion/GraphML-parsing-exercise/tree/metadata-key-names

(This is a branch of a GraphML parsing exercise that I hacked together earlier tonight for someone else on Stack Overflow.)