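I recently started using Nokogiri to parse data into a Rails 3 application. The problem is that I don't fully understand how to go about it, because the XML I'm parsing seems to be "non-standard". Take a look at the following XML: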
<?xml version="1.0" encoding="utf-8"?>
<dataset xmlns="http://.com/schemas/xmldata/1/" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
<!--
<dataset
xmlns="http://.com/schemas/xmldata/1/"
xmlns:xs="http://www.w3.org/2001/XMLSchema-instance"
xs:schemaLocation="http://.com/schemas/xmldata/1/ xmldata.xsd"
>
-->
<metadata>
<item name="Problem ID" type="xs:string" length="32"/>
<item name="Account Title" type="xs:string" length="162"/>
<item name="Account Name" type="xs:string" length="162"/>
<item name="Reassignment" type="xs:int" precision="1"/>
<item name="Initial Severity" type="xs:int" precision="1"/>
<item name="Resolution Desc" type="xs:string" length="510"/>
<item name="Resolver Name" type="xs:string" length="82"/>
<item name="Problem Code" type="xs:string" length="32"/>
<item name="Status" type="xs:string" length="32"/>
</metadata>
<data>
<row>
<value>AP-06684768 </value>
<value>ESA</value>
<value>1</value>
<value>8</value>
<value>8</value>
<value xs:nil="true" />
<value xs:nil="true" />
<value>ADDITION TO EXISTING FIREWALL</value>
<value></value>
<value>ESA BRIDGE </value>
<value>CLOSED </value>
<value>CLOSED </value>
</row>
<row>
<value>AP-06720564 </value>
<value>ESA</value>
<value>2011-01-19T12:02:47</value>
<value>2011-01-19T12:02:49</value>
<value>0</value>
<value>776</value>
<value>SCP UESCADADEV -> UESCADAPW/BW</value>
<value>NETAU_NETMGTS </value>
<value>N/A</value>
<value>ESA BRIDGE </value>
<value>CLOSED </value>
<value>CLOSED </value>
</row>
</data>
</dataset>
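Instead of named nodes and attributes, it appears to have a 'metadata' section followed by rows, much like a table. How do I parse all of this data?
Answer 0 (score: 4)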
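require 'rubygems'
require 'nokogiri'
require 'pp'
# DATA reads the XML placed after __END__ at the bottom of this script.
doc = Nokogiri::XML(DATA)
# Column names come from the metadata section, in document order.
column_names = doc.css('dataset > metadata > item').map { |item| item['name'] }
result = doc.css('dataset > data > row').map do |row|
  # Map xs:nil="true" values to nil; otherwise keep the text content.
  values = row.css('value').map { |value| value[:nil] == 'true' ? nil : value.content }
  # Pair each column name with the value at the same position.
  Hash[column_names.zip(values)]
end
pp result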
Result
[{"Problem Code"=>"ADDITION TO EXISTING FIREWALL",
"Resolution Desc"=>nil,
"Reassignment"=>"8",
"Resolver Name"=>nil,
"Status"=>"",
"Problem ID"=>"AP-06684768 ",
"Account Name"=>"1",
"Initial Severity"=>"8",
"Account Title"=>"ESA"},
{"Problem Code"=>"NETAU_NETMGTS ",
"Resolution Desc"=>"776",
"Reassignment"=>"2011-01-19T12:02:49",
"Resolver Name"=>"SCP UESCADADEV -> UESCADAPW/BW",
"Status"=>"N/A",
"Problem ID"=>"AP-06720564 ",
"Account Name"=>"2011-01-19T12:02:47",
"Initial Severity"=>"0",
"Account Title"=>"ESA"}]
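Every value in the result above is still a string, and several carry trailing padding (for example "AP-06684768 "). The `type` attribute in the metadata section could drive a conversion step. The following is only a minimal sketch under stated assumptions: `coerce` is a hypothetical helper, only `xs:int` and `xs:string` are handled, and the column list is hard-coded here rather than read from the `<item>` elements.

```ruby
# Hypothetical helper (not part of Nokogiri): converts one raw value
# according to the xs: type declared for its column.
def coerce(raw, type)
  return nil if raw.nil?            # xs:nil="true" was already mapped to nil
  text = raw.strip                  # drop padding such as "CLOSED "
  case type
  when 'xs:int'
    text.empty? ? nil : Integer(text, 10)
  else
    text
  end
end

# Hard-coded for illustration; each pair mirrors an <item name=... type=...>.
columns = [
  ['Problem ID',   'xs:string'],
  ['Reassignment', 'xs:int'],
  ['Status',       'xs:string']
]
raw_values = ['AP-06684768 ', '8', nil]

# Build one row hash, converting each value by its column's declared type.
row = columns.zip(raw_values).each_with_object({}) do |((name, type), raw), hash|
  hash[name] = coerce(raw, type)
end

p row
```

`Integer` raises `ArgumentError` on non-numeric text, so a row whose values do not line up with the metadata columns would fail loudly instead of being stored silently.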
Answer 1 (score: 1)
Here is working code that I hacked together and tested:
require 'rubygems'
require 'nokogiri'
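class Item
  attr_accessor :name
  def initialize(name)
    @name = name
  end
end
# The block form closes the file once parsing is done.
document = File.open("data.xml") { |file| Nokogiri::XML(file) }
# Positional lookup: children[3] is <metadata> only because the root's
# children are text, comment, text, metadata, in that order.
metadata = document.root.children[3]
# Skip whitespace text nodes, which have no 'name' attribute.
items = metadata.children.reject { |child| child.attribute('name').nil? }.map do |child|
  Item.new(child.attribute('name').value)
end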
puts "#{items.size} items"
puts items.inspect
Result:
[~/stackoverflow/graphML] ruby parse.rb
9 items
[#<Item:0x007fc01c0fbd90 @name="Problem ID">, #<Item:0x007fc01c0fbca0 @name="Account Title">, #<Item:0x007fc01c0fbc28 @name="Account Name">, #<Item:0x007fc01c0fbbb0 @name="Reassignment">, #<Item:0x007fc01c0fbb38 @name="Initial Severity">, #<Item:0x007fc01c0fbac0 @name="Resolution Desc">, #<Item:0x007fc01c0fba48 @name="Resolver Name">, #<Item:0x007fc01c0fb9d0 @name="Problem Code">, #<Item:0x007fc01c0fb868 @name="Status">]
Here is the complete project on GitHub: https://github.com/endymion/GraphML-parsing-exercise/tree/metadata-key-names
(It is a branch of a GraphML parsing exercise that I hacked together for someone else on Stack Overflow earlier tonight.)