Converting XML to a native Ruby data structure

Time: 2013-01-30 17:38:03

Tags: ruby xml

I am fetching data from an API that returns XML like this:

<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>

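I'm new to deserialization, but I think the right approach is to parse this XML into a Ruby object so that I can write something like objectFoo.seriess.series.frequency and get back 'Quarterly'.

From my searches here and on Google, there doesn't seem to be an obvious solution for this in Ruby (NOT Rails), which makes me think I'm missing something fairly obvious. Any ideas?

Edit: I set up a test case based on Winfield's suggestion.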

class Exopenstruct

  require 'ostruct'

  def initialize()  

  hash = {"seriess"=>{"realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "series"=>{"id"=>"GDPC1", "realtime_start"=>"2013-02-01", "realtime_end"=>"2013-02-01", "title"=>"Real Gross Domestic Product, 1 Decimal", "observation_start"=>"1947-01-01", "observation_end"=>"2012-10-01", "frequency"=>"Quarterly", "frequency_short"=>"Q", "units"=>"Billions of Chained 2005 Dollars", "units_short"=>"Bil. of Chn. 2005 $", "seasonal_adjustment"=>"Seasonally Adjusted Annual Rate", "seasonal_adjustment_short"=>"SAAR", "last_updated"=>"2013-01-30 07:46:54-06", "popularity"=>"93", "notes"=>"Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States.\n\nFor more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"}}}

  object_instance = OpenStruct.new( hash )

  end
end

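In irb I loaded the rb file and instantiated the class. However, when I try to access an attribute (e.g. instance.seriess), I get: NoMethodError: undefined method `seriess'

Apologies again if I'm missing something obvious.

4 answers:

Answer 0 (score: 14)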

You are probably better off using a standard XML-to-Hash parser, such as the one included with Rails:

object_hash = Hash.from_xml(xml_string)
puts object_hash['seriess']

If you're not using the Rails stack, you can get the same behavior with a library such as Nokogiri.
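As a rough sketch of the non-Rails route, the conversion can be written by hand in a few lines. This version uses REXML (which ships with Ruby) rather than Nokogiri, and the helper name element_to_hash is made up for illustration. It copies attributes and nested child elements only; text content is ignored, which is enough for attribute-only XML like the sample in the question.

```ruby
require 'rexml/document'

# Hypothetical helper: attributes become string keys, child elements become
# nested hashes, and repeated child tags are collected into an array.
# Text nodes are ignored in this sketch.
def element_to_hash(element)
  hash = {}
  element.attributes.each { |name, value| hash[name] = value }
  element.elements.each do |child|
    value = element_to_hash(child)
    if hash.key?(child.name)
      existing = hash[child.name]
      hash[child.name] = existing.is_a?(Array) ? existing << value : [existing, value]
    else
      hash[child.name] = value
    end
  end
  hash
end

# Shortened version of the XML from the question.
xml = '<seriess realtime_start="2013-01-28"><series id="GDPC1" frequency="Quarterly"/></seriess>'
doc = REXML::Document.new(xml)
data = { doc.root.name => element_to_hash(doc.root) }

puts data['seriess']['series']['frequency'] # prints "Quarterly"
```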

Edit: If you want object-like behavior, wrapping the hash in an OpenStruct is a good approach:

object_instance = OpenStruct.new( Hash.from_xml(xml_string) )
puts object_instance.seriess

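Note: for deeply nested data, you may also need to recursively convert the embedded hashes into OpenStruct instances. That is, if the attribute above holds a hash of values, it will come back as a Hash rather than an OpenStruct.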
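A minimal sketch of that recursive conversion, assuming the parsed data is made of plain hashes, arrays, and strings. The name deep_ostruct is hypothetical, and the hash is a shortened copy of the one in the question. Note that the converted object has to be returned to the caller (or stored somewhere reachable); assigning it to a local variable inside initialize, as in the question's test case, leaves nothing to call .seriess on.

```ruby
require 'ostruct'

# Hypothetical helper: wrap hashes in OpenStruct at every nesting level,
# descending into arrays as well. Other values are returned unchanged.
def deep_ostruct(value)
  case value
  when Hash
    OpenStruct.new(value.transform_values { |v| deep_ostruct(v) })
  when Array
    value.map { |v| deep_ostruct(v) }
  else
    value
  end
end

hash = { 'seriess' => { 'realtime_start' => '2013-02-01',
                        'series' => { 'id' => 'GDPC1', 'frequency' => 'Quarterly' } } }
data = deep_ostruct(hash)

puts data.seriess.series.frequency # prints "Quarterly"
```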

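Answer 1 (score: 4)

I've just started using Damien Le Berrigaud's fork of HappyMapper, and I'm very happy with it. You define simple Ruby classes and include HappyMapper. When you call parse, it uses Nokogiri to read in the XML, and you get back a complete tree of real Ruby objects.

I've used it to parse multi-megabyte XML files and found it fast and reliable. Check out the README.

One tip: because the encoding declared in an XML file is sometimes wrong, you may need to sanitize the XML like this: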

def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

Do this before passing the XML to the #parse method, to avoid Nokogiri's "Input is not proper UTF-8, indicate encoding !" error.
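One thing to be aware of: converting from 'binary' to UTF-8 with undef: :replace drops every byte above 0x7F, so valid accented or non-Latin characters are removed along with the bad bytes. If that matters for your data, a possible alternative is String#scrub (Ruby 2.1+), which removes only byte sequences that are invalid UTF-8. The sketch below compares the two on a made-up input; the name scrub_utf8 is hypothetical.

```ruby
# Original approach: treat the input as raw bytes and drop anything non-ASCII.
def sanitize(xml)
  xml.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Alternative sketch: reinterpret the bytes as UTF-8 and drop only invalid sequences.
def scrub_utf8(xml)
  xml.dup.force_encoding(Encoding::UTF_8).scrub('')
end

# Made-up input: a valid UTF-8 "e with acute" (C3 A9) followed by a stray FF byte.
raw = "<a name=\"h\xC3\xA9llo\xFF\"/>".b

stripped = sanitize(raw)   # the accented character is lost as well
clean    = scrub_utf8(raw) # the accented character survives, the FF byte is gone

puts stripped
puts clean
```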

Update

I went ahead and converted the OP's example to HappyMapper:

XML_STRING = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

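require 'date'
require 'happymapper'           # assumed to come from the nokogiri-happymapper gem

class Series; end               # forward reference so Seriess can name Series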

class Seriess
  include HappyMapper
  tag 'seriess'

  attribute :realtime_start, Date
  attribute :realtime_end, Date
  has_many :seriess, Series, :tag => 'series'
end
class Series
  include HappyMapper
  tag 'series'

  attribute 'id', String
  attribute 'realtime_start', Date
  attribute 'realtime_end', Date
  attribute 'title', String
  attribute 'observation_start', Date
  attribute 'observation_end', Date
  attribute 'frequency', String
  attribute 'frequency_short', String
  attribute 'units', String
  attribute 'units_short', String
  attribute 'seasonal_adjustment', String
  attribute 'seasonal_adjustment_short', String
  attribute 'last_updated', DateTime
  attribute 'popularity', Integer
  attribute 'notes', String
end

def test
  Seriess.parse(XML_STRING, :single => true)
end

Here's what you can do with it:

>> a = test
>> a.class
=> Seriess
>> a.seriess.first.frequency
=> "Quarterly"
>> a.seriess.first.observation_start
=> #<Date: 1947-01-01 ((2432187j,0s,0n),+0s,2299161j)>
>> a.seriess.first.popularity
=> 93

Answer 2 (score: 1)

Nokogiri takes care of the parsing. What you do with the data is up to you; here I use OpenStruct as an example:

require 'nokogiri'
require 'ostruct'
require 'open-uri'

# URI.open is needed on Ruby 3+, where Kernel#open no longer opens URLs
doc = Nokogiri.parse URI.open('http://www.w3schools.com/xml/note.xml')

note = OpenStruct.new

note.to = doc.at('to').text
note.from = doc.at('from').text
note.heading = doc.at('heading').text
note.body = doc.at('body').text

=> #<OpenStruct to="Tove", from="Jani", heading="Reminder", body="ToveJaniReminderDon't forget me this weekend!\r\n">

This is just a teaser; your actual problem may be many times larger. It is only meant to give you a head start.
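For larger documents, assigning each field by hand gets tedious. A rough sketch of a generic version follows, filling the OpenStruct from whatever child elements the root has. It uses REXML from Ruby's bundled libraries instead of Nokogiri, and an inline XML string that is assumed to match the contents of note.xml, so that it runs without network access.

```ruby
require 'rexml/document'
require 'ostruct'

# Inline stand-in for the remote note.xml (contents assumed).
xml = <<~XML
  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend!</body>
  </note>
XML

doc = REXML::Document.new(xml)

# One OpenStruct field per child element of the root, named after the tag.
note = OpenStruct.new
doc.root.elements.each { |child| note[child.name] = child.text }

puts note.heading # prints "Reminder"
```

This only handles one level of simple text elements; nested elements or attributes would need a recursive version.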


Edit: After stumbling around Google and Stack Overflow, I came across a possible mix of my answer and @Winfield's, using Rails' Hash.from_xml:

> require 'active_support/core_ext/hash/conversions'
> xml = Nokogiri::XML.parse(URI.open('http://www.w3schools.com/xml/note.xml'))
> Hash.from_xml(xml.to_s)
=> {"note"=>{"to"=>"Tove", "from"=>"Jani", "heading"=>"Reminder", "body"=>"Don't forget me this weekend!"}}

You can then use this hash to, for example, initialize a new ActiveRecord::Base model instance, or do whatever else you decide to do with it.

http://nokogiri.org/
http://ruby-doc.org/stdlib-1.9.3/libdoc/ostruct/rdoc/OpenStruct.html
https://stackoverflow.com/a/7488299/1740079

Answer 3 (score: 0)

If you want to convert XML to a Hash, I find the nori gem to be the simplest.

Example:

require 'nori'

xml = '<?xml version="1.0" encoding="utf-8" ?> <seriess realtime_start="2013-01-28" realtime_end="2013-01-28"> <series id="GDPC1" realtime_start="2013-01-28" realtime_end="2013-01-28" title="Real Gross Domestic Product, 1 Decimal" observation_start="1947-01-01" observation_end="2012-07-01" frequency="Quarterly" frequency_short="Q" units="Billions of Chained 2005 Dollars" units_short="Bil. of Chn. 2005 $" seasonal_adjustment="Seasonally Adjusted Annual Rate" seasonal_adjustment_short="SAAR" last_updated="2012-12-20 08:16:28-06" popularity="93" notes="Real gross domestic product is the inflation adjusted value of the goods and services produced by labor and property located in the United States. For more information see the Guide to the National Income and Product Accounts of the United States (NIPA) - (http://www.bea.gov/national/pdf/nipaguid.pdf)"/> </seriess>'

hash = Nori.new.parse(xml)    
hash['seriess']
hash['seriess']['series']
puts hash['seriess']['series']['@frequency']

Note the '@' prefix on frequency: it is there because frequency is an attribute of 'series' rather than an element.
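If the '@' prefixes get in the way (for example, when you want to wrap the result in an OpenStruct), one option is to strip them from the keys after parsing. The sketch below is plain Ruby and does not call Nori. The parsed hash is hand-written to mimic the key layout shown above, and strip_attribute_prefix is a hypothetical helper name.

```ruby
# Hypothetical helper: recursively remove a leading "@" from hash keys.
def strip_attribute_prefix(value)
  case value
  when Hash
    value.each_with_object({}) do |(key, v), out|
      out[key.to_s.sub(/\A@/, '')] = strip_attribute_prefix(v)
    end
  when Array
    value.map { |v| strip_attribute_prefix(v) }
  else
    value
  end
end

# Hand-written stand-in shaped like the parsed result above (not real Nori output).
parsed = { 'seriess' => { '@realtime_start' => '2013-01-28',
                          'series' => { '@id' => 'GDPC1', '@frequency' => 'Quarterly' } } }
plain = strip_attribute_prefix(parsed)

puts plain['seriess']['series']['frequency'] # prints "Quarterly"
```

Be aware that stripping the prefix can cause collisions if an element and an attribute share the same name.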