How do I iterate over an XML file hierarchy with Nokogiri?

Time: 2014-05-11 21:53:10

Tags: ruby parsing nokogiri

Here is a sample file:

<?xml version="1.0" encoding="UTF-8"?>
<response status="success">
   <campaigns>
      <campaign>
         <campaign_id>41381</campaign_id>
         <campaign_name><![CDATA[campaign1]]></campaign_name>
         <campaign_status>1</campaign_status>
         <campaign_type>STANDARD</campaign_type>
         <campaign_notes />
         <campaign_rate />
         <campaign_owner_id>33975</campaign_owner_id>
         <campaign_start_date>11-05-2014</campaign_start_date>
         <campaign_end_date>12-12-2020</campaign_end_date>
         <creation_date>11-05-2014</creation_date>
         <daily_budget>10.000</daily_budget>
         <daily_budget_left>10.000000000000000000000000000000</daily_budget_left>
         <total_budget>X</total_budget>
         <total_budget_left>1000000.000000000000000000000000000000</total_budget_left>
         <reporting>
            <impressions />
            <clicks />
            <total_cost>
               <currency>USD</currency>
               <amount />
            </total_cost>
            <average_cpc>
               <currency>USD</currency>
               <amount>0</amount>
            </average_cpc>
            <conversions />
            <cost_per_conversion>
               <currency>USD</currency>
               <amount>n/a</amount>
            </cost_per_conversion>
         </reporting>
      </campaign>
   </campaigns>
</response>

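What I want to do is walk through each campaign and parse its data into in-memory objects. For example, I want to create a Ruby object for each campaign, so that I can write something like campaigns.each {|campaign| puts impressions = campaign['reporting']['impressions']}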

1 answer:

Answer 0 (score: 1)

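Here is some code for the requirements described in your post. It only works for Hash-like XML structures, such as the campaign node in the sample data, because child elements that share a name overwrite one another in the resulting Hash. If you want Array-like behavior, you will probably need to handle those nodes explicitly, as I did for the campaigns node.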

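require 'nokogiri'

# Recursively convert an element into a String (text-only element) or a
# Hash keyed by child element name. Empty elements become an empty Hash.
def parse(element)
  # Drop whitespace-only text nodes (the indentation between elements).
  children = element.children.reject { |e| e.is_a?(Nokogiri::XML::Text) && e.text.strip.empty? }

  if children.count == 1 && children[0].is_a?(Nokogiri::XML::Text)
    # CDATA sections are also Nokogiri::XML::Text nodes, so they land here.
    children[0].text
  else
    data = {}
    children.each do |child|
      data[child.name] = parse(child)
    end
    data
  end
end

# Assumes the XML is stored in data.xml.
doc = Nokogiri::XML(File.read('data.xml'))

# campaigns is handled explicitly so the result is an Array of Hashes.
campaigns = doc.xpath('/response/campaigns/campaign').map { |c| parse(c) }
p campaigns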