Parsing XML and converting it to a hash

Posted: 2015-03-10 21:38:34

Tags: ruby-on-rails xml hash

I have the following XML file, "device_google_data.xml":

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<report>
    <report-name name="Last 7 days CRITERIA_PERFORMANCE_REPORT" />
    <date-range date="Mar 6, 2015" />
    <table>
        <columns>
            <column name="campaign" display="Campaign" />
            <column name="convertedClicks" display="Converted clicks" />
            <column name="impressions" display="Impressions" />
            <column name="clicks" display="Clicks" />
            <column name="cost" display="Cost" />
            <column name="device" display="Device" />
        </columns>
        <row campaign="The Winds at Mattern Orchard" convertedClicks="0" impressions="46" clicks="1" cost="3120000" device="Mobile devices with full browsers" />
        <row campaign="Aberdeen Crossings" convertedClicks="0" impressions="72" clicks="1" cost="4260000" device="Mobile devices with full browsers" />
        <row campaign="Hawthorne Woods" convertedClicks="0" impressions="147" clicks="0" cost="0" device="Computers" />
        <row campaign="The Winds at Mattern Orchard" convertedClicks="0" impressions="115" clicks="0" cost="0" device="Computers" />
        <row campaign="The Winds at Mattern Orchard" convertedClicks="0" impressions="13" clicks="0" cost="0" device="Tablets with full browsers" />
        <row campaign="Southwest Commons" convertedClicks="0" impressions="9" clicks="0" cost="0" device="Tablets with full browsers" />
        <row campaign="Westlake Woods" convertedClicks="0" impressions="12" clicks="1" cost="4480000" device="Tablets with full browsers" />
        <row campaign="Beachwood Commons" convertedClicks="0" impressions="303" clicks="5" cost="21870000" device="Mobile devices with full browsers" />
        <row campaign="Richland Woods" convertedClicks="0" impressions="24" clicks="0" cost="0" device="Mobile devices with full browsers" />
        <row campaign="Westlake Woods" convertedClicks="0" impressions="24" clicks="1" cost="4040000" device="Mobile devices with full browsers" />
        <row campaign="Salida Woods" convertedClicks="0" impressions="29" clicks="0" cost="0" device="Tablets with full browsers" />
        <row campaign="Southwest Commons" convertedClicks="0" impressions="42" clicks="0" cost="0" device="Mobile devices with full browsers" />
        <row campaign="Southwoods" convertedClicks="0" impressions="38" clicks="0" cost="0" device="Computers" />
        <row campaign="Beachwood Commons" convertedClicks="0" impressions="50" clicks="0" cost="0" device="Tablets with full browsers" />
        <row campaign="Salida Woods" convertedClicks="0" impressions="146" clicks="4" cost="14030000" device="Mobile devices with full browsers" />
        <row campaign="Aberdeen Crossings" convertedClicks="0" impressions="168" clicks="1" cost="2530000" device="Computers" />
        <row campaign="Beachwood Commons - BRAND" convertedClicks="0" impressions="2" clicks="1" cost="800000" device="Mobile devices with full browsers" />
        <row campaign="Beachwood Commons - BRAND" convertedClicks="0" impressions="1" clicks="2" cost="240000" device="Computers" />
        <row campaign="Hawthorne Woods" convertedClicks="0" impressions="21" clicks="0" cost="0" device="Tablets with full browsers" />
        <row campaign="Hawthorne Woods" convertedClicks="0" impressions="103" clicks="4" cost="14870000" device="Mobile devices with full browsers" />
        <row campaign="Aberdeen Crossings" convertedClicks="0" impressions="24" clicks="0" cost="0" device="Tablets with full browsers" />
        <row campaign="Richland Woods" convertedClicks="0" impressions="3" clicks="1" cost="3550000" device="Tablets with full browsers" />
        <row campaign="Salida Woods" convertedClicks="0" impressions="211" clicks="3" cost="8760000" device="Computers" />
        <row campaign="Southwest Commons" convertedClicks="0" impressions="39" clicks="1" cost="7320000" device="Computers" />
        <row campaign="Richland Woods" convertedClicks="0" impressions="23" clicks="2" cost="6990000" device="Computers" />
        <row campaign="Beachwood Commons" convertedClicks="0" impressions="467" clicks="2" cost="7060000" device="Computers" />
        <row campaign="Westlake Woods" convertedClicks="0" impressions="54" clicks="0" cost="0" device="Computers" />
        <row campaign="Southwoods" convertedClicks="0" impressions="9" clicks="0" cost="0" device="Tablets with full browsers" />
        <row campaign="Southwoods" convertedClicks="0" impressions="37" clicks="1" cost="2980000" device="Mobile devices with full browsers" />
    </table>
</report>

I'm trying to parse it and convert it to a hash. Using each_with_index, I've gotten close to what I want.
require 'nokogiri'

# Parse device_file (the block form closes the file after parsing)
doc = File.open("device_google_data.xml") { |f| Nokogiri::XML(f) }
# One hash of attribute name => value per <row> element
rows = doc.css('row').map { |row| Hash[row.attributes.map { |n, a| [n, a.value] }] }
campaigns   = rows.map { |m| m["campaign"] }
device      = rows.map { |m| m["device"] }
impressions = rows.map { |m| m["impressions"].to_i }
clicks      = rows.map { |m| m["clicks"].to_i }
conversions = rows.map { |m| m["convertedClicks"].to_i }
# The report gives cost in micros, so divide by 1,000,000
cost        = rows.map { |m| m["cost"].to_f / 1_000_000 }
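As a sketch (not the asker's code), the six parallel arrays can be replaced by one typed hash per row, so there are no indices to keep in sync. It assumes `rows` is the array of string-keyed attribute hashes built above; here two rows from the XML are hard-coded so the example runs without Nokogiri. The helper name `to_record` is hypothetical, and cost is divided by 1,000,000 exactly as in the question.

```ruby
# Hypothetical helper: turn one string-keyed attribute hash (as produced by
# the Nokogiri code above) into a symbol-keyed record with numeric fields.
def to_record(row)
  {
    campaign:    row["campaign"],
    device:      row["device"],
    impressions: row["impressions"].to_i,
    clicks:      row["clicks"].to_i,
    conversions: row["convertedClicks"].to_i,
    # Same scaling as the question: cost / 1,000,000
    cost:        row["cost"].to_f / 1_000_000
  }
end

# Two sample rows copied from the XML above
rows = [
  { "campaign" => "Hawthorne Woods", "convertedClicks" => "0", "impressions" => "103",
    "clicks" => "4", "cost" => "14870000", "device" => "Mobile devices with full browsers" },
  { "campaign" => "Hawthorne Woods", "convertedClicks" => "0", "impressions" => "147",
    "clicks" => "0", "cost" => "0", "device" => "Computers" }
]

records = rows.map { |row| to_record(row) }
p records.first[:cost] # => 14.87
```

Keeping each row as a single hash also makes the later choice (array of rows vs. hash keyed by campaign) a one-line transformation.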

Then I build the hash:

device_hash = {}
campaigns.each_with_index do |e, i|
  device_hash[e.to_sym] = {impressions: impressions[i], clicks: clicks[i], conversions: conversions[i], cost: cost[i], device: device[i]}
end

However, this gives me only one entry per campaign, as if I had called uniq on the array. I want all of the campaign rows mapped.

device_hash = {:"The Winds at Mattern Orchard"=>{:impressions=>13, :clicks=>0, :conversions=>0, :cost=>0.0, :device=>"Tablets with full browsers"}, :"Aberdeen Crossings"=>{:impressions=>24, :clicks=>0, :conversions=>0, :cost=>0.0, :device=>"Tablets with full browsers"}, :"Hawthorne Woods"=>{:impressions=>103, :clicks=>4, :conversions=>0, :cost=>14.87, :device=>"Mobile devices with full browsers"}, :"Southwest Commons"=>{:impressions=>39, :clicks=>1, :conversions=>0, :cost=>7.32, :device=>"Computers"}, :"Westlake Woods"=>{:impressions=>54, :clicks=>0, :conversions=>0, :cost=>0.0, :device=>"Computers"}, :"Beachwood Commons"=>{:impressions=>467, :clicks=>2, :conversions=>0, :cost=>7.06, :device=>"Computers"}, :"Richland Woods"=>{:impressions=>23, :clicks=>2, :conversions=>0, :cost=>6.99, :device=>"Computers"}, :"Salida Woods"=>{:impressions=>211, :clicks=>3, :conversions=>0, :cost=>8.76, :device=>"Computers"}, :Southwoods=>{:impressions=>37, :clicks=>1, :conversions=>0, :cost=>2.98, :device=>"Mobile devices with full browsers"}, :"Beachwood Commons - BRAND"=>{:impressions=>1, :clicks=>2, :conversions=>0, :cost=>0.24, :device=>"Computers"}}

1 answer:

Answer 0 (score: 1)

I'm not sure I understand your question.

You already have all the campaign rows mapped at this point:

campaigns = rows.map {|m| m["campaign"]}

The campaigns array has 29 elements.
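A quick way to confirm that those 29 rows contain repeated campaign names is to count them. This sketch hard-codes the campaign names in the order they appear in the XML so it runs on its own; the name `counts` is just illustrative.

```ruby
# Campaign names in the order they appear in the XML (29 rows)
campaigns = [
  "The Winds at Mattern Orchard", "Aberdeen Crossings", "Hawthorne Woods",
  "The Winds at Mattern Orchard", "The Winds at Mattern Orchard", "Southwest Commons",
  "Westlake Woods", "Beachwood Commons", "Richland Woods", "Westlake Woods",
  "Salida Woods", "Southwest Commons", "Southwoods", "Beachwood Commons",
  "Salida Woods", "Aberdeen Crossings", "Beachwood Commons - BRAND",
  "Beachwood Commons - BRAND", "Hawthorne Woods", "Hawthorne Woods",
  "Aberdeen Crossings", "Richland Woods", "Salida Woods", "Southwest Commons",
  "Richland Woods", "Beachwood Commons", "Westlake Woods", "Southwoods", "Southwoods"
]

# Count how many rows each campaign name has (Hash.new(0) defaults missing keys to 0)
counts = campaigns.each_with_object(Hash.new(0)) { |name, h| h[name] += 1 }

puts campaigns.size # => 29
puts counts.size    # => 10
```

Only 10 distinct names exist, which matches the 10 keys in the question's `device_hash`.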

Then you iterate over this list and put the entries into a hash keyed by campaign name:

campaigns.each_with_index { |e, i|
  device_hash[e.to_sym] = {impressions: impressions[i], clicks: clicks[i], conversions: conversions[i], cost: cost[i], device: device[i]}
}

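Because several rows share the same campaign name, they map to the same hash key, and each later row overwrites the earlier one. That is why you end up with fewer entries (10 keys instead of 29 rows), each holding the values of the last row for that campaign.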
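The overwrite is easy to see in isolation with a tiny, made-up example: assigning to an existing key replaces its value instead of adding a second entry. The names `A` and `B` are purely illustrative.

```ruby
h = {}
# Three "rows", two of which share the name "A"
[["A", 1], ["B", 2], ["A", 3]].each do |name, value|
  h[name.to_sym] = value # the second "A" replaces the first
end

p h.size # => 2
p h[:A]  # => 3
```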

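If you want to keep every row, you can use an array instead of a hash. Alternatively, you can aggregate while building the hash, for example by summing the values: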

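device_hash = {}
campaigns.each_with_index do |e, i|
  prev = device_hash[e.to_sym]
  if prev.nil?
    # First row for this campaign: start a new entry
    device_hash[e.to_sym] = {impressions: impressions[i], clicks: clicks[i], conversions: conversions[i], cost: cost[i], device: device[i]}
  else
    # Repeated campaign: add this row's numbers to the running totals
    prev[:impressions] += impressions[i]
    prev[:clicks] += clicks[i]
    prev[:conversions] += conversions[i]
    prev[:cost] += cost[i]
    # Build a new string rather than using <<, which would modify the string stored in the device array
    prev[:device] = "#{prev[:device]} | #{device[i]}"
  end
end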
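For the other option, keeping every row, one sketch is to group the rows by campaign so each key maps to an array of rows; totals can still be derived afterwards. This assumes a `records` array of symbol-keyed hashes with the same fields as `device_hash` (five rows from the XML are hard-coded here); `Enumerable#group_by`, `Hash#transform_values` and `Array#sum` are standard Ruby (the latter two need Ruby 2.4 or newer), and the variable names are hypothetical.

```ruby
# Five rows from the XML above, already converted to numbers
records = [
  { campaign: "Southwoods", impressions: 38, clicks: 0, conversions: 0, cost: 0.0,  device: "Computers" },
  { campaign: "Southwoods", impressions: 9,  clicks: 0, conversions: 0, cost: 0.0,  device: "Tablets with full browsers" },
  { campaign: "Southwoods", impressions: 37, clicks: 1, conversions: 0, cost: 2.98, device: "Mobile devices with full browsers" },
  { campaign: "Beachwood Commons - BRAND", impressions: 2, clicks: 1, conversions: 0, cost: 0.8,  device: "Mobile devices with full browsers" },
  { campaign: "Beachwood Commons - BRAND", impressions: 1, clicks: 2, conversions: 0, cost: 0.24, device: "Computers" }
]

# Keep every row: each campaign key maps to an array of its rows
by_campaign = records.group_by { |r| r[:campaign].to_sym }

# Totals per campaign, computed from the grouped rows when needed
totals = by_campaign.transform_values do |rs|
  { impressions: rs.sum { |r| r[:impressions] }, clicks: rs.sum { |r| r[:clicks] } }
end

p by_campaign[:Southwoods].size         # => 3
p totals[:Southwoods][:impressions]     # => 84
```

Grouping first keeps the per-device detail that the summing approach folds into a single string.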

I hope this helps.