How to parse an XML document with Nokogiri

Posted: 2017-06-30 13:25:58

Tags: ruby-on-rails ruby xml parsing nokogiri

I'm trying to parse an XML file with Nokogiri in Ruby on Rails:

  <ItemID>
    <SupplierPartID>GH-CF-BINDER</SupplierPartID>
  </ItemID>
  <ItemDetail>
    <UnitPrice><Money currency="USD"></Money></UnitPrice>
    <Description xml:lang="en">Ghent Contract Furniture Binder</Description>
    <UnitOfMeasure>Each</UnitOfMeasure>
    <Classification domain=""></Classification>
    <Extrinsic name="tag">GH-CF-BINDER</Extrinsic>
    <Extrinsic name="bin_number">103/18/9</Extrinsic>
    <Extrinsic name="billing email"></Extrinsic> 
    <Extrinsic name="bill_code6"></Extrinsic> 
    <Extrinsic name="prodcode"></Extrinsic>
    <Extrinsic name="stock_tag"></Extrinsic> 
    <Extrinsic name="has_imprint">N</Extrinsic> 
    <Extrinsic name="manuf_id">4117</Extrinsic>
    <Extrinsic name="bill_code1"></Extrinsic>

  </ItemDetail>

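When I try to parse it, instead of getting blank values for some of the empty fields, I get back "name => bill_code1" (for example).

My solution so far looks like this, but each XML file differs slightly in the layout of its outer tags: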

item_info_params['extrinsic_tag'] = item_info_to_xml['ItemOut']['ItemDetail']['Extrinsic']

# expands the extrinsic_tag param above, because it holds several elements
item_info_params['tag'] = item_info_params['extrinsic_tag'][0]
item_info_params['price'] = item_info_params['extrinsic_tag'][1]
item_info_params['bin_number'] = item_info_params['extrinsic_tag'][2]
item_info_params['requested_delivery'] = item_info_params['extrinsic_tag'][3]
item_info_params['billing_email'] = item_info_params['extrinsic_tag'][4]
item_info_params['bill_code6'] = item_info_params['extrinsic_tag'][5]
item_info_params['prodcode'] = item_info_params['extrinsic_tag'][6]
item_info_params['stock_tag'] = item_info_params['extrinsic_tag'][7]
item_info_params['has_imprint'] = item_info_params['extrinsic_tag'][8]
item_info_params['manuf_id'] = item_info_params['extrinsic_tag'][9]
item_info_params['bill_code1'] = item_info_params['extrinsic_tag'][10]
item_info_params['order_id'] = order.id

customer = Customer.create(item_info_params)

So, as described above, item_info_params['bill_code1'] gets assigned name => bill_code1 rather than an empty value.

I'm confused about how to solve this.

2 answers:

Answer 0 (score: 0)

require 'nokogiri'
xml = '<ItemID><SupplierPartID>GH-CF-BINDER</SupplierPartID><ItemDetail><UnitPrice><Money currency="USD"></Money></UnitPrice><Description xml:lang="en">Ghent Contract Furniture Binder</Description><UnitOfMeasure>Each</UnitOfMeasure><Classification domain=""></Classification><Extrinsic name="tag">GH-CF-BINDER</Extrinsic><Extrinsic name="bin_number">103/18/9</Extrinsic><Extrinsic name="billing email"></Extrinsic><Extrinsic name="bill_code6"></Extrinsic><Extrinsic name="prodcode"></Extrinsic><Extrinsic name="stock_tag"></Extrinsic><Extrinsic name="has_imprint">N</Extrinsic><Extrinsic name="manuf_id">4117</Extrinsic><Extrinsic name="bill_code1"></Extrinsic></ItemDetail></ItemID>'


doc = Nokogiri::XML(xml)
bill_code = doc.xpath('//*[@name="bill_code1"]')[0].content
puts "bill_code: #{bill_code}"
#=> bill_code:

tag = doc.xpath('//*[@name="tag"]')[0].content
puts "tag: #{tag}"
#=> tag: GH-CF-BINDER

Answer 1 (score: 0)

Consider this:

require 'nokogiri'

EXTRINSIC_NAMES = ['tag', 'bin_number', 'manuf_id']

doc = Nokogiri::XML(<<EOT)
      <ItemDetail>
        <UnitPrice><Money currency="USD"></Money></UnitPrice>
        <Description xml:lang="en">Ghent Contract Furniture Binder</Description>
        <UnitOfMeasure>Each</UnitOfMeasure>
        <Classification domain=""></Classification>
        <Extrinsic name="tag">GH-CF-BINDER</Extrinsic>
        <Extrinsic name="bin_number">103/18/9</Extrinsic>
        <Extrinsic name="billing email"></Extrinsic> 
        <Extrinsic name="bill_code6"></Extrinsic> 
        <Extrinsic name="prodcode"></Extrinsic>
        <Extrinsic name="stock_tag"></Extrinsic> 
        <Extrinsic name="has_imprint">N</Extrinsic> 
        <Extrinsic name="manuf_id">4117</Extrinsic>
        <Extrinsic name="bill_code1"></Extrinsic>

      </ItemDetail>
EOT

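item_detail = doc.at('ItemDetail')

extrinsic_values = EXTRINSIC_NAMES.map { |name|
  # Quote the attribute value so names containing spaces (e.g. "billing email") still match
  [name, item_detail.at(%(Extrinsic[name="#{name}"])).text]
}.to_h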

extrinsic_values 
# => {"tag"=>"GH-CF-BINDER", "bin_number"=>"103/18/9", "manuf_id"=>"4117"}

Modify EXTRINSIC_NAMES to select the values you want.

If you want all of them, it can be simpler:

extrinsic_values = doc.css('Extrinsic').map { |node| 
  [node['name'], node.text]
}.to_h

extrinsic_values 
# => {"tag"=>"GH-CF-BINDER",
#     "bin_number"=>"103/18/9",
#     "billing email"=>"",
#     "bill_code6"=>"",
#     "prodcode"=>"",
#     "stock_tag"=>"",
#     "has_imprint"=>"N",
#     "manuf_id"=>"4117",
#     "bill_code1"=>""}

Either way, if your XML files are formatted differently, you'll have to allow for that in your code.