How to conditionally check for and extract XML elements

Date: 2014-03-23 19:34:55

Tags: ruby xml regex

I have to parse an XML file that looks like this:

<country id='cid-cia-Ashmore-and-Cartier-Islands' 
  continent='Asia'
  name='Ashmore and Cartier Islands'
  datacode='AT'
  total_area='5'
  government='territory of Australia administered by the Australian Ministry for the Environment'>
  <coasts>Indian Ocean</coasts>
</country>

<country id='cid-cia-Azerbaijan' 
  continent='Asia'
  name='Azerbaijan'
  datacode='AJ'
  total_area='86600'
  population='7676953'
  population_growth='0.78'
  infant_mortality='74.5'
  inflation='85'
  gdp_total='11500'
  indep_date='30 08 1991'
  government='republic'
  capital='Baku'>
  <ethnicgroups name='Russian'>2.5</ethnicgroups>
  <ethnicgroups name='Armenian'>2.3</ethnicgroups>
  <ethnicgroups name='Azeri'>90</ethnicgroups>
  <ethnicgroups name='Dagestani Peoples'>3.2</ethnicgroups>
  <religions name='Muslim'>93.4</religions>
  <religions name='Armenian Orthodox'>2.3</religions>
  <religions name='Russian Orthodox'>2.5</religions>
  <languages name='Russian'>3</languages>
  <languages name='Armenian'>2</languages>
  <languages name='Azeri'>89</languages>
  <borders country='cid-cia-Armenia'>787</borders>
  <borders country='cid-cia-Georgia'>322</borders>
  <borders country='cid-cia-Iran'>611</borders>
  <borders country='cid-cia-Russia'>284</borders>
  <borders country='cid-cia-Turkey'>9</borders>
  <coasts>Caspian Sea</coasts>
</country>

<country id='cid-cia-Bahrain' 
  continent='Asia'
  name='Bahrain'
  datacode='BA'
  total_area='620'
  population='590042'
  population_growth='2.27'
  infant_mortality='17.1'
  inflation='3'
  gdp_total='7300'
  indep_date='15 08 1971'
  government='traditional monarchy'
  capital='Manama'>
  <ethnicgroups name='Arab'>10</ethnicgroups>
  <ethnicgroups name='Asian'>13</ethnicgroups>
  <ethnicgroups name='Bahraini'>63</ethnicgroups>
  <ethnicgroups name='Iranian'>8</ethnicgroups>
  <religions name='Sunni Muslim'>25</religions>
  <religions name='Shia Muslim'>75</religions>
  <coasts>Persian Gulf</coasts>
</country>

I need to parse this XML to get the `name` and `inflation` values, but only for countries that actually have an inflation value.

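My progress so far is in this Rubular setup: http://rubular.com/r/L7pbX2mm1J. It returns two matches, which is good, but look closely at the first match: the country is Ashmore and Cartier Islands, and that country's XML has no inflation value. The regex keeps going past the end of that element until it finds an inflation value somewhere later, and ends the match there.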
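To make the failure concrete, here is a minimal sketch that reproduces it. The actual Rubular pattern is only linked, not shown, so `NAIVE_PATTERN` below is a hypothetical stand-in of the same general shape, run against a trimmed excerpt of the data above. The names `NAIVE_PATTERN`, `EXCERPT`, and `naive_pairs` are illustrative.

```ruby
# Hypothetical naive pattern (NOT the asker's actual Rubular regex): the lazy
# .*? under /m can match newlines and '>' too, so a match can run from one
# country's name into a later country's inflation attribute.
NAIVE_PATTERN = /name='(?<name>[^']+)'.*?inflation='(?<inflation>[^']+)'/m

# Trimmed excerpt of the question's data.
EXCERPT = <<~XML
  <country id='cid-cia-Ashmore-and-Cartier-Islands'
    name='Ashmore and Cartier Islands'>
    <coasts>Indian Ocean</coasts>
  </country>
  <country id='cid-cia-Azerbaijan'
    name='Azerbaijan'
    inflation='85'>
    <coasts>Caspian Sea</coasts>
  </country>
XML

# Returns [name, inflation] capture pairs for every match of the naive pattern.
def naive_pairs(xml)
  xml.scan(NAIVE_PATTERN)
end

p naive_pairs(EXCERPT)
# prints [["Ashmore and Cartier Islands", "85"]]
```

Ashmore and Cartier Islands is paired with Azerbaijan's `85`, and Azerbaijan itself is never reported, because nothing stops the match at the end of a `<country>` start tag.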

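Is there a way to do some kind of conditional check for whether the `inflation` attribute exists and, only if it does, grab the `name` and `inflation` values?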

Thanks in advance!

3 answers:

Answer 0 (score: 2)

Don't use regular expressions to parse XML. Use a real XML parser such as Nokogiri instead.

Answer 1 (score: 2)

You can indeed use Nokogiri. For example:

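require 'nokogiri'

# Assumes the <country> elements are wrapped in a single root element,
# as well-formed XML requires
doc = Nokogiri::XML(File.read('./country.xml'))

# Union of two attribute sets, returned in document order: the name of every
# country that has an inflation attribute, plus every inflation attribute
doc.xpath('//country[@inflation]/@name | //country/@inflation').each do |attr|
  puts attr.value # print the attribute's value, not its markup
end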

If you "need" to use a regular expression, this one should do the job:

<country [^>]*? name='(?<name>[^']+)'[^>]*? inflation='(?<inflation>[^']+)'
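A quick way to check this pattern in Ruby is `String#scan`, which returns the captures of every match. In this sketch the `SAMPLE_XML` excerpt and the `regex_pairs` helper are illustrative, and any trailing space after the pattern is dropped (a literal trailing space would fail to match, because a newline follows each attribute in the file).

```ruby
# The pattern from the answer. [^>] cannot match '>', so each match is
# confined to a single <country ...> start tag.
PATTERN = /<country [^>]*? name='(?<name>[^']+)'[^>]*? inflation='(?<inflation>[^']+)'/

# Excerpt of the question's data, keeping its one-attribute-per-line layout.
SAMPLE_XML = <<~XML
  <country id='cid-cia-Ashmore-and-Cartier-Islands'
    continent='Asia'
    name='Ashmore and Cartier Islands'
    total_area='5'>
    <coasts>Indian Ocean</coasts>
  </country>
  <country id='cid-cia-Azerbaijan'
    continent='Asia'
    name='Azerbaijan'
    inflation='85'
    capital='Baku'>
    <languages name='Azeri'>89</languages>
  </country>
XML

# Returns [name, inflation] pairs, one per matching start tag.
def regex_pairs(xml)
  xml.scan(PATTERN)
end

p regex_pairs(SAMPLE_XML)
# prints [["Azerbaijan", "85"]]
```

Ashmore and Cartier Islands is skipped because its start tag closes with `>` before any ` inflation='` appears. The pattern still assumes that `name` comes before `inflation` and that values are single-quoted; XML guarantees neither, which is the fragility Answer 0 warns about.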

Answer 2 (score: 1)

Ruby's standard library includes an XML parser: REXML.
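A minimal REXML sketch of the same filter, assuming the `<country>` elements are wrapped in a hypothetical `<countries>` root element. REXML's XPath support handles the `[@inflation]` predicate, so only countries with an inflation value are visited. On Ruby 3.0 and later REXML ships as a bundled gem rather than a default library, so a Bundler-managed project may need to list `rexml` in its Gemfile. `rexml_inflation_pairs` and `XML_DOC` are illustrative names.

```ruby
require 'rexml/document'

# Returns [name, inflation] pairs for every <country> with an inflation attribute.
def rexml_inflation_pairs(xml)
  doc = REXML::Document.new(xml)
  pairs = []
  # Visit only <country> elements that carry an inflation attribute.
  REXML::XPath.each(doc, '//country[@inflation]') do |country|
    # Attributes#[] returns the attribute's value as a String.
    pairs << [country.attributes['name'], country.attributes['inflation']]
  end
  pairs
end

# Hypothetical <countries> root; the attribute values come from the question.
XML_DOC = <<~XML
  <countries>
    <country name='Ashmore and Cartier Islands' total_area='5'/>
    <country name='Azerbaijan' inflation='85'/>
    <country name='Bahrain' inflation='3'/>
  </countries>
XML

p rexml_inflation_pairs(XML_DOC)
```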