ruby libxml: Where do the returned node values come from?

Date: 2015-10-25 19:23:44

Tags: ruby libxml2 xpath-2.0

When parsing an XML document in Ruby with libxml, I get too much data back from my find XPath calls.

My test data is:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<MAIN>
  <EPS>
    <EP ID="EDM01">EP 1
      <BP ID="EDM01_BP1">BP1 for EP1
        <Activities>
          <Activity ID="1">Activity 1 for EDM01_BP1</Activity>
          <Activity ID="2">Activity 2 for EDM01_BP1</Activity>
          <Activity ID="3">Activity 3 for EDM01_BP1</Activity>
        </Activities>
      </BP>
      <BP ID="EDM01_BP2">BP2 for EP1
         <Activities>
           <Activity ID="1">Activity 1 for EDM01_BP2</Activity>
           <Activity ID="2">Activity 2 for EDM01_BP2</Activity>
           <Activity ID="3">Activity 3 for EDM01_BP2</Activity>
         </Activities>
      </BP>
    </EP>
    <EP ID="APO01">EP 2
      <BP ID="APO01_BP1">BP 1 for EP2
        <Activities>
          <Activity ID="1">Activity 1 for APO01_BP1</Activity>
          <Activity ID="2">Activity 2 for APO01_BP1</Activity>
          <Activity ID="3">Activity 3 for APO01_BP1</Activity>
          <Activity ID="4">Activity 4 for APO01_BP1</Activity>
          <Activity ID="5">Activity 5 for APO01_BP1</Activity>
        </Activities>
      </BP>
    </EP>
  </EPS>
</MAIN>

I parse it like this:

# Assumes the libxml-ruby gem; @strXML holds the test document shown above.
require 'libxml'
include LibXML

xmlparser = XML::Parser.string(@strXML, :encoding => XML::Encoding::UTF_8)
@xmlDoc = xmlparser.parse
@projects = nil
project = nil
cl = @xmlDoc.find('/MAIN')
unless cl.empty?
  puts ""
  # Every EP element that has an ID attribute
  @projects = @xmlDoc.find('//EP [@ID]')
  @projects.each do |p|
    puts('<----------1--------->')
    puts(p.inner_xml)
    # Intended: only the BP elements of this EP
    bps = p.find('//BP [@ID]')
    bps.each do |bp|
      puts('<----------2--------->')
      puts(bp.inner_xml)
      puts('<---- Activities ---->')
      # Intended: only the Activity elements of this BP
      acts = bp.find('//Activity [@ID]')
      acts.each do |act|
        puts('ActID> ' + act['ID'].to_s)
        puts(act.first.content.to_s)
      end
    end
  end
end

Looking at the output, the fetched XML::Node is correct (p.inner_xml):

<----------1--------->
EP 1      <BP ID="EDM01_BP1">BP1 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP1</Activity><Activity ID="2">Activity 2 for EDM01_BP1</Activity><Activity ID="3">Activity 3 for EDM01_BP1</Activity> </Activities></BP><BP ID="EDM01_BP2">BP2 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP2</Activity><Activity ID="2">Activity 2 for EDM01_BP2</Activity><Activity ID="3">Activity 3 for EDM01_BP2</Activity>         </Activities></BP>  
<----------2--------->
BP1 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP1</Activity>          <Activity ID="2">Activity 2 for EDM01_BP1</Activity><Activity ID="3">Activity 3 for EDM01_BP1</Activity></Activities>
<---- Activities ---->  
ActID> 1  
Activity 1 for EDM01_BP1  
ActID> 2  
Activity 2 for EDM01_BP1  
ActID> 3  
Activity 3 for EDM01_BP1  
ActID> 1  
Activity 1 for EDM01_BP2  
ActID> 2  
Activity 2 for EDM01_BP2  
ActID> 3  
Activity 3 for EDM01_BP2  
ActID> 1  
Activity 1 for APO01_BP1  
ActID> 2  
Activity 2 for APO01_BP1  
ActID> 3  
Activity 3 for APO01_BP1  
ActID> 4  
Activity 4 for APO01_BP1  
ActID> 5  
Activity 5 for APO01_BP1  

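As you can see, the first node that was fetched contains only 3 activities, yet the program prints every activity in the whole XML document, not just those inside the fetched node.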

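Is it a wrong assumption that, after calling xmldoc.find() and iterating over the result with nodes.each do |n|, the variable n is a LibXML::XML::Node that represents only a subset of the XML document? How can it refer to data (the APOxxx activities) that is not part of the fetched node?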

1 Answer:

Answer 0: (score: 0)

Replace

bps = p.find('//BP [@ID]')
acts = bp.find('//Activity [@ID]')

with

bps = p.find('BP [@ID]')
acts = bp.find('Activities/Activity [@ID]')
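
To see what this change does to the number of matches, here is a minimal sketch. It rests on two assumptions: it uses REXML from Ruby's standard distribution as a stand-in for libxml-ruby (so it runs without extra gems), and it works on a trimmed copy of the test data. The XPath strings are the ones from the question and this answer, written without the optional space before the predicate. The names `first_bp`, `absolute` and `relative` are illustrative only.

```ruby
require 'rexml/document'

# Trimmed copy of the question's test data: 2 EPs, 3 BPs, 5 activities in total.
xml = <<~XML
  <MAIN>
    <EPS>
      <EP ID="EDM01">EP 1
        <BP ID="EDM01_BP1">BP1 for EP1
          <Activities>
            <Activity ID="1">Activity 1 for EDM01_BP1</Activity>
            <Activity ID="2">Activity 2 for EDM01_BP1</Activity>
          </Activities>
        </BP>
        <BP ID="EDM01_BP2">BP2 for EP1
          <Activities>
            <Activity ID="1">Activity 1 for EDM01_BP2</Activity>
          </Activities>
        </BP>
      </EP>
      <EP ID="APO01">EP 2
        <BP ID="APO01_BP1">BP 1 for EP2
          <Activities>
            <Activity ID="1">Activity 1 for APO01_BP1</Activity>
            <Activity ID="2">Activity 2 for APO01_BP1</Activity>
          </Activities>
        </BP>
      </EP>
    </EPS>
  </MAIN>
XML

doc = REXML::Document.new(xml)

# Context node: the first BP element (EDM01_BP1), which holds 2 activities.
first_bp = REXML::XPath.first(doc, '/MAIN/EPS/EP/BP')

# Absolute path: evaluated from the document root, so the context node is ignored.
absolute = REXML::XPath.match(first_bp, '//Activity[@ID]')

# Relative path: evaluated from first_bp, so only its own activities match.
relative = REXML::XPath.match(first_bp, 'Activities/Activity[@ID]')

puts "absolute matches: #{absolute.size}"
puts "relative matches: #{relative.size}"
relative.each { |act| puts "ActID> #{act.attributes['ID']}: #{act.text}" }
```

If a descendant search is still wanted, XPath also allows a leading dot (for example `.//Activity`), which anchors the search at the context node instead of the root.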

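XPath expressions that start with / or // are absolute location paths: they always return all matching nodes below the root node, regardless of the context node (p or bp here). This is like an absolute file system path, which does not care about the current directory.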
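
The file system analogy can be sketched with Ruby's standard Pathname class. The directory and file names below are hypothetical and only illustrate the idea: joining an absolute path discards the current directory, while joining a relative path is resolved against it.

```ruby
require 'pathname'

# Hypothetical "current directory", playing the role of the XPath context node.
current_dir = Pathname.new('/home/user/project')

# Absolute path: the current directory is ignored, like '//Activity'.
absolute = current_dir + '/etc/hosts'

# Relative path: resolved against the current directory, like 'Activities/Activity'.
relative = current_dir + 'etc/hosts'

puts absolute   # /etc/hosts
puts relative   # /home/user/project/etc/hosts
```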