When parsing an XML document in Ruby with libxml, I get too much data back from a find XPath call.
My test data is:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<MAIN>
<EPS>
<EP ID="EDM01">EP 1
<BP ID="EDM01_BP1">BP1 for EP1
<Activities>
<Activity ID="1">Activity 1 for EDM01_BP1</Activity>
<Activity ID="2">Activity 2 for EDM01_BP1</Activity>
<Activity ID="3">Activity 3 for EDM01_BP1</Activity>
</Activities>
</BP>
<BP ID="EDM01_BP2">BP2 for EP1
<Activities>
<Activity ID="1">Activity 1 for EDM01_BP2</Activity>
<Activity ID="2">Activity 2 for EDM01_BP2</Activity>
<Activity ID="3">Activity 3 for EDM01_BP2</Activity>
</Activities>
</BP>
</EP>
<EP ID="APO01">EP 2
<BP ID="APO01_BP1">BP 1 for EP2
<Activities>
<Activity ID="1">Activity 1 for APO01_BP1</Activity>
<Activity ID="2">Activity 2 for APO01_BP1</Activity>
<Activity ID="3">Activity 3 for APO01_BP1</Activity>
<Activity ID="4">Activity 4 for APO01_BP1</Activity>
<Activity ID="5">Activity 5 for APO01_BP1</Activity>
</Activities>
</BP>
</EP>
</EPS>
</MAIN>
I parse it with:
# Parse the XML string (held in @strXML) as UTF-8.
xmlparser = XML::Parser.string(@strXML, :encoding => XML::Encoding::UTF_8)
@xmlDoc = xmlparser.parse
@projects = nil
project = nil
cl = @xmlDoc.find('/MAIN')
unless cl.empty?
  puts ""
  # Every EP element that has an ID attribute.
  @projects = @xmlDoc.find('//EP [@ID]')
  @projects.each do |p|
    puts('<----------1--------->')
    puts(p.inner_xml)
    # Intended: the BP elements of this EP only.
    bps = p.find('//BP [@ID]')
    bps.each do |bp|
      puts('<----------2--------->')
      puts(bp.inner_xml)
      puts('<---- Activities ---->')
      # Intended: the activities of this BP only.
      acts = bp.find('//Activity [@ID]')
      acts.each do |act|
        puts('ActID> ' + act['ID'].to_s)
        puts(act.first.content.to_s)
      end
    end
  end
end
assert true
end  # closes the enclosing test method (its def line is not shown)
Looking at the output, the fetched XML::Node is correct (p.inner_xml):
<----------1--------->
EP 1 <BP ID="EDM01_BP1">BP1 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP1</Activity><Activity ID="2">Activity 2 for EDM01_BP1</Activity><Activity ID="3">Activity 3 for EDM01_BP1</Activity> </Activities></BP><BP ID="EDM01_BP2">BP2 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP2</Activity><Activity ID="2">Activity 2 for EDM01_BP2</Activity><Activity ID="3">Activity 3 for EDM01_BP2</Activity> </Activities></BP>
<----------2--------->
BP1 for EP1<Activities><Activity ID="1">Activity 1 for EDM01_BP1</Activity> <Activity ID="2">Activity 2 for EDM01_BP1</Activity><Activity ID="3">Activity 3 for EDM01_BP1</Activity></Activities>
<---- Activities ---->
ActID> 1
Activity 1 for EDM01_BP1
ActID> 2
Activity 2 for EDM01_BP1
ActID> 3
Activity 3 for EDM01_BP1
ActID> 1
Activity 1 for EDM01_BP2
ActID> 2
Activity 2 for EDM01_BP2
ActID> 3
Activity 3 for EDM01_BP2
ActID> 1
Activity 1 for APO01_BP1
ActID> 2
Activity 2 for APO01_BP1
ActID> 3
Activity 3 for APO01_BP1
ActID> 4
Activity 4 for APO01_BP1
ActID> 5
Activity 5 for APO01_BP1
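As you can see, the first BP node contains only 3 activities, yet the program prints every activity in the whole XML document, not just those in the fetched node.
Is it a wrong assumption that, after calling xmldoc.find() and iterating over the result with
nodes.each do |n|
the variable n is a LibXML::XML::Node that represents only a subset of the XML document?
How can it reference data that is not part of the fetched node (the APOxxx activities)?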
Answer 0 (score: 0)
Replace the lines
bps = p.find('//BP [@ID]')
acts = bp.find('//Activity [@ID]')
with
bps = p.find('BP [@ID]')
acts = bp.find('Activities/Activity [@ID]')
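XPath expressions that start with / or // are absolute location paths. They are always evaluated from the document root and return every matching node in the document, regardless of the context node p or bp. This is similar to an absolute file system path, which does not care about the current directory.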
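The difference between absolute and relative location paths can be reproduced in a few lines. The sketch below is only an illustration and makes some assumptions. It uses REXML, which ships with Ruby, instead of libxml, purely so that it runs without installing a gem. The sample document is a trimmed version of the test data from the question, and the names SAMPLE_XML and count_activities are made up for this example. It also includes a third form, './/Activity[@ID]', which is standard XPath for "any descendant of the context node" and may be handy when the intermediate Activities element should not be spelled out.

```ruby
require 'rexml/document'

# Trimmed-down version of the question's test data (hypothetical sample):
# the first BP holds two activities, the second BP holds one.
SAMPLE_XML = <<~XML
  <MAIN>
    <EPS>
      <EP ID="EDM01">
        <BP ID="EDM01_BP1">
          <Activities>
            <Activity ID="1">Activity 1 for EDM01_BP1</Activity>
            <Activity ID="2">Activity 2 for EDM01_BP1</Activity>
          </Activities>
        </BP>
      </EP>
      <EP ID="APO01">
        <BP ID="APO01_BP1">
          <Activities>
            <Activity ID="1">Activity 1 for APO01_BP1</Activity>
          </Activities>
        </BP>
      </EP>
    </EPS>
  </MAIN>
XML

# Counts the Activity elements found from the first BP element,
# using three different XPath forms with the same context node.
def count_activities(xml)
  doc = REXML::Document.new(xml)
  first_bp = REXML::XPath.first(doc, '//BP[@ID]')

  {
    # Absolute path: evaluated from the document root, context node ignored.
    absolute: REXML::XPath.match(first_bp, '//Activity[@ID]').size,
    # Relative path: evaluated from the context node (the first BP).
    relative: REXML::XPath.match(first_bp, 'Activities/Activity[@ID]').size,
    # Relative descendant search: any Activity below the context node.
    descendant: REXML::XPath.match(first_bp, './/Activity[@ID]').size
  }
end

puts count_activities(SAMPLE_XML).inspect
```

With this sample, the absolute form should count the activities of both EP elements, while the two relative forms count only those under the first BP. In libxml the same three path strings would be passed to bp.find, as in the replacement lines above.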