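I want to parse an SVG file in Perl, but for various reasons I have been advised against some libraries (XML::Simple, XML::XPath). The thread below suggests using XML::LibXML::XPathContext:
Perl XML/SVG Parser unable to findnodes
Assuming I use XML::LibXML::XPathContext, I am still not sure how to extract the nodes I am interested in:
1) the nodes whose "id" contains "Drawing...", together with their size (path fill, d="...", etc.) and text ("tspan")
2) the "path" nodes (at the bottom of the SVG) that do not belong to any "Drawing_" node, together with their position (d="...")
use XML::LibXML;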
use XML::LibXML::XPathContext;
my $doc = XML::LibXML->load_xml( location => $file);
my $xpc = XML::LibXML::XPathContext->new( $doc);
$xpc->registerNs(x => 'http://www.w3.org/2000/svg');
foreach my $drawing ($xpc->findnodes( ??? )) {
    print "Found drawing\n";
}
foreach my $path ($xpc->findnodes( ??? )) {
    print "Found path\n";
}
My SVG:
<?xml version="1.0" encoding="UTF-8"?>
<svg version="1.2">
<g visibility="visible" id="Master" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve">
<rect fill="none" stroke="none" x="0" y="0" width="86360" height="55880"/>
</g>
<g visibility="visible" id="Page1">
<g id="Drawing_1">
<path fill="rgb(255,211,32)" stroke="none" d="M 15350,3285 L 31988,3285 31988,4937 15350,4937 15350,3285 15350,3285 Z"/>
<path fill="none" stroke="rgb(128,128,128)" stroke-width="102" stroke-linejoin="round" d="M 15350,3285 L 31988,3285 31988,4937 15350,4937 15350,3285 15350,3285 Z"/>
<g fill="rgb(0,0,0)" stroke="none" font-family="Arial Narrow embedded" font-size="635" font-style="normal" font-weight="700">
<text x="19327" y="3967">
<tspan x="19327 19471 19788 19962">Info</tspan></text>
<text fill="rgb(0,0,0)" stroke="none" x="17558" y="4699">
<tspan x="17558">I</tspan></text>
</g>
</g>
<g id="Drawing_2">
<path fill="rgb(207,231,245)" stroke="none" d="M 8747,10525 L 4810,10525 4810,8239 12684,8239 12684,10525 8747,10525 Z"/>
<path fill="none" stroke="rgb(128,128,128)" stroke-width="102" stroke-linejoin="round" d="M 8747,10525 L 4810,10525 4810,8239 12684,8239 12684,10525 8747,10525 Z"/>
<g fill="rgb(0,0,0)" stroke="none" font-family="Arial Narrow embedded" font-size="635" font-style="normal" font-weight="700">
<text x="5547" y="8872">
<tspan x="5547 6030">OK</tspan></text>
<text fill="rgb(0,0,0)" stroke="none" x="5215" y="9604">
<tspan x="5215 5359 5676 5850">Info</tspan></text>
</g>
</g>
...
<g>
<path fill="none" stroke="rgb(51,153,255)" id="Drawing_78_0" stroke-width="102" stroke-linejoin="round" d="M 47291,16367 C 47291,17129 48093,16793 48482,17017"/>
<path fill="rgb(51,153,255)" stroke="none" id="Drawing_78_1" d="M 48688,17383 L 48598,16917 48337,17064 48688,17383 Z"/>
</g>
<g>
<path fill="none" stroke="rgb(51,153,255)" id="Drawing_79_0" stroke-width="102" stroke-linejoin="round" d="M 39417,4937 C 39417,14271 23887,8230 23425,16977"/>
<path fill="rgb(51,153,255)" stroke="none" id="Drawing_79_1" d="M 23415,17383 L 23577,16937 23277,16929 23415,17383 Z"/>
</g>
...
</g>
</svg>
Answer 0 (score: 2)
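First, you do not need XML::LibXML::XPathContext, because the nodes you are looking for do not use a namespace (in the sample, only the "Master" group declares one).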
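To see why the namespace matters, here is a minimal sketch against a trimmed-down copy of the SVG from the question (the trimming is mine, not the asker's). It assumes the default namespace is declared only on the "Master" group, as in the sample, and compares a plain name test with a prefixed one.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Trimmed-down sample (an assumption: the real file has many more nodes).
my $svg = <<'SVG';
<svg version="1.2">
 <g id="Master" xmlns="http://www.w3.org/2000/svg">
  <rect x="0" y="0" width="86360" height="55880"/>
 </g>
 <g id="Page1">
  <g id="Drawing_1"><path d="M 0,0 L 1,1"/></g>
 </g>
</svg>
SVG

my $doc = XML::LibXML->load_xml(string => $svg);

# Elements that are in no namespace can be matched by their plain name.
my @plain_g = $doc->findnodes('//g');

# <rect> inherits the default namespace declared on "Master", so a plain
# name test does not match it ...
my @plain_rect = $doc->findnodes('//rect');

# ... while a prefix registered on an XPathContext does.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(x => 'http://www.w3.org/2000/svg');
my @ns_rect = $xpc->findnodes('//x:rect');

printf "plain //g: %d, plain //rect: %d, //x:rect: %d\n",
    scalar @plain_g, scalar @plain_rect, scalar @ns_rect;
```

With this sample, `//g` should match only the "Page1" and "Drawing_1" groups, because "Master" and its `rect` child are in the SVG namespace and need the registered prefix.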
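However, you will have to check the attributes of the nodes. One way is to loop over each node's attributes; once you find the node you want, you can process it (extract attribute values, get child nodes, and so on) using the XML::LibXML::Node methods: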
use v5.10;
use strict;
use warnings;
use XML::LibXML;
my $doc = XML::LibXML->load_xml( location => $ARGV[0] );
NODES: for my $node ($doc->findnodes('//g')) {
    for my $attr ($node->attributes) {
        if ($attr->nodeName eq 'id' && $attr->value =~ /^Drawing/) {
            # it's a drawing node
            # do stuff
            next NODES;
        }
    }
    # it's not a drawing node
    for my $pathnode ($node->findnodes('path')) {
        # do stuff
    }
}
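As an illustration of what "do stuff" could look like for a drawing node, the sketch below collects the path attributes and the tspan text of one group. The helper name `describe_drawing` and the returned hash layout are hypothetical, and the sample is a shortened copy of "Drawing_2" from the question.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of the original answer): gather the id,
# the direct child paths, and all tspan text of one drawing group.
sub describe_drawing {
    my ($node) = @_;
    my @paths = map {
        +{ fill => $_->getAttribute('fill'), d => $_->getAttribute('d') }
    } $node->findnodes('./path');
    my @text = map { $_->textContent } $node->findnodes('.//tspan');
    return {
        id    => $node->getAttribute('id'),
        paths => \@paths,
        text  => \@text,
    };
}

# Shortened copy of "Drawing_2" from the question.
my $svg = <<'SVG';
<svg version="1.2">
 <g id="Page1">
  <g id="Drawing_2">
   <path fill="rgb(207,231,245)" stroke="none" d="M 8747,10525 L 4810,10525 4810,8239 12684,8239 12684,10525 8747,10525 Z"/>
   <g font-size="635">
    <text x="5547" y="8872"><tspan x="5547 6030">OK</tspan></text>
    <text x="5215" y="9604"><tspan x="5215 5359 5676 5850">Info</tspan></text>
   </g>
  </g>
 </g>
</svg>
SVG

my $doc = XML::LibXML->load_xml(string => $svg);
my ($drawing) = $doc->findnodes('//g[starts-with(@id,"Drawing")]');
my $info = describe_drawing($drawing);

printf "%s: %d path(s), text: %s\n",
    $info->{id}, scalar @{ $info->{paths} }, join(' ', @{ $info->{text} });
```

The `./path` test keeps only the direct child paths of the group, while `.//tspan` descends into the nested text group.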
You can also find the nodes with pure XPath:
my @drawings = $doc->findnodes('//g[starts-with(@id,"Drawing")]');
my @paths = $doc->findnodes('//path[not(ancestor::g[starts-with(@id,"Drawing")])]');
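To get from the matched path nodes to a position, one option is to read the coordinate pairs out of the d attribute. The sketch below is a rough approach under stated assumptions: `bounding_box` is a hypothetical helper, it only understands absolute "x,y" pairs like those in the sample, and for curve commands (C) it includes control points, so the box may be larger than the drawn shape.

```perl
use strict;
use warnings;
use XML::LibXML;
use List::Util qw(min max);

# Hypothetical helper: rough bounding box (min x, min y, max x, max y)
# from the absolute "x,y" pairs found in a path's d attribute.
sub bounding_box {
    my ($d) = @_;
    my (@x, @y);
    while ($d =~ /(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)/g) {
        push @x, $1;
        push @y, $2;
    }
    return unless @x;
    return (min(@x), min(@y), max(@x), max(@y));
}

# Shortened sample: one drawing group plus the loose paths of "Drawing_78".
my $svg = <<'SVG';
<svg version="1.2">
 <g id="Page1">
  <g id="Drawing_1">
   <path fill="rgb(255,211,32)" d="M 15350,3285 L 31988,3285 31988,4937 15350,4937 15350,3285 15350,3285 Z"/>
  </g>
  <g>
   <path id="Drawing_78_0" d="M 47291,16367 C 47291,17129 48093,16793 48482,17017"/>
   <path id="Drawing_78_1" d="M 48688,17383 L 48598,16917 48337,17064 48688,17383 Z"/>
  </g>
 </g>
</svg>
SVG

my $doc   = XML::LibXML->load_xml(string => $svg);
my @paths = $doc->findnodes('//path[not(ancestor::g[starts-with(@id,"Drawing")])]');

my %box;
for my $path (@paths) {
    my $id = $path->getAttribute('id') // '(no id)';
    my @bb = bounding_box($path->getAttribute('d') // '');
    next unless @bb;    # skip paths without usable coordinates
    $box{$id} = \@bb;
    printf "%s: x %s..%s, y %s..%s\n", $id, @bb[0, 2, 1, 3];
}
```

The path inside "Drawing_1" is skipped by the XPath predicate, so only the two loose paths are reported.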
Credit to these posts for the XPath reference:
XPath Select Nodes where all parent nodes do not contain specific attribute and value
XPath: using regex in contains function