Perl: Parsing an SVG file

Time: 2018-12-18 20:19:44

Tags: xml perl svg

I want to parse an SVG file in Perl, but for various reasons I have been advised against using certain libraries (XML::Simple, XML::XPath). The thread below suggests using XML::LibXML::XPathContext:

Perl XML/SVG Parser unable to findnodes

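Assuming I use XML::LibXML::XPathContext, I am still not sure how to extract the nodes I am interested in:

1) the nodes whose "id" contains "Drawing...", together with their size (path fill ... d="...", etc.) and text ("tspan")
2) the "path" nodes (at the bottom of the SVG) that do not belong to any "Drawing_" node, together with their position (d="...")
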
use XML::LibXML;
use XML::LibXML::XPathContext;

my $doc = XML::LibXML->load_xml( location => $file);
my $xpc = XML::LibXML::XPathContext->new( $doc);
$xpc->registerNs(x => 'http://www.w3.org/2000/svg');

foreach my $drawing ($xpc->findnodes( ??? )) {
    print "Found drawing\n";
}

foreach my $path ($xpc->findnodes( ??? )) {
    print "Found path\n";
}

My SVG:

<?xml version="1.0" encoding="UTF-8"?>
<svg version="1.2">
 <g visibility="visible" id="Master" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve">
  <rect fill="none" stroke="none" x="0" y="0" width="86360" height="55880"/>
 </g>
 <g visibility="visible" id="Page1">
  <g id="Drawing_1">
   <path fill="rgb(255,211,32)" stroke="none" d="M 15350,3285 L 31988,3285 31988,4937 15350,4937 15350,3285 15350,3285 Z"/>
   <path fill="none" stroke="rgb(128,128,128)" stroke-width="102" stroke-linejoin="round" d="M 15350,3285 L 31988,3285 31988,4937 15350,4937 15350,3285 15350,3285 Z"/>
   <g fill="rgb(0,0,0)" stroke="none" font-family="Arial Narrow embedded" font-size="635" font-style="normal" font-weight="700">
    <text x="19327" y="3967">
     <tspan x="19327 19471 19788 19962">Info</tspan></text>
    <text fill="rgb(0,0,0)" stroke="none" x="17558" y="4699">
     <tspan x="17558">I</tspan></text>
   </g>
  </g>
  <g id="Drawing_2">
   <path fill="rgb(207,231,245)" stroke="none" d="M 8747,10525 L 4810,10525 4810,8239 12684,8239 12684,10525 8747,10525 Z"/>
   <path fill="none" stroke="rgb(128,128,128)" stroke-width="102" stroke-linejoin="round" d="M 8747,10525 L 4810,10525 4810,8239 12684,8239 12684,10525 8747,10525 Z"/>
   <g fill="rgb(0,0,0)" stroke="none" font-family="Arial Narrow embedded" font-size="635" font-style="normal" font-weight="700">
    <text x="5547" y="8872">
     <tspan x="5547 6030">OK</tspan></text>
    <text fill="rgb(0,0,0)" stroke="none" x="5215" y="9604">
     <tspan x="5215 5359 5676 5850">Info</tspan></text>
   </g>
  </g>
  ...
  <g>
   <path fill="none" stroke="rgb(51,153,255)" id="Drawing_78_0" stroke-width="102" stroke-linejoin="round" d="M 47291,16367 C 47291,17129 48093,16793 48482,17017"/>
   <path fill="rgb(51,153,255)" stroke="none" id="Drawing_78_1" d="M 48688,17383 L 48598,16917 48337,17064 48688,17383 Z"/>
  </g>
  <g>
   <path fill="none" stroke="rgb(51,153,255)" id="Drawing_79_0" stroke-width="102" stroke-linejoin="round" d="M 39417,4937 C 39417,14271 23887,8230 23425,16977"/>
   <path fill="rgb(51,153,255)" stroke="none" id="Drawing_79_1" d="M 23415,17383 L 23577,16937 23277,16929 23415,17383 Z"/>
  </g>
  ...
 </g>
</svg>

1 answer:

Answer 0 (score: 2)

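First of all, you do not need XML::LibXML::XPathContext here, because the elements you are after are not in a namespace. In the SVG you posted, the only xmlns declaration sits on the "Master" g element, so it does not apply to "Page1" or to the drawings below it.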
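
That only holds while the file looks like the sample. As a rough sketch of the other case: if the real file declared the SVG namespace on the root element (an assumption; the inline SVG below is made up for illustration and is not the poster's file), unprefixed names would stop matching and a registered prefix on an XPathContext would be needed.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical SVG with the namespace declared on the root element.
my $svg = <<'SVG';
<svg xmlns="http://www.w3.org/2000/svg" version="1.2">
 <g id="Drawing_1"><path d="M 0,0 L 10,10"/></g>
</svg>
SVG

my $doc = XML::LibXML->load_xml( string => $svg );

# In XPath 1.0 an unprefixed name only matches elements in no namespace,
# so this query finds nothing in a namespaced document.
my @plain = $doc->findnodes('//g');

# With a prefix bound to the SVG namespace the same element is found.
# The prefix "x" is arbitrary; only the URI has to match.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs( x => 'http://www.w3.org/2000/svg' );
my @namespaced = $xpc->findnodes('//x:g');

printf "plain: %d, namespaced: %d\n", scalar @plain, scalar @namespaced;
```

In that situation every step of the XPath needs the prefix (for example `//x:g/x:path`), which is why the question's `registerNs` call would matter.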

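One way to find the nodes is to loop over the g elements and check each one's attributes. Once you have found the node you want, you can process it (extract attribute values, get child nodes, and so on) with the methods in XML::LibXML::Node:
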
use v5.10;
use strict;
use warnings;

use XML::LibXML;

my $doc = XML::LibXML->load_xml( location => $ARGV[0] );

NODES: for my $node ($doc->findnodes('//g')) {
    for my $attr ($node->attributes) {
        if ($attr->nodeName eq 'id' && $attr->value =~ /^Drawing/) {
            # it's a drawing node
            # do stuff
            next NODES;
        }
    }
    # it's not a drawing node
    for my $pathnode ($node->findnodes('path')) {
        # do stuff
    }
}
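
To make the "do stuff" part concrete, here is a minimal sketch of what the drawing branch could collect, assuming the fill, the d attribute and the tspan text are what you need. The input is one drawing copied from the question and reduced to a single path; the `@drawings` structure and its keys are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# One drawing from the question, reduced to a single path.
my $svg = <<'SVG';
<svg version="1.2">
 <g visibility="visible" id="Page1">
  <g id="Drawing_2">
   <path fill="rgb(207,231,245)" stroke="none" d="M 8747,10525 L 4810,10525 4810,8239 12684,8239 12684,10525 8747,10525 Z"/>
   <g fill="rgb(0,0,0)" stroke="none" font-size="635">
    <text x="5547" y="8872"><tspan x="5547 6030">OK</tspan></text>
    <text x="5215" y="9604"><tspan x="5215 5359 5676 5850">Info</tspan></text>
   </g>
  </g>
 </g>
</svg>
SVG

my $doc = XML::LibXML->load_xml( string => $svg );

my @drawings;    # hypothetical result structure, one hash per drawing
for my $g ( $doc->findnodes('//g') ) {
    # getAttribute returns undef when the attribute is missing
    my $id = $g->getAttribute('id') // '';
    next unless $id =~ /^Drawing/;

    # direct <path> children: keep the fill and the path data
    my @paths = map { +{ fill => $_->getAttribute('fill'), d => $_->getAttribute('d') } }
        $g->findnodes('./path');

    # text of every <tspan> anywhere below this drawing
    my @texts = map { $_->textContent } $g->findnodes('.//tspan');

    push @drawings, { id => $id, paths => \@paths, texts => \@texts };
}

for my $drawing (@drawings) {
    print "$drawing->{id}: @{ $drawing->{texts} }\n";
    print "  fill=$_->{fill} d=$_->{d}\n" for @{ $drawing->{paths} };
}
```

Calling `getAttribute('id')` directly is an alternative to walking `$node->attributes` by hand; both reach the same value.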

You can also find the nodes with pure XPath.

my @drawings = $doc->findnodes('//g[starts-with(@id,"Drawing")]');
my @paths = $doc->findnodes('//path[not(ancestor::g[starts-with(@id,"Drawing")])]');
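
As a quick check of these two expressions, the sketch below runs them against a shortened copy of the question's SVG (one drawing and one group of loose paths; the inline string stands in for the real file).

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened copy of the question's SVG: one drawing, one group of loose paths.
my $svg = <<'SVG';
<svg version="1.2">
 <g visibility="visible" id="Page1">
  <g id="Drawing_1">
   <path fill="rgb(255,211,32)" stroke="none" d="M 15350,3285 L 31988,3285 31988,4937 15350,4937 15350,3285 15350,3285 Z"/>
  </g>
  <g>
   <path fill="none" stroke="rgb(51,153,255)" id="Drawing_78_0" stroke-width="102" stroke-linejoin="round" d="M 47291,16367 C 47291,17129 48093,16793 48482,17017"/>
   <path fill="rgb(51,153,255)" stroke="none" id="Drawing_78_1" d="M 48688,17383 L 48598,16917 48337,17064 48688,17383 Z"/>
  </g>
 </g>
</svg>
SVG

my $doc = XML::LibXML->load_xml( string => $svg );

# g elements whose id starts with "Drawing"
my @drawings = $doc->findnodes('//g[starts-with(@id,"Drawing")]');

# path elements with no such g among their ancestors
my @paths = $doc->findnodes('//path[not(ancestor::g[starts-with(@id,"Drawing")])]');

print "drawing: ", $_->getAttribute('id'), "\n" for @drawings;
print "loose path: ", $_->getAttribute('id'), " d=", $_->getAttribute('d'), "\n" for @paths;
```

Note that the loose paths carry ids such as "Drawing_78_0" themselves, but the second expression only tests the ids of ancestor g elements, so they are still selected.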

Credit to these posts for the XPath reference:

XPath Select Nodes where all parent nodes do not contain specific attribute and value
XPath: using regex in contains function