Perl: comparing lines in an XML file

Time: 2016-07-07 14:40:37

Tags: xml perl

I have an XML file that looks something like this:

<root>
  <project id="1">
    <element name="stuff" version="1.0"/>
    <element name="stuff" version="1.2"/>
    <element name="table" version="0.8"/>
  </project>
  <project id="2">
    <element name="fruit" version="1.0"/>
    <element name="tree" version="1.2"/>
    <element name="tree" version="0.8"/>
    <element name="tree" version="2.5"/>
  </project>
</root>

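I want to delete every element for which an element of the same name with a higher version number exists. So far, all I know how to do is read the file in and detect the lines that contain an element: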

open(FILE, "<file.xml"); 
my @line = <FILE>; 
close(FILE); 
open(FILE, ">file.xml"); 
foreach my $line (@line) {
  if (index ($line, '<element') != -1) {
    #only print newer versions here
  }
}

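But now I don't know how to continue. I know I can compare version numbers like this: version->parse($variable1) < version->parse($variable2). But how do I compare two lines of the same file and then delete the line with the older version number?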
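As a side note on the comparison itself, here is a minimal sketch of how version strings such as "1.10" and "1.9" compare under different rules. It uses only the core version module; the helper name compare_dotted is hypothetical, and prefixing "v" to force dotted-decimal parsing is an assumption about what the version numbers in the file are meant to express.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: compare two version strings as dotted-decimal
# versions by forcing a leading "v", so that "1.10" is newer than "1.9".
sub compare_dotted {
    my ( $left, $right ) = @_;
    return version->parse("v$left") <=> version->parse("v$right");
}

# Plain numeric comparison treats "1.10" as the number 1.1.
my $numeric = "1.10" <=> "1.9";

# Without a leading "v", version->parse reads "1.10" as a decimal version,
# which also sorts before "1.9".
my $decimal = version->parse("1.10") <=> version->parse("1.9");

# With a leading "v", each component is compared as an integer.
my $dotted = compare_dotted( "1.10", "1.9" );

print "numeric: $numeric\n";
print "decimal: $decimal\n";
print "dotted:  $dotted\n";
```

For the sample file above, all three approaches agree, because no version there has a multi-digit minor part. The difference only matters once values like 1.10 appear.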

2 answers:

Answer 0: (score: 2)

Something like this would do it:

#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

#set a handler for 'element' - this just collects the highest 'versions'.
my %version_of; 
sub get_highest_version {
    my ( $twig, $element ) = @_; 
    my $name = $element -> att('name'); 
    my $version = $element -> att('version'); 
    if ( not defined $version_of{$name}
         or $version_of{$name} < $version ) {
            $version_of{$name} = $version; 
    }
}

#create a parser, set it to use the above handler for 'element' elements. 
my $twig = XML::Twig->new ( twig_handlers => { 'element' => \&get_highest_version } );
#parse the data (in __DATA__ below - you probably want to use 'parsefile' instead)
$twig -> parse( \*DATA );

#output for debug - see what the highest versions of each actually were. 
print Dumper \%version_of;

#iterate each of the 'element' nodes. 
foreach my $element ( $twig -> get_xpath ('//element') ) {
    #extract name/version from this element. 
    my $name = $element -> att('name'); 
    my $version = $element -> att('version');
    #delete this node unless it's the highest version.  
    $element -> delete unless $version >= $version_of{$name}; 
}

#set output indentation and print
$twig -> set_pretty_print('indented_a');
$twig -> print;


__DATA__
<root>
  <project id="1">
    <element name="stuff" version="1.0"/>
    <element name="stuff" version="1.2"/>
    <element name="table" version="0.8"/>
  </project>
  <project id="2">
    <element name="fruit" version="1.0"/>
    <element name="tree" version="1.2"/>
    <element name="tree" version="0.8"/>
    <element name="tree" version="2.5"/>
  </project>
</root>

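Note, though: if two elements with the same name share the highest version, both are kept, so you may see duplicates. It also completely ignores the 'project' hierarchy and looks for the highest version globally. (You could handle that quite easily by keeping track of the project id.)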
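The two caveats above could be addressed along these lines. This is only a sketch, not part of the original answer: it walks each project separately instead of using a twig handler, and it assumes that when several elements share the highest version, keeping the first one is acceptable. The variable names %highest, %kept and @remaining are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'XML';
<root>
  <project id="1">
    <element name="stuff" version="1.0"/>
    <element name="stuff" version="1.2"/>
    <element name="table" version="0.8"/>
  </project>
  <project id="2">
    <element name="fruit" version="1.0"/>
    <element name="tree" version="1.2"/>
    <element name="tree" version="0.8"/>
    <element name="tree" version="2.5"/>
  </project>
</root>
XML

my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse($xml);

foreach my $project ( $twig->root->children('project') ) {
    my @elements = $project->children('element');

    # First pass: highest version per name, within this project only.
    my %highest;
    foreach my $element (@elements) {
        my $name    = $element->att('name');
        my $version = $element->att('version');
        $highest{$name} = $version
            if !defined $highest{$name} or $highest{$name} < $version;
    }

    # Second pass: delete lower versions, and delete any repeat of the
    # highest version after the first one has been kept.
    my %kept;
    foreach my $element (@elements) {
        my $name    = $element->att('name');
        my $version = $element->att('version');
        $element->delete if $version < $highest{$name} or $kept{$name}++;
    }
}

# Summarise what is left as "project-id:name:version" strings.
my @remaining = map {
    join ':', $_->parent->att('id'), $_->att('name'), $_->att('version')
} $twig->root->descendants('element');

print "$_\n" for @remaining;
$twig->print;
```

Like the script above, this compares versions numerically. To work on a real file, parsefile could replace parse on the here-document.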

Answer 1: (score: 0)

Use an XML-aware tool. For example, I maintain xsh, a wrapper around XML::LibXML:

open file.xml ;
for my $project in /root/project {
    for my $element in $project/element {
        if ($element/@version 
            != xsh:max($element/../element[@name=$element/@name]/@version)
        ) delete $element ;
    }
}
save :b ;

If there are many versions per name, computing the maximum in advance might be faster:

for my $project in /root/project {
    my $versions := hash ../@name $project/element/@version ;
    for my $name in { keys %$versions } {
        my $max = xsh:max(xsh:lookup('versions', $name)) ;
        delete $project/element[@name = $name][@version != $max] ;
    }
}
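
For readers who do not have xsh installed, roughly the same per-project logic can be sketched directly against XML::LibXML. This is an approximation of the second xsh script, not a translation verified against it: it compares versions numerically, and the names %max and @remaining are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

my $xml = <<'XML';
<root>
  <project id="1">
    <element name="stuff" version="1.0"/>
    <element name="stuff" version="1.2"/>
    <element name="table" version="0.8"/>
  </project>
  <project id="2">
    <element name="fruit" version="1.0"/>
    <element name="tree" version="1.2"/>
    <element name="tree" version="0.8"/>
    <element name="tree" version="2.5"/>
  </project>
</root>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

for my $project ( $doc->findnodes('/root/project') ) {
    my @elements = $project->findnodes('./element');

    # Compute the maximum version per name once for this project.
    my %max;
    for my $element (@elements) {
        my $name    = $element->getAttribute('name');
        my $version = $element->getAttribute('version');
        $max{$name} = $version
            if !exists $max{$name} or $version > $max{$name};
    }

    # Detach every element whose version is not the maximum for its name.
    for my $element (@elements) {
        my $name    = $element->getAttribute('name');
        my $version = $element->getAttribute('version');
        $element->unbindNode if $version != $max{$name};
    }
}

# Summarise what is left as "project-id:name:version" strings.
my @remaining = map {
    join ':', $_->parentNode->getAttribute('id'),
        $_->getAttribute('name'), $_->getAttribute('version')
} $doc->findnodes('//element');

print "$_\n" for @remaining;
print $doc->toString;
```

As in the xsh version, elements that tie for the maximum are all kept. The whitespace text nodes around removed elements stay in the document, so the output may contain blank lines.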