I have an XML file that looks something like this:
<root>
  <project id="1">
    <element name="stuff" version="1.0"/>
    <element name="stuff" version="1.2"/>
    <element name="table" version="0.8"/>
  </project>
  <project id="2">
    <element name="fruit" version="1.0"/>
    <element name="tree" version="1.2"/>
    <element name="tree" version="0.8"/>
    <element name="tree" version="2.5"/>
  </project>
</root>
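I want to remove every element that has a lower version number than another element of the same name. So far, all I know how to do is read in the file and detect the lines that contain elements: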
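open(my $in, '<', 'file.xml') or die "Cannot read file.xml: $!";
my @lines = <$in>;
close($in);
open(my $out, '>', 'file.xml') or die "Cannot write file.xml: $!";
foreach my $line (@lines) {
    if (index($line, '<element') != -1) {
        # only print newer versions here
    }
    else {
        print {$out} $line;    # pass every other line through unchanged
    }
}
close($out);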
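But now I'm stuck. I know I can compare version numbers like this: version->parse($variable1) < version->parse($variable2)
But how do I compare two lines of the same file and then delete the line with the older version number?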
Answer 0: (score: 2)
Something like this would do it:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

# Handler for 'element': records the highest version seen for each name.
# Note: versions are compared as plain numbers, which is fine for values like
# "1.2"; multi-part versions such as "1.2.3" would need something like
# version->parse, as mentioned in the question.
my %version_of;

sub get_highest_version {
    my ( $twig, $element ) = @_;
    my $name    = $element->att('name');
    my $version = $element->att('version');
    if ( not defined $version_of{$name}
        or $version_of{$name} < $version )
    {
        $version_of{$name} = $version;
    }
}

# Create a parser and register the handler above for 'element' elements.
my $twig = XML::Twig->new( twig_handlers => { 'element' => \&get_highest_version } );

# Parse the data (from __DATA__ below; you probably want 'parsefile' instead).
$twig->parse( \*DATA );

# Debug output: show the highest version found for each name.
print Dumper \%version_of;

# Iterate over each 'element' node.
foreach my $element ( $twig->get_xpath('//element') ) {

    # Extract name/version from this element.
    my $name    = $element->att('name');
    my $version = $element->att('version');

    # Delete this node unless it is the highest version.
    $element->delete unless $version >= $version_of{$name};
}

# Set output indentation and print.
$twig->set_pretty_print('indented_a');
$twig->print;
__DATA__
<root>
  <project id="1">
    <element name="stuff" version="1.0"/>
    <element name="stuff" version="1.2"/>
    <element name="table" version="0.8"/>
  </project>
  <project id="2">
    <element name="fruit" version="1.0"/>
    <element name="tree" version="1.2"/>
    <element name="tree" version="0.8"/>
    <element name="tree" version="2.5"/>
  </project>
</root>
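Note, though: if two elements with the same name share the highest version, both are kept, so you may see duplicates. It also ignores the 'project' hierarchy entirely and looks for the highest version across the whole file. (Making it per-project is fairly easy: keep track of the project id as well as the name.)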
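As a rough illustration of the per-project idea, here is a minimal sketch, not part of the original answer. It assumes every 'element' sits inside a 'project' that has an 'id' attribute, and that versions can be compared as plain numbers, as in the code above. The names %highest, %kept and $result are made up for this example. It also drops repeated copies of the highest version, keeping only the first.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Sample input copied from the question.
my $xml = <<'XML';
<root>
<project id="1">
<element name="stuff" version="1.0"/>
<element name="stuff" version="1.2"/>
<element name="table" version="0.8"/>
</project>
<project id="2">
<element name="fruit" version="1.0"/>
<element name="tree" version="1.2"/>
<element name="tree" version="0.8"/>
<element name="tree" version="2.5"/>
</project>
</root>
XML

# Highest version seen so far, keyed by project id and then element name
# (hypothetical name, assumes numeric versions).
my %highest;

my $twig = XML::Twig->new(
    twig_handlers => {
        element => sub {
            my ( $t, $element ) = @_;
            my $project = $element->parent('project')->att('id');
            my $name    = $element->att('name');
            my $version = $element->att('version');
            my $best    = $highest{$project}{$name};
            $highest{$project}{$name} = $version
                if !defined $best or $best < $version;
        },
    },
);
$twig->parse($xml);

# Second pass: delete lower versions, and any repeat of the highest version.
my %kept;
foreach my $element ( $twig->get_xpath('//element') ) {
    my $project = $element->parent('project')->att('id');
    my $name    = $element->att('name');
    my $version = $element->att('version');
    if ( $version < $highest{$project}{$name} or $kept{$project}{$name}++ ) {
        $element->delete;
    }
}

$twig->set_pretty_print('indented_a');
my $result = $twig->sprint;
print $result;
```

With the sample data the outcome is the same as with the global approach, because no name appears in more than one project. The difference only shows up when, say, 'tree' exists in both projects with different versions.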
Answer 1: (score: 0)
Use an XML-aware tool. For example, I maintain xsh, a wrapper around XML::LibXML:
open file.xml ;
for my $project in /root/project {
    for my $element in $project/element {
        if ($element/@version
            != xsh:max($element/../element[@name=$element/@name]/@version)
        ) delete $element ;
    }
}
save :b ;
If each name has many versions, it may be faster to compute the maxima up front:
for my $project in /root/project {
    my $versions := hash ../@name $project/element/@version ;
    for my $name in { keys %$versions } {
        my $max = xsh:max(xsh:lookup('versions', $name)) ;
        delete $project/element[@name = $name][@version != $max] ;
    }
}