How do I delete elements with Perl's XML::Twig?

Time: 2011-10-27 12:47:05

Tags: xml perl twig

I have some XML files that look like this:

<machines>
<server>
    127.0.0.1
</server>
<proxy>
    <ip>127.0.0.2</ip>
    <etc>abc</etc>
</proxy>
</machines>

I want to keep the server element and delete everything else, so the output should be:

<machines>
<server>
127.0.0.1
</server>
</machines>

I wrote the following script:

use warnings;
use strict;
use feature ':5.10';
use XML::Twig;

my $path='C:\strawberry\perl\site\lib\file.xml';
my $filehandle;
my $tweak_server =sub{
    my ($twig, $root) =@_;
    my $elt=$root;
    while( $elt=$elt->next_elt($root)){
        my $tag=$elt->tag;
        say $tag;
        if ($tag!~/server/){
            $elt->delete($tag);         
        }       
    }
    $twig->flush;
};




open( $filehandle, "+<$path") or die "cannot open out file out_file:$!";
my $roots = { machines => 1 };
my $handlers = { 'machines' => $tweak_server,
            };
my $twig = new XML::Twig(TwigRoots => $roots,
                 TwigHandlers => $handlers,
                 pretty_print  => 'indented'#,
                # twig_print_outside_roots => \*$filehandle
                 );
$twig->parsefile($path);
close $filehandle;

and got this output:

server
#PCDATA
<machines>
<server></server>
<proxy>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</proxy>
</machines>

I really don't understand why "#PCDATA" shows up, or why the script doesn't work the way I expected.

@mirod I tried the following:

use warnings;
use strict;
use feature ':5.10';
use XML::Twig;

my $tweak_server =sub{
my ($twig, $root) =@_;
my $elt=$root;
my $text=$elt->first_child_text('id');
if ($text=~m/12/){
    while( $elt=$elt->next_elt('#ELT')){
        my $tag=$elt->tag;
        say $tag;
        if ($tag!~/id/){
            $elt->delete;           
        }       
    }
}
};

my $roots = { machines => 1 };
my $handlers = { 'machines/aaa' => $tweak_server,
            };
my $twig =XML::Twig->new(TwigRoots => $roots,
                 TwigHandlers => $handlers,
                 pretty_print  => 'indented'#,
                # twig_print_outside_roots => \*$filehandle
                 )
    ->parse( \*DATA) 
    ->print; 
__DATA__

<machines> 
<server> 127.0.0.1 </server> 
<aaa>
<id>12</id> 
<ip>127.0.0.2</ip>   
<option>127.0.0.6</option>
<etc>abc</etc>
</aaa> 
<aaa>
<id>14</id> 
<ip>127.0.0.2</ip>   
<etc>abc</etc>
</aaa> 
<aaa>
<id>15</id> 
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
</machines>

and the output is:

<machines>
<server> 127.0.0.1 </server>
<aaa>
<id>12</id>
<option>127.0.0.6</option>
<etc>abc</etc>
</aaa>
<aaa>
<id>14</id>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
<aaa>
<id>15</id>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
</machines>

What I want is to delete all three of these elements, not just one:

<ip>127.0.0.2</ip>
<option>127.0.0.6</option>
<etc>abc</etc>

that is, the elements listed under <id>12</id>.

Any suggestions?

2 answers:

Answer 0: (score: 2)

The following deletes the proxy element:

use warnings;
use strict;
use XML::Twig;

my $str = '
<machines>
<server>
    127.0.0.1
</server>
<proxy>
    <ip>127.0.0.2</ip>
    <etc>abc</etc>
</proxy>
</machines>
';

my $t = XML::Twig->new(
        twig_handlers => {
            proxy => sub { $_->delete() },
        },
        pretty_print  => 'indented',
);
$t->parse($str);
$t->print;
print "\n";

__END__

<machines>
  <server>
    127.0.0.1
</server>
</machines>

If you don't want server and #PCDATA printed, remove the say $tag; statement from your script.
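On the "#PCDATA" part of the question: XML::Twig stores text nodes as elements whose tag is #PCDATA, and an unfiltered next_elt walk visits them too. The sketch below is only an illustration on a compact copy of the sample document (the variable names are made up for this example). It compares an unfiltered walk with one that passes the '#ELT' condition, which selects only real elements. In the question's script, the text inside server was the first node whose tag did not match /server/, so it got deleted. That is most likely also why the loop stopped there: a deleted node is detached from the tree and has no next element.

```perl
use strict;
use warnings;
use XML::Twig;

# Compact copy of the sample document, without whitespace between tags
my $twig = XML::Twig->new;
$twig->parse('<machines><server>127.0.0.1</server>'
           . '<proxy><ip>127.0.0.2</ip></proxy></machines>');
my $root = $twig->root;

# Unfiltered walk: text nodes are visited and report the tag #PCDATA
my @all_tags;
my $elt = $root;
while ( $elt = $elt->next_elt($root) ) {
    push @all_tags, $elt->tag;
}

# Filtered walk: the '#ELT' condition skips text nodes
my @elt_tags;
$elt = $root;
while ( $elt = $elt->next_elt( $root, '#ELT' ) ) {
    push @elt_tags, $elt->tag;
}

print "all:      @all_tags\n";    # all:      server #PCDATA proxy ip #PCDATA
print "elements: @elt_tags\n";    # elements: server proxy ip
```

Neither walk deletes anything, so the traversal runs to the end of the document in both cases.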

Answer 1: (score: 2)

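If your requirement is to keep only the server elements, you can tell the module so by setting them as twig_roots. This keeps the root of the XML and the server elements (with their content), while discarding everything else: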

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

XML::Twig->new( twig_roots => { server => 1 },
                pretty_print => 'indented',
              )
         ->parse( \*DATA)
         ->print;

__DATA__
<machines>
<server>
    127.0.0.1
</server>
<proxy>
    <ip>127.0.0.2</ip>
    <etc>abc</etc>
</proxy>
</machines>
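
For the follow-up in the question (removing ip, option and etc from the aaa block whose id is 12), one possible approach is sketched below. It is not taken from either answer. It assumes the id must equal 12 exactly, whereas the question's script used a regex match. Instead of deleting elements while walking with next_elt, it first takes the list of children and then deletes from that list, so removing one element cannot cut the loop short.

```perl
use strict;
use warnings;
use XML::Twig;

# Shortened copy of the data from the question
my $xml = <<'XML';
<machines>
<server>127.0.0.1</server>
<aaa>
<id>12</id>
<ip>127.0.0.2</ip>
<option>127.0.0.6</option>
<etc>abc</etc>
</aaa>
<aaa>
<id>14</id>
<ip>127.0.0.2</ip>
<etc>abc</etc>
</aaa>
</machines>
XML

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each fully parsed aaa element
        aaa => sub {
            my ( $t, $aaa ) = @_;
            return unless $aaa->first_child_text('id') eq '12';

            # children() returns a plain list, so deleting from it
            # does not disturb the iteration
            $_->delete for grep { $_->tag ne 'id' } $aaa->children;
        },
    },
    pretty_print => 'indented',
);

$twig->parse($xml);
my $result = $twig->sprint;    # serialize the modified document
print $result;
```

The aaa blocks with other ids are left untouched because the handler returns early for them.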