use strict;
use warnings;
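my @array = qw / abc de fghi jklm sdfe kk/;
# Alternation of the words whose <line> elements should be removed
my $filter = join '|', @array;
$filter = qr/$filter/;
# Input is read one physical line at a time, and [^<] stops at any '<',
# so only a <line>...</line> with no nested tags on a single input line can match
while ( my $record = <DATA> ) {
    $record =~ s/\<line\>[^<]*?\b$filter\b[^<]*?\<\/line\>//ig;
    print $record if $record =~ /\S/;
}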
Running the code above on this XML data
<data>
<line> sdfe abc adsfefsdf </line>
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line><line> abc </line>
<line> sd sfefsdf </line>
<line> sdfe abc
<line2> afw sdf a </line2> <line3> kasd ads fewf 2323 </line3>
adsfefsdf </line>
<line> fhgh kk jj hjsda </line>
<line> abc </line>
..
..
..
</data>
produces the following output
<data>
<line> sdfe </line>
<line> sd sfefsdf </line>
<line> sdfe abc
<line2> afw sdf a </line2> <line3> kasd ads fewf 2323 </line3>
..
..
..
</data>
The expected output is
<data>
<line> sdfe </line>
<line> sd sfefsdf </line>
..
..
..
</data>
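This shows that the Perl script removes the tags that contain any of the elements abc, de, fghi, jklm, sdfe or kk.
For example, on these two lines of the XML file
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line> <line> abc </line>
the original Perl code works and produces the output
<line> sdfe </line>
The code does not work if there are other tags inside the element. Can the script be enhanced so that a tag containing one of the elements to be removed is removed together with any other tags inside it?
For example, given the XML data below
<line> sdfe abc
<line2> afw sdf a </line2> <line3> kasd ads fewf 2323 </line3>
adsfefsdf </line>
can the Perl code be enhanced so that, if the line contains a removable element such as abc, de, fghi or jklm, the whole line is removed, including the unnecessary other tags it contains?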
Answer 0 (score: 2)
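You need to use a proper XML parser instead of trying to hack it with regular expressions. This program uses the XML::Twig module. If you prefer, XML::LibXML is a good alternative.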
use strict;
use warnings;
use XML::Twig;
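my @discard = qw / abc de fghi jklm sdfe kk/;
# One regex that matches any of the words as a whole word
my $filter = join '|', @discard;
$filter = qr/\b(?:$filter)\b/;
# Parse the whole document from the __DATA__ section
my $twig = XML::Twig->new;
$twig->parse(\*DATA);
# text() includes the text of nested elements such as <line2> and <line3>,
# so a matching <line> is deleted together with everything inside it
for my $line ( $twig->findnodes('//line') ) {
    $line->delete if $line->text =~ $filter;
}
$twig->print;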
__DATA__
<data>
<line> sdfe abc adsfefsdf </line>
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line>
<line> abc </line>
<line> sd sfefsdf </line>
<line>
sdfe abc
<line2> afw sdf a </line2>
<line3> kasd ads fewf 2323 </line3>
adsfefsdf
</line>
<line> fhgh kk jj hjsda </line>
<line> abc </line>
..
..
..
</data>
Output
<data><line> sd sfefsdf </line>
..
..
..
</data>
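The answer above suggests XML::LibXML as an alternative to XML::Twig. A minimal sketch of the same filter with that module follows, assuming XML::LibXML is installed from CPAN. The helper name filter_xml is hypothetical; load_xml, findnodes, textContent, unbindNode and toString are existing XML::LibXML methods.

```perl
use strict;
use warnings;
use XML::LibXML;   # CPAN module, assumed to be installed

# Hypothetical helper: parse the XML, delete every <line> element whose
# full text (including nested elements) contains one of the words, and
# return the serialized document.
sub filter_xml {
    my ($xml, @words) = @_;
    my $alternation = join '|', map { quotemeta } @words;
    my $filter      = qr/\b(?:$alternation)\b/;

    my $doc = XML::LibXML->load_xml(string => $xml);
    for my $line ($doc->findnodes('//line')) {
        # textContent concatenates the text of all descendant nodes
        $line->unbindNode if $line->textContent =~ $filter;
    }
    return $doc->toString;
}

my $xml = <<'XML';
<data>
<line> sdfe abc adsfefsdf </line>
<line> sd sfefsdf </line>
<line> sdfe abc
<line2> afw sdf a </line2> <line3> kasd ads fewf 2323 </line3>
adsfefsdf </line>
<line> fhgh kk jj hjsda </line>
</data>
XML

print filter_xml($xml, qw(abc de fghi jklm sdfe kk));
```

As with the XML::Twig version, testing the element's full text rather than a single physical line is what removes the nested <line2> and <line3> children together with their parent.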
Answer 1 (score: -2)
The following code handles the case of nested unwanted elements. It still does not support CDATA sections in your XML.
use strict;
use warnings;
my @array = qw /abc de fghi jklm sdfe kk/;
my $filter = join '|', @array;
#
# XML processing is not line-oriented, so read the whole input (the __DATA__ section) in a single sweep.
#
my $record;
{
    local $/ = undef;
    $record = <DATA>;
}
#
# Repeatedly eliminate inner <lineN> elements (e.g. <line2>, <line3>) that contain no further tags;
# the loop ends as soon as a pass removes nothing, so it cannot spin forever on an unmatched tag
#
1 while $record =~ s#<(line[0-9])>[^<]*</\1>##ig;
#
# Filter according to the predefined list
#
$record =~ s/<line>[^<]*\b($filter)\b[^<]*<\/line>//ig;
print $record if $record =~ /\S/;
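A variation on this regex approach, sketched under the same caveats (no CDATA, no comments, no <line> nested inside another <line>): instead of first deleting every <lineN> element in the document, it matches each complete <line>...</line> element including any nested tags, strips the tags from a copy to get the element's text, and removes the element only when that text contains a filtered word. The helper name remove_lines is hypothetical and not part of either answer.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every <line>...</line> element whose text,
# including the text of nested tags such as <line2> and <line3>, contains
# one of @words as a whole word (case-insensitive).
# Assumes <line> elements are never nested in each other and that the
# input has no CDATA sections or comments.
sub remove_lines {
    my ($xml, @words) = @_;
    my $filter = join '|', map { quotemeta } @words;

    $xml =~ s{
        (                         # capture 1: the whole line element
          <line>
          ((?:(?!</?line>).)*)    # capture 2: content, may span lines and hold other tags
          </line>
        )
    }{
        my ($whole, $text) = ($1, $2);   # copy before running other regexes
        $text =~ s/<[^>]*>/ /g;          # drop nested tags, keep their text
        $text =~ /\b(?:$filter)\b/i ? '' : $whole;
    }gsex;

    # Drop lines that were left empty by the removals
    $xml =~ s/^[ \t]*\n//mg;
    return $xml;
}

my $xml = <<'XML';
<data>
<line> sdfe abc adsfefsdf </line>
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line><line> abc </line>
<line> sd sfefsdf </line>
<line> sdfe abc
<line2> afw sdf a </line2> <line3> kasd ads fewf 2323 </line3>
adsfefsdf </line>
<line> fhgh kk jj hjsda </line>
<line> abc </line>
</data>
XML

# Prints:
# <data>
# <line> sd sfefsdf </line>
# </data>
print remove_lines($xml, qw(abc de fghi jklm sdfe kk));
```

Because the tags are stripped only from a copy used for the test, a <line> that is kept retains its nested elements unchanged.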