Finding section levels in an XML-structured document - Perl
Input:
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<section>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
I need to find all the section elements and add a number to each tag according to its nesting level. The desired output is:
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<section3>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section3>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
First attempt:
foreach my $lines ( @splitCnt ) {
if ( $lines =~ m/<section\s+/g ) {
$opn++;
$lines =~ s/<section\s+/<section$opn /i;
$cls = $opn;
$opn++;
}
elsif ( $lines =~ m/<\/section>/g ) {
$opn = $opn - 1;
$lines =~ s/<\/section>/<\/section$opn>/i;
}
$all_lines .= "$lines\n";
}
Second attempt:
my ( $pre1, $match1, $post1 ) = "";
while ( $incnt =~ m/<section\s+[^>]*>/g ) {
$pre1 = $`;
$match1 = $&;
$post1 = $';
my $Opn = '1';
my $Cls = "";
$match1 =~ s/<section\s+/<section$Opn /gi;
if ( $post1 =~ m/<section\s+/i ) {
$Opn++;
$post1 =~ s/<section\s+/<section$Opn /;
$Opn = $Cls;
}
elsif ( $post1 =~ m/<\/section>/i ) {
$post1 =~ s/<\/section/<\/section$Cls/;
}
$pre1 .= $match1;
$incnt = $post1;
print "$pre1\n";
system 'pause';
}
if ( length $pre1 ) {
$incnt = $pre1 . $post1;
}
Can anyone help with this?
Answer 0 (score: 4)
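Seriously, don't use regular expressions on XML. It's bad news. There are perfectly valid things you can do in XML that will break a regex, so what you end up with is broken XML and brittle code that may fail horribly one day without anyone knowing why.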
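To make that concrete, here is a minimal sketch that counts opening tags with the same `<section\s+` pattern used in the question. The three sample documents and the helper name `count_open_tags` are made up for illustration; each document is well-formed and contains exactly one real `section` element.

```perl
use strict;
use warnings;

# Hypothetical inputs: each is well-formed XML with exactly one <section> element.
my %docs = (
    plain   => '<root><section><para>x</para></section></root>',
    comment => '<root><!-- <section id="old"> --><section id="a"><para>x</para></section></root>',
    cdata   => '<root><section id="a"><para><![CDATA[<section id="x">]]></para></section></root>',
);

# Count opening tags the way the pattern from the question would see them.
sub count_open_tags {
    my ($xml) = @_;
    my $count = () = $xml =~ /<section\s+/g;
    return $count;
}

for my $name ( sort keys %docs ) {
    printf "%-8s regex sees %d opening tag(s), expected 1\n",
        $name, count_open_tags( $docs{$name} );
}
```

The pattern finds nothing in the plain case, because `\s+` requires whitespace after the tag name and `<section>` has none, and it finds two tags in the other cases, because it cannot tell a comment or a CDATA block from real markup. A parser handles all three correctly.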
Use a parser. Personally, I like XML::Twig.
You can walk the tree and rename the tags quite easily:
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;
sub process_section {
my ( $section, $depth ) = @_;
$depth++;
$section->set_tag("section$depth");
foreach my $subsection ( $section->children('section') ) {
process_section( $subsection, $depth );
}
}
my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' );
$twig->parsefile ( 'your_file.xml' );
foreach my $section ( $twig->findnodes('section') ) {
process_section( $section, 0 );
}
$twig->print;
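I'd also point out that your question sounds like an XY problem. What are you actually trying to achieve? Changing tag names according to their position in the hierarchy is generally not advisable, because afterwards you can no longer do what I just did: traverse the data structure recursively.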
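If the level really does need to be recorded in the document, one possible alternative (my assumption, not something the question asks for) is to keep the tag name as `section` and store the depth in an attribute. The sketch below uses XML::Twig on a small made-up document; the attribute name `level` is purely illustrative.

```perl
use strict;
use warnings;
use XML::Twig;

# Small made-up document for illustration.
my $xml = <<'XML';
<root>
  <section>
    <para>level 1</para>
    <section>
      <para>level 2</para>
    </section>
  </section>
</root>
XML

my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse($xml);

# 'level' is an illustrative attribute name, not something from the question.
for my $section ( $twig->root->descendants('section') ) {
    # Depth = this element plus every enclosing <section>.
    my @enclosing = $section->ancestors('section');
    $section->set_att( level => 1 + @enclosing );
}

$twig->print;
```

Because every element is still called `section`, the recursive traversal shown earlier keeps working on the result.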
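Answer 1 (score: 2)
Here is a variation that uses the XML::LibXML module. It simply finds all the section elements and works out their level in the hierarchy by counting the slashes in each element's node path (an XPath expression).
However, as others have said, this is an odd thing to want to do, and it sounds very much like a solution to a different problem. If you explain the full problem, we can help you better.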
use strict;
use warnings;
use XML::LibXML;
my $doc = XML::LibXML->load_xml(IO => \*DATA);
for my $section ( $doc->findnodes('//section') ) {
my $n = $section->nodePath =~ tr|/|| - 1;
$section->setNodeName("section$n");
}
print $doc;
__DATA__
<root>
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<section>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
<section>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
<section>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section>
</section>
</root>
Output:
<?xml version="1.0"?>
<root>
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<section3>
<para>...level 3</para>
<para>...level 3</para>
<para>...level 3</para>
</section3>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
<section1>
<para>...level 1</para>
<para>...level 1</para>
<para>...level 1</para>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
<section2>
<para>...level 2</para>
<para>...level 2</para>
<para>...level 2</para>
</section2>
</section1>
</root>
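Counting slashes works here because every top-level section sits directly under the root element. If the sections might be wrapped in other elements, a variation is to count enclosing `section` elements with the XPath `ancestor-or-self` axis instead. This is only a sketch under that assumption; the `doc` and `wrapper` elements in the sample are made up. Note that the depths are computed before anything is renamed, since renaming an outer element first would hide it from the count for its descendants.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up document: sections are nested inside an extra wrapper element.
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<doc><wrapper>
  <section><para>level 1</para>
    <section><para>level 2</para></section>
  </section>
</wrapper></doc>
XML

# First pass: record each section's depth while all tags are still named 'section'.
my @sections = $doc->findnodes('//section');
my @depths   = map { int $_->findvalue('count(ancestor-or-self::section)') } @sections;

# Second pass: rename using the recorded depths.
for my $i ( 0 .. $#sections ) {
    $sections[$i]->setNodeName("section$depths[$i]");
}

print $doc->toString;
```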