I have an XML file containing thousands of entries:
<mediawiki>
  <page>
    <title>page1</title>
    <revision>
      <id>2621</id>
      <parentid>6</parentid>
      <timestamp>2005-10-09T01:00:18Z</timestamp>
      <contributor>
        <username>Chaos</username>
        <id>2</id>
      </contributor>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text xml:space="preserve">text1</text>
    </revision>
  </page>
  <page>
    <title>page2</title>
    <ns>8</ns>
    <id>7</id>
    <revision>
      <id>2619</id>
      <parentid>2618</parentid>
      <timestamp>2005-10-09T00:56:39Z</timestamp>
      <contributor>
        <username>Chaos</username>
        <id>2</id>
      </contributor>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text xml:space="preserve">text2</text>
    </revision>
  </page>
  <page>
    <title>page3</title>
    <ns>8</ns>
    <id>6</id>
    <revision>
      <id>2621</id>
      <parentid>6</parentid>
      <timestamp>2005-10-09T01:00:18Z</timestamp>
      <contributor>
        <username>Chaos</username>
        <id>2</id>
      </contributor>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text xml:space="preserve">text3</text>
    </revision>
  </page>
</mediawiki>
My script should turn each page into a text file whose name is the content of the <title> tag and whose body is the content of <text xml:space="preserve"></text>.
My code:
my $filename = "pages.xml";
my $parser = XML::LibXML->new();
my $xmldoc = $parser->parse_file( $filename );
my $file;
foreach my $page ( $xmldoc->findnodes( '/mediawiki/page' ) ) {
    foreach my $title ( $page->findnodes( '/mediawiki/page/title' ) ) {
        foreach my $rev ( $page->findnodes( '/mediawiki/page/revision' ) ) {
            foreach my $text ( $rev->findnodes( 'text/text()' ) ) {
                $file = $title->to_literal();
                my $newfile = "$file.txt";
                open( my $out, '>:utf8', $newfile )
                    or die "Unable to open '$newfile' for write: $!";
                my $texte = $text->data;
                print $out "$text\n";
                close $out;
            }
        }
    }
}
The problem is that every file created has the same content: that of the last <text xml:space="preserve"></text> tag.
Answer 0 (score: 1)
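Your mistake is nesting all the for loops without using relative XPath expressions. The inner paths start with /, so they are absolute: each one is evaluated from the document root no matter which node findnodes is called on, and every inner loop visits every title and every revision in the file. Each file is therefore rewritten once per text node, and the last write leaves the text of the final page in all of them.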
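To make the difference concrete, here is a minimal sketch that runs both kinds of expression against the same context node. The two-page document is a trimmed-down stand-in for the sample in the question, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use XML::LibXML;

# Two-page document, trimmed down from the question's sample
my $doc = XML::LibXML->load_xml( string => <<'XML' );
<mediawiki>
  <page><title>page1</title><revision><text>text1</text></revision></page>
  <page><title>page2</title><revision><text>text2</text></revision></page>
</mediawiki>
XML

# Use the first <page> element as the context node for both queries
my ($first_page) = $doc->findnodes('/mediawiki/page');

# Absolute path: starts at the document root, so the context node is ignored
my @absolute = map { $_->textContent } $first_page->findnodes('/mediawiki/page/title');

# Relative path: evaluated against $first_page only
my @relative = map { $_->textContent } $first_page->findnodes('title');

print "absolute: @absolute\n";    # absolute: page1 page2
print "relative: @relative\n";    # relative: page1
```

Although both calls are made on the first page, only the relative path stays inside it.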
This should do what you want:
use utf8;
use strict;
use warnings 'all';
use feature 'say';

STDOUT->autoflush;

use XML::LibXML;

my $filename = "pages.xml";
my $doc = XML::LibXML->load_xml( location => $filename );

for my $page ( $doc->findnodes('/mediawiki/page') ) {

    # Relative paths: evaluated against the current <page> only
    my ($title) = $page->findnodes('title');
    my $file = $title->textContent;

    my ($rev_text) = $page->findnodes('revision/text');
    my $text = $rev_text->textContent;

    # One output file per page, named after its title
    open my $fh, '>:utf8', $file
        or die qq{Unable to open "$file" for output: $!};
    print $fh "$text\n";
    close $fh;

    say qq{File "$file" written with "$text"};
}
Output:
File "page1" written with "text1"
File "page2" written with "text2"
File "page3" written with "text3"
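If you want the .txt extension from the question, and the real dump may contain pages without a title or text, or titles with characters that are awkward in file names, a variant along these lines might help. This is only a sketch: title_to_filename and export_pages are hypothetical helper names, and the set of replaced characters is an assumption rather than anything MediaWiki prescribes.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: derive a file name from a page title.
# The replaced character set is an assumption (characters commonly
# rejected or misinterpreted in file names).
sub title_to_filename {
    my ($title) = @_;
    ( my $name = $title ) =~ s{[/\\:*?"<>|]}{_}g;
    return "$name.txt";
}

# Hypothetical helper: write one file per <page> into $dir and
# return the list of paths that were written.
sub export_pages {
    my ( $doc, $dir ) = @_;
    my @written;

    for my $page ( $doc->findnodes('/mediawiki/page') ) {

        # Relative paths again, so each lookup stays inside this page
        my ($title_node) = $page->findnodes('title');
        my ($text_node)  = $page->findnodes('revision/text');
        next unless $title_node and $text_node;    # skip incomplete pages

        my $path = "$dir/" . title_to_filename( $title_node->textContent );

        open my $fh, '>:encoding(UTF-8)', $path
            or die qq{Unable to open "$path" for output: $!};
        print $fh $text_node->textContent, "\n";
        close $fh
            or die qq{Unable to close "$path": $!};

        push @written, $path;
    }

    return @written;
}

# Usage: perl export_pages.pl pages.xml
export_pages( XML::LibXML->load_xml( location => $ARGV[0] ), '.' ) if @ARGV;
```

Like the code above, it takes the first revision/text match of each page; checking the result of close catches write errors that would otherwise go unnoticed.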