I have been trying to split XML data using the XML::LibXML module, but it throws this error:
Can't call method "findnodes" without a package or object reference
My input:
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S1">
<title>Short</title>
<label>1.</label>
<p><text>welcome</text></p>
</rect>
<rect id="S2">
<title>Definite</title>
<label>2.</label>
<p><text>welcome1</text></p>
</rect>
</bhap>
<bhap id="2">
<label>cylind – II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3">
<title>nauty.—</title>
<label>3.</label>
<p><text>welcome3</text></p>
</rect>
<rect id=S4">
<title>Term</title>
<label>4.</label>
<p><text>welcome4</text></p>
</rect>
</bhap>
</xml>
Required output:
File 1
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S1">
<title>Short</title>
<label>1.</label>
<p><text>welcome</text></p>
</rect>
</bhap>
</xml>
File 2
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S2">
<title>Definite</title>
<label>2.</label>
<p><text>welcome1</text></p>
</rect>
</bhap>
</xml>
File 3
<xml>
<bhap id="2">
<label>cylind – II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3">
<title>nauty.—</title>
<label>3.</label>
<p><text>welcome3</text></p>
</rect>
</bhap>
</xml>
File 4
<xml>
<bhap id="2">
<label>cylind – II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id=S4">
<title>Term</title>
<label>4.</label>
<p><text>welcome4</text></p>
</rect>
</bhap>
</xml>
My code:
use XML::LibXML;
my $file = shift || die "usage $0 <xmlfile>";
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file($file);
my @nodes = $doc->findnodes('//bhap');
foreach my $node1 (@nodes) {
my $bhap = $node1->toString(), "\n";
if ( $bhap =~ m/(<bhap.+?>.+?<\/title>)(.+?)(<\/bhap>)/is ) {
my $bhap1 = $1;
my $bhap2 = $2;
my $bhap3 = $3;
my $nodes1 = $bhap->findnodes('//rect');
foreach my $node (@$nodes1) {
my $rect = $node->toString();
if ( $rect =~ m/(<rect\s*id="(.+?)">.+?<\/rect>)/is ) {
my $var1 = $1;
my $var2 = $2;
print "file" $var2;
print "<xml>" print $bhap1;
print $var1;
print $bhap3;
print "</xml>";
}
}
}
}
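Answer 0 (score: 2)
OK, so you started out well, but then fell into the regular expression trap. Parsing XML with regular expressions is a bad idea because XML is too complicated for it: to do it properly you would need to handle and validate tag nesting, line breaks, and all sorts of other things that basically just turn your regex into a brittle piece of code. So please don't.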
But most importantly: always use strict and warnings before posting a query. They are your first port of call for troubleshooting.
If you had, you would have been pointed at the following:
print "file" $var2;
That is never going to work. There are some other things in your code that don't really work either, but that would be the starting point.
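For what it's worth, the error quoted in the question appears to come from calling findnodes on $bhap, which holds the plain string returned by toString rather than a node. A minimal sketch of the LibXML side, assuming a trimmed-down copy of the input with the id="S4" quoting repaired (the shortened sample data and the @ids array are mine, for illustration only):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed-down, illustrative copy of the input (id="S4" quoting repaired).
my $xml = <<'XML';
<xml>
<bhap id="1">
<title>premier</title>
<rect id="S1"><title>Short</title></rect>
<rect id="S2"><title>Definite</title></rect>
</bhap>
<bhap id="2">
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3"><title>nauty.</title></rect>
<rect id="S4"><title>Term</title></rect>
</bhap>
</xml>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my @ids;
foreach my $bhap ( $doc->findnodes('//bhap') ) {
    # $bhap is an XML::LibXML::Element here, so it has a findnodes method.
    # The string returned by $bhap->toString does not, hence the error.
    # './rect' is relative to this bhap; '//rect' would search the whole document.
    foreach my $rect ( $bhap->findnodes('./rect') ) {
        push @ids, $bhap->getAttribute('id') . ':' . $rect->getAttribute('id');
    }
}

print "$_\n" for @ids;
```

Keeping the node objects around, and only calling toString at the point of output, avoids both the error and the need for any regex.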
Also, your XML is invalid: your 'S4' is missing a quote, I think.
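Anyway, assuming that is just a typo, I would start with XML::Twig (because I know it better than LibXML, rather than for any specific reason) and do something like this: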
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my %children_of;
#as we process, extract all the 'rect' elements - along with a reference to their context.
sub process_rect {
my ( $twig, $rect ) = @_;
push( @{ $children_of{ $rect->parent } }, $rect->cut );
}
my $twig = XML::Twig->new(
'pretty_print' => 'indented',
'twig_handlers' => { 'rect' => \&process_rect },
);
$twig->parse( \*DATA );
#run through all the 'bhap' elements.
foreach my $bhap ( $twig->root->children('bhap') ) {
#find the rect elements under this bhap.
foreach my $rect ( @{ $children_of{$bhap} } ) {
#create a new XML document - copy the 'root' name from your original document.
my $xml = XML::Twig::Elt->new( $twig -> root -> name );
#duplicate this 'bhap' element by copying it, rather than cutting it,
#so we can paste it more than once (e.g. per 'rect')
my $subset = $bhap->copy;
#insert the 'bhap' into our new xml.
$subset->paste( last_child => $xml );
#insert our cut rect beneath this bhap.
$rect->paste( last_child => $subset );
#print the resulting XML.
print "--\n";
$xml->print;
}
}
__DATA__
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S1">
<title>Short</title>
<label>1.</label>
<p><text>welcome</text></p>
</rect>
<rect id="S2">
<title>Definite</title>
<label>2.</label>
<p><text>welcome1</text></p>
</rect>
</bhap>
<bhap id="2">
<label>cylind - II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3">
<title>nauty.—</title>
<label>3.</label>
<p><text>welcome3</text></p>
</rect>
<rect id="S4">
<title>Term</title>
<label>4.</label>
<p><text>welcome4</text></p>
</rect></bhap>
</xml>
We preprocess the XML and 'cut' the rect nodes out of it. Then we loop over each bhap node, copying it and inserting the relevant rect beneath the copy.
This gives the output:
--
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S1">
<title>Short</title>
<label>1.</label>
<p>
<text>welcome</text>
</p>
</rect>
</bhap>
</xml>
--
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S2">
<title>Definite</title>
<label>2.</label>
<p>
<text>welcome1</text>
</p>
</rect>
</bhap>
</xml>
--
<xml>
<bhap id="2">
<label>cylind - II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3">
<title>nauty.—</title>
<label>3.</label>
<p>
<text>welcome3</text>
</p>
</rect>
</bhap>
</xml>
--
<xml>
<bhap id="2">
<label>cylind - II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S4">
<title>Term</title>
<label>4.</label>
<p>
<text>welcome4</text>
</p>
</rect>
</bhap>
</xml>
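Which at least looks pretty close to what you are trying to produce. I have skipped reading the input file and printing the results to separate files, because rebuilding the XML is the harder part.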
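If it helps, here is a rough sketch of the file-writing step, using the same cut/copy/paste approach. The file naming (one file per rect, named after its id attribute), the temporary output directory, the shortened sample data and the @written array are all assumptions of mine rather than anything from the question; swap in your own directory and input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
use File::Temp qw(tempdir);

# Trimmed-down, illustrative input; in practice use $twig->parsefile($file).
my $input = <<'XML';
<xml>
<bhap id="1">
<label>cylind - I</label>
<title>premier</title>
<rect id="S1"><title>Short</title></rect>
<rect id="S2"><title>Definite</title></rect>
</bhap>
<bhap id="2">
<label>cylind - II</label>
<title>AUTHORITIES AND ITS EMPLOYEES</title>
<rect id="S3"><title>nauty.</title></rect>
<rect id="S4"><title>Term</title></rect>
</bhap>
</xml>
XML

# Cut every 'rect' as it is parsed, remembering which 'bhap' it came from.
my %children_of;
my $twig = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        rect => sub {
            my ( $t, $rect ) = @_;
            push @{ $children_of{ $rect->parent } }, $rect->cut;
        },
    },
);
$twig->parse($input);

# Assumption: a throwaway directory, removed when the script exits.
my $out_dir = tempdir( CLEANUP => 1 );

my @written;
foreach my $bhap ( $twig->root->children('bhap') ) {
    foreach my $rect ( @{ $children_of{$bhap} || [] } ) {
        # New root, a copy of the (now rect-less) bhap, then the one rect.
        my $xml    = XML::Twig::Elt->new( $twig->root->name );
        my $subset = $bhap->copy;
        $subset->paste( last_child => $xml );
        $rect->paste( last_child => $subset );

        # Hypothetical naming scheme: S1.xml, S2.xml, ...
        my $path = "$out_dir/" . $rect->att('id') . '.xml';
        open( my $fh, '>:encoding(UTF-8)', $path )
            or die "Cannot write $path: $!";
        $xml->print($fh);
        close($fh) or die "Cannot close $path: $!";
        push @written, $path;
    }
}

print "$_\n" for @written;
```

The UTF-8 output layer is there because the real data contains non-ASCII characters such as the dash in 'nauty.—'.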
I would also suggest looking at xml_split, which comes with XML::Twig, as it may do exactly what you want.