I have an input XML file that I need to split by doc and by delt, saving each piece in a file named like delt_0001.xml.
Here is my code:
#!/usr/bin/perl
use XML::XPath;

my $file = 'file.xml';
my $xp = XML::XPath->new(filename => $file);

foreach my $entry ( $xp->findnodes('/xml/service/main/doc') ) {
    my $filename = $entry->findvalue('./delt/@id');
    foreach my $entry1 ( $entry->findnodes('//delt') ) {
        my $filename = $entry1->findvalue('/delt/@id');
        my $content = $entry1->toString;
        open(wr, ">delt_$filename.xml");
        print wr "$content\n";
        close wr;
    }
}
When I run the program, all of the delt sections are printed into a single XML file.
Input XML (delt.xml):
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Current output:
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
Desired output:
Split 1: delt_0001.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Split 2: delt_0002-A.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Split 3: delt_0003.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Split 4: delt_0004.xml
<xml>
<service>
<title>split xml</title>
<main>
<doc id="002">
<title>doc2</title>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
Thanks in advance.
Answer 0 (score: 1)
This is quite simple to do with XML::Twig (and I'm glad I got "deleting the current element during parsing" to work a while ago):
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $delt= 'delt.xml';
XML::Twig->new( twig_handlers => { delt => \&delt },
pretty_print => 'indented',
)
->parsefile( $delt);
exit;
sub delt
{ my( $t, $delt)= @_;
my $delt_file= sprintf( 'delt_%s.xml', $delt->id);
# the only tricky part: remove previous doc if needed
if( my $prev_doc= $delt->parent( 'doc')->prev_sibling( 'doc'))
{ $prev_doc->delete; }
$t->print_to_file( $delt_file);
$delt->delete;
}
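Answer 1 (score: 0)
The reason you're having difficulty is that you are extracting a subset of the XML document and then also trying to include some things from its "parent" elements.
Pulling out just your 'delt' elements would be quite simple.
I would use XML::Twig for this; it is a perfect place for twig handlers.
I'm thinking of something like this (apologies, it doesn't quite work yet):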
use strict;
use warnings;
use XML::Twig;
sub process_delt {
my ( $twig, $delt ) = @_;
my $delt_id = $delt->att('id');
print "\nID:\n$delt_id\n";
my $filename = "$delt_id.xml";
$delt->set_pretty_print('indented');
$delt->print;
print "\n--------\n";
}
my $twig = XML::Twig->new(
twig_handlers => { delt => \&process_delt },
);
local $/;
$twig->parse(<DATA>);
__DATA__
<xml>
<service>
<title>split xml</title>
<main>
<doc id="001">
<title>doc1</title>
<delt id="0001">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0002-A">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
<doc id="002">
<title>doc2</title>
<delt id="0003">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
<delt id="0004">
<title>delt1</title>
<text>num1</text>
<text>num1</text>
</delt>
</doc>
</main>
</service>
</xml>
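Edit: see @mirod's answer, since it works completely. This one only extracts each 'delt', and you would then still have to fiddle around to work out the parent parts.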
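For anyone who wants to see what working out the parent parts could look like, here is a minimal sketch, not a replacement for @mirod's handler. It assumes the whole input fits in memory and that delt ids are unique. The helper name split_by_delt is hypothetical. It re-parses the input once per delt, which is wasteful but keeps the logic easy to follow: remove every other delt, then remove any doc left without a delt.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (an assumption, not from the answers above):
# returns a hash ref mapping output file name => XML text, one per <delt>.
sub split_by_delt {
    my ($xml) = @_;

    # First pass: collect the id of every <delt>.
    my $scan = XML::Twig->new;
    $scan->parse($xml);
    my @ids = map { $_->att('id') } $scan->root->descendants('delt');

    my %files;
    for my $id (@ids) {
        # Re-parse so each output starts from an untouched tree.
        my $twig = XML::Twig->new( pretty_print => 'indented' );
        $twig->parse($xml);

        # Drop every <delt> except the one being exported.
        for my $delt ( $twig->root->descendants('delt') ) {
            $delt->delete unless $delt->att('id') eq $id;
        }
        # Drop any <doc> that no longer contains a <delt>.
        for my $doc ( $twig->root->descendants('doc') ) {
            $doc->delete unless $doc->first_child('delt');
        }
        $files{"delt_$id.xml"} = $twig->sprint;
    }
    return \%files;
}

# Shortened sample input, modelled on the question's delt.xml.
my $xml = <<'XML';
<xml><service><title>split xml</title><main>
<doc id="001"><title>doc1</title>
<delt id="0001"><title>delt1</title><text>num1</text></delt>
<delt id="0002-A"><title>delt1</title><text>num1</text></delt>
</doc>
<doc id="002"><title>doc2</title>
<delt id="0003"><title>delt1</title><text>num1</text></delt>
</doc>
</main></service></xml>
XML

my $files = split_by_delt($xml);
print "$_\n" for sort keys %$files;
```

To write the files to disk, loop over the returned hash and print each value to a file opened under its key. For large inputs the single-pass handler in @mirod's answer is the better fit, since this sketch parses the document once per delt.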