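My XML file contains a batch of items, as shown below.
I want to split this file into 5 files, one per Item tag, using a shell script. Thanks in advance for any help.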
<Items>
<Item>
<Title>Title 1</Title>
<DueDate>01-02-2008</DueDate>
</Item>
<Item>
<Title>Title 2</Title>
<DueDate>01-02-2009</DueDate>
</Item>
<Item>
<Title>Title 3</Title>
<DueDate>01-02-2010</DueDate>
</Item>
<Item>
<Title>Title 4</Title>
<DueDate>01-02-2011</DueDate>
</Item>
<Item>
<Title>Title 5</Title>
<DueDate>01-02-2012</DueDate>
</Item>
</Items>
Expected output (one file per Item):
<Items>
<Item>
<Title>Title 1</Title>
<DueDate>01-02-2008</DueDate>
</Item>
</Items>
Answer 0 (score: 1)
I would suggest installing XML::Twig, which includes the rather handy xml_split utility. That may do what you need, e.g.:
xml_split -c Item
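That said, what you're trying to accomplish isn't trivial, because you need to cut the document up while keeping each piece valid XML. You can't do that reliably with standard line- or regex-based tools, since they depend on how the file happens to be laid out rather than on its element structure.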
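To illustrate the layout problem, here is a small sketch in Python (standard library only, used purely as a stand-in for illustration). The two sample strings and both helper names are made up for this example. It compares a naive line-oriented count of Item elements with a parser-based count on the same document serialized two ways.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same document, pretty-printed and on one line.
PRETTY = (
    "<Items>\n<Item>\n<Title>Title 1</Title>\n</Item>\n"
    "<Item>\n<Title>Title 2</Title>\n</Item>\n</Items>\n"
)
COMPACT = (
    "<Items><Item><Title>Title 1</Title></Item>"
    "<Item><Title>Title 2</Title></Item></Items>"
)


def count_items_by_line(text):
    """Line-oriented approach: count lines consisting only of '<Item>'."""
    return sum(1 for line in text.splitlines() if line.strip() == "<Item>")


def count_items_by_parser(text):
    """Parser-based approach: count <Item> children of the root element."""
    return len(ET.fromstring(text).findall("Item"))
```

The line-based count only works when every tag sits on its own line, while the parser gives the same answer for both serializations.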
However, you can use a parser:
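#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Collects every <Item> element cut out of the source document.
my @item_list;

# Twig handler: detach each <Item> from the tree as it is parsed.
sub cut_item {
    my ( $twig, $item ) = @_;
    my $thing = $item->cut;
    push( @item_list, $thing );
}

my $twig = XML::Twig->new(
    twig_handlers => { 'Item' => \&cut_item }
);

# Slurp the whole input (STDIN or the file named on the command line) into
# one string, rather than passing each line to parse() as a separate argument.
my $xml = do { local $/; <> };
$twig->parse($xml);

my $itemcount = 1;
foreach my $element (@item_list) {
    # Wrap each item in a fresh <Items> root.
    my $newdoc = XML::Twig->new( 'pretty_print' => 'indented_a' );
    $newdoc->set_root( XML::Twig::Elt->new('Items') );
    $element->paste( $newdoc->root );
    $newdoc->print;    # also echo the new document to STDOUT

    my $filename = "items_" . $itemcount++ . ".xml";
    open( my $output, ">", $filename ) or die "Cannot open $filename: $!";
    print {$output} $newdoc->sprint;
    close($output) or die "Cannot close $filename: $!";
}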
This uses the XML::Twig library to extract each of the Item elements from your XML (piped on STDIN, or via myscript.pl yourfilename).
It then iterates over all the elements it found, wraps each one in an Items root element, and writes it to a separate file (items_1.xml, items_2.xml, and so on). This approach might take a little more fiddling if you had a more complex root, but it is adaptable if you do.
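If Perl or XML::Twig isn't available, the same idea can be sketched with Python's standard xml.etree.ElementTree module. This is only a rough equivalent under some assumptions, not a drop-in replacement: the function name split_items and its parameters are hypothetical, it assumes the Item elements are direct children of the root, and it does not pretty-print the output.

```python
import os
import xml.etree.ElementTree as ET


def split_items(xml_text, out_dir=".", root_tag="Items", item_tag="Item"):
    """Write each <Item> to its own file under a fresh <Items> root.

    Returns the list of file paths written (items_1.xml, items_2.xml, ...).
    """
    root = ET.fromstring(xml_text)
    written = []
    # Only direct children of the root are considered (an assumption).
    for count, item in enumerate(root.findall(item_tag), start=1):
        new_root = ET.Element(root_tag)
        new_root.append(item)
        path = os.path.join(out_dir, f"items_{count}.xml")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(ET.tostring(new_root, encoding="unicode"))
        written.append(path)
    return written
```

It could be called as split_items(open("yourfilename").read()), which would write one file per Item into the current directory. As with the Perl version, a more complex root (attributes, namespaces, nested containers) would need extra handling.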