How can I use Perl to concatenate multiple XML files from different directories into a single XML file?
Answer 0 (score: 1)
I've had to make quite a few assumptions to do this, but here is my answer:
#!/usr/bin/perl -w
use strict;
use XML::LibXML;
my $output_doc = XML::LibXML->load_xml( string => <<EOF);
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects xml:id='total'/>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
</issu-meta>
EOF
my $object_count = 0;
foreach (@ARGV) {
    my $input_doc = XML::LibXML->load_xml( location => $_ );
    foreach ($input_doc->findnodes('/*[local-name()="issu-meta"]/*[local-name()="basictype"]')) { # find each object
        my $object = $output_doc->importNode($_, 1); # import the object information into the output document
        $output_doc->documentElement->appendChild($object); # append the new XML nodes to the output document root
        $object_count++; # keep track of how many objects we've seen
    }
}
my $total = $output_doc->getElementById('total'); # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count)); # append the object count to that element
$total->removeAttribute('xml:id'); # remove the XML id, as it's not wanted in the output
print $output_doc->toString; # output the final document
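First off, the <comp> element seems to come from nowhere, so I've had to ignore it. I'm also assuming that the required output content before the first <basictype> element is always the same, apart from the object count.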
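So I build an empty output document, then iterate over each filename supplied on the command line. For each file, I find every object and copy it into the output document. Once all the input files have been processed, I insert the object count.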
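The use of xmlns in the files makes this more difficult: it makes the XPath search expression more complicated than it needs to be. If possible, I'd be tempted to remove the xmlns attributes, which would leave you with: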
foreach ($input_doc->findnodes('/issu-meta/basictype')) {
which is a lot simpler.
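If removing the xmlns attributes is not an option, another way to keep the XPath readable is to bind the namespace to a prefix with XML::LibXML::XPathContext. The following is only a minimal sketch: the inline document is a made-up stand-in for one of the input files, and the prefix "v" is an arbitrary name chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical stand-in for one input file; the real files are assumed
# to use the same default namespace ("ver2") as the output template.
my $input_doc = XML::LibXML->load_xml( string => <<'EOF' );
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<basictype><id> 1 </id><name> pointer </name></basictype>
<basictype><id> 4 </id><name> int32_t </name></basictype>
</issu-meta>
EOF

# Bind the namespace URI to a prefix that is only used inside XPath
# expressions; "v" is an arbitrary choice, not something from the files.
my $xpc = XML::LibXML::XPathContext->new($input_doc);
$xpc->registerNs( v => 'ver2' );

# The expression now reads almost like the namespace-free version.
my @objects = $xpc->findnodes('/v:issu-meta/v:basictype');
printf "found %d basictype elements\n", scalar @objects;
```

The input files stay untouched this way, at the cost of a prefix on every step of the expression.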
So, when I run:
perl combine abc/a.xml xyz/b.xml
I get:
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects>3</num-objects>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
<basictype>
<id> 1 </id>
<name> pointer </name>
<pointer/>
<size> 64 </size>
</basictype><basictype>
<id> 4 </id>
<name> int32_t </name>
<primitive/>
<size> 32 </size>
</basictype><basictype>
<id> 2 </id>
<name> int8_t </name>
<primitive/>
<size> 8 </size>
</basictype></issu-meta>
That is pretty close to what you're after.
Edit: OK, my answer now looks like this:
#!/usr/bin/perl -w
use strict;
use XML::LibXML qw( :libxml ); # load LibXML support and include node type definitions
my $output_doc = XML::LibXML->load_xml( string => <<EOF); # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects xml:id='total'/>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
</issu-meta>
EOF
my $object_count = 0;
foreach (@ARGV) {
    my $input_doc = XML::LibXML->load_xml( location => $_ );
    my $import_started = 0;
    foreach ($input_doc->documentElement->childNodes) {
        next unless $_->nodeType == XML_ELEMENT_NODE; # if it's not an element, ignore it
        if ($_->localName eq 'compatibility') { # if it's the "compatibility" element, ...
            $import_started = 1; # ... switch on importing ...
            next; # ... and move to the next child of the root
        }
        next unless $import_started; # if we've not started importing, and it's
                                     # not the "compatibility" element, simply
                                     # ignore it and move on
        my $object = $output_doc->importNode($_, 1); # import the object information into the output document
        $output_doc->documentElement->appendChild($object); # append the new XML nodes to the output document root
        $object_count++; # keep track of how many objects we've seen
    }
}
my $total = $output_doc->getElementById('total'); # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count)); # append the object count to that element
$total->removeAttribute('xml:id'); # remove the XML id, as it's not wanted in the output
print $output_doc->toString; # output the final document
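This simply imports every element that is a child of the root <issu-meta> document element and comes after the first <compatibility> element it finds and, as before, updates the object count. That should do the trick if I've understood your requirement.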
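For comparison, the same selection can probably be expressed as a single XPath query using the following-sibling axis. This is a sketch under assumptions rather than a replacement for the loop: the inline document is a made-up stand-in for one input file, and the expression assumes that skipping any later <compatibility> elements is the desired behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical stand-in for one input file.
my $input_doc = XML::LibXML->load_xml( string => <<'EOF' );
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<metadescription><num-objects> 2 </num-objects></metadescription>
<compatibility><baseline> 6.2.1.2.43 </baseline></compatibility>
<basictype><id> 1 </id></basictype>
<enum><id> 1835009 </id></enum>
</issu-meta>
EOF

# Take the first "compatibility" child of the root, then every element
# sibling after it, leaving out any further "compatibility" elements.
my @objects = $input_doc->findnodes(
      '/*/*[local-name()="compatibility"][1]'
    . '/following-sibling::*[local-name()!="compatibility"]'
);

print $_->localName, "\n" for @objects;
```

The explicit loop is easier to extend with extra conditions; the XPath form is shorter but harder to read once the namespace workaround is involved.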
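If it works, I strongly recommend that you work through this answer and my previous one to make sure you understand why it works for your problem. There are a lot of useful techniques in use here, and once you understand them you'll have learned a lot about ways of manipulating XML. If you have any problems, ask a new question on this site. Have fun!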
Edit #2: Right, this should be the last piece you need:
#!/usr/bin/perl -w
use strict;
use XML::LibXML qw( :libxml ); # load LibXML support and include node type definitions
my @input_files = (
'abc/a.xml',
'xyz/b.xml',
);
my $output_file = 'output.xml';
my $output_doc = XML::LibXML->load_xml( string => <<EOF); # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects xml:id='total'/>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
</issu-meta>
EOF
my $object_count = 0;
foreach (@input_files) {
    my $input_doc = XML::LibXML->load_xml( location => $_ );
    my $import_started = 0;
    foreach ($input_doc->documentElement->childNodes) {
        next unless $_->nodeType == XML_ELEMENT_NODE; # if it's not an element, ignore it
        if ($_->localName eq 'compatibility') { # if it's the "compatibility" element, ...
            $import_started = 1; # ... switch on importing ...
            next; # ... and move to the next child of the root
        }
        next unless $import_started; # if we've not started importing, and it's
                                     # not the "compatibility" element, simply
                                     # ignore it and move on
        my $object = $output_doc->importNode($_, 1); # import the object information into the output document
        $output_doc->documentElement->appendChild($object); # append the new XML nodes to the output document root
        $object_count++; # keep track of how many objects we've seen
    }
}
my $total = $output_doc->getElementById('total'); # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count)); # append the object count to that element
$total->removeAttribute('xml:id'); # remove the XML id, as it's not wanted in the output
$output_doc->toFile($output_file, 1); # output the final document
After running perl combine, the file output.xml was created with the following contents:
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects>7</num-objects>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
<basictype>
<id> 1 </id>
<name> pointer </name>
<pointer/>
<size> 64 </size>
</basictype><basictype>
<id> 4 </id>
<name> int32_t </name>
<primitive/>
<size> 32 </size>
</basictype><enum>
<id>1835009 </id>
<name> chkpt_state_t </name>
<label>
<name> CHKP_STATE_PENDING </name>
<value> 1 </value>
</label>
</enum><struct>
<id> 1835010 </id>
<name> _ipcEndpoint </name>
<size> 64 </size>
<elem>
<id> 0 </id>
<name> ep_addr </name>
<type> uint32_t </type>
<type-id> 8 </type-id>
<size> 32 </size>
<offset> 0 </offset>
</elem>
</struct><basictype>
<id> 2 </id>
<name> int8_t </name>
<primitive/>
<size> 8 </size>
</basictype><alias>
<id> 1835012 </id>
<name> Endpoint </name>
<size> 64 </size>
<type> _ipcEndpoint </type>
<type-id> 1835010 </type-id>
</alias><bitmask>
<id> 1835015 </id>
<name> ipc_flag_t </name>
<size> 8 </size>
<type> uint8_t </type>
<type-id> 6 </type-id>
<label>
<name> IPC_APPLICATION_REGISTER_MSG </name>
<value> 1 </value>
</label>
</bitmask></issu-meta>
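Final tip: although it makes practically no difference to the XML itself, the output is a little more human-readable once it has been run through xmltidy: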
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<metadescription>
<num-objects>7</num-objects>
</metadescription>
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
<basictype>
<id> 1 </id>
<name> pointer </name>
<pointer/>
<size> 64 </size>
</basictype>
<basictype>
<id> 4 </id>
<name> int32_t </name>
<primitive/>
<size> 32 </size>
</basictype>
<enum>
<id>1835009 </id>
<name> chkpt_state_t </name>
<label>
<name> CHKP_STATE_PENDING </name>
<value> 1 </value>
</label>
</enum>
<struct>
<id> 1835010 </id>
<name> _ipcEndpoint </name>
<size> 64 </size>
<elem>
<id> 0 </id>
<name> ep_addr </name>
<type> uint32_t </type>
<type-id> 8 </type-id>
<size> 32 </size>
<offset> 0 </offset>
</elem>
</struct>
<basictype>
<id> 2 </id>
<name> int8_t </name>
<primitive/>
<size> 8 </size>
</basictype>
<alias>
<id> 1835012 </id>
<name> Endpoint </name>
<size> 64 </size>
<type> _ipcEndpoint </type>
<type-id> 1835010 </type-id>
</alias>
<bitmask>
<id> 1835015 </id>
<name> ipc_flag_t </name>
<size> 8 </size>
<type> uint8_t </type>
<type-id> 6 </type-id>
<label>
<name> IPC_APPLICATION_REGISTER_MSG </name>
<value> 1 </value>
</label>
</bitmask>
</issu-meta>
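If xmltidy is not to hand, XML::LibXML can probably produce similar indentation itself. The sketch below rests on one assumption: that the format flag of toString only has a visible effect once the whitespace-only text nodes between elements are gone, which is why the text is re-parsed with the no_blanks option. The inline document is a shortened, made-up version of the combined output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Shortened, hypothetical version of the combined output, including the
# "</basictype><basictype>" run-together that the script produces.
my $combined = <<'EOF';
<?xml version="1.0"?>
<issu-meta xmlns="ver2">
<compatibility>
<baseline> 6.2.1.2.43 </baseline>
</compatibility>
<basictype>
<id> 1 </id>
</basictype><basictype>
<id> 2 </id>
</basictype></issu-meta>
EOF

# no_blanks drops whitespace-only text nodes while parsing, so the
# serializer is free to insert its own line breaks and indentation.
my $doc = XML::LibXML->load_xml( string => $combined, no_blanks => 1 );

# A format argument of 1 asks for indented output.
my $pretty = $doc->toString(1);
print $pretty;
```

Text inside elements such as <id> is left as it is, since those nodes contain more than whitespace.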
Good luck taking it further, and come back to this site with more questions as they arise!