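How do I concatenate multiple XML files?

Time: 2014-09-10 14:36:04

Tags: perl

How can I use Perl to concatenate multiple XML files from different directories into a single XML file?

1 answer:

Answer 0 (score: 1)

I've had to make quite a few assumptions to do this, but here's my answer: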

#!/usr/bin/perl -w

use strict;
use XML::LibXML;

my $output_doc = XML::LibXML->load_xml( string => <<EOF);
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects xml:id='total'/>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
</issu-meta> 

EOF

my $object_count = 0;

foreach (@ARGV) {
  my $input_doc = XML::LibXML->load_xml( location => $_ );
  foreach ($input_doc->findnodes('/*[local-name()="issu-meta"]/*[local-name()="basictype"]')) {  # find each object
    my $object = $output_doc->importNode($_, 1);  # import the object information into the output document
    $output_doc->documentElement->appendChild($object);  # append the new XML nodes to the output document root
    $object_count++;  # keep track of how many objects we've seen
  }
}

my $total = $output_doc->getElementById('total');  # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count));  # append the object count to that element
$total->removeAttribute('xml:id');  # remove the XML id, as it's not wanted in the output

print $output_doc->toString;  # output the final document

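Firstly, the <comp> element seems to come from nowhere, so I've had to ignore it. I've also assumed that, apart from the object count, the required output content before the <basictype> elements is always the same.

So I build an empty output document, then iterate over each filename supplied on the command line. For each one, I find every object and copy it into the output document. Once all the input files have been processed, I insert the object count.

Things are made more difficult by the use of xmlns in the files, because it makes the XPath search expression more complicated than it needs to be. If possible, I'd be tempted to remove the xmlns attributes, and you'd be left with: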

foreach ($input_doc->findnodes('/issu-meta/basictype')) {

which is much simpler.
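
If removing the xmlns attributes isn't an option, another way to keep the expression short is to register a prefix for the namespace with XML::LibXML::XPathContext. A minimal sketch, assuming the namespace URI is ver2 as in the files above; the prefix v and the inline sample document are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# A small stand-in for one input file (hypothetical sample data)
my $input_doc = XML::LibXML->load_xml( string => <<'EOF' );
<issu-meta xmlns="ver2">
  <compatibility><baseline> 6.2.1.2.43 </baseline></compatibility>
  <basictype><id> 1 </id><name> pointer </name></basictype>
  <basictype><id> 4 </id><name> int32_t </name></basictype>
</issu-meta>
EOF

# Bind the prefix "v" (an arbitrary choice) to the namespace URI "ver2"
my $xpc = XML::LibXML::XPathContext->new($input_doc);
$xpc->registerNs( v => 'ver2' );

# The same search as the local-name() expression, written with the prefix
my @objects = $xpc->findnodes('/v:issu-meta/v:basictype');

printf "found %d basictype elements\n", scalar @objects;
```

The prefix only exists inside the XPath context, so the input files themselves don't need to change.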

So, when I run:

perl combine abc/a.xml xyz/b.xml

I get:

<?xml version="1.0"?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects>3</num-objects>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
<basictype>
       <id> 1 </id>
       <name> pointer </name>
       <pointer/>
       <size> 64 </size>
</basictype><basictype>
     <id> 4 </id>
     <name> int32_t </name>
     <primitive/>
     <size> 32 </size>
 </basictype><basictype>
      <id> 2 </id>
      <name> int8_t </name>
      <primitive/>
      <size> 8 </size>
</basictype></issu-meta>

That's pretty close to what you're after.

Edit: OK, my answer now looks like this:

#!/usr/bin/perl -w

use strict;
use XML::LibXML qw( :libxml );  # load LibXML support and include node type definitions

my $output_doc = XML::LibXML->load_xml( string => <<EOF);  # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects xml:id='total'/>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
</issu-meta> 

EOF

my $object_count = 0;

foreach (@ARGV) {
  my $input_doc = XML::LibXML->load_xml( location => $_ );

  my $import_started = 0;
  foreach ($input_doc->documentElement->childNodes) {
    next unless $_->nodeType == XML_ELEMENT_NODE;  # if it's not an element, ignore it

    if ($_->localName eq 'compatibility') {  # if it's the "compatibility" element, ...
      $import_started = 1;  # ... switch on importing ...
      next;  # ... and move to the next child of the root
    }

    next unless $import_started;  # if we've not started importing, and it's
                                  #   not the "compatibility" element, simply
                                  #   ignore it and move on

    my $object = $output_doc->importNode($_, 1);  # import the object information into the output document
    $output_doc->documentElement->appendChild($object);  # append the new XML nodes to the output document root
    $object_count++;  # keep track of how many objects we've seen
  }
}

my $total = $output_doc->getElementById('total');  # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count));  # append the object count to that element
$total->removeAttribute('xml:id');  # remove the XML id, as it's not wanted in the output

print $output_doc->toString;  # output the final document

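That imports every element which is a child of the root <issu-meta> document element and comes after the first <compatibility> element it finds, and, as before, updates the object count. That should do it, if I've understood your requirement.

If it works, I'd strongly recommend that you work through this answer and my previous one to make sure you understand why it works for your problem. A number of useful techniques are used here, and once you understand them you'll have learnt a lot about ways of manipulating XML. If you have any questions, please ask a new question on this site. Have fun!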
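
One way to experiment with the "everything after <compatibility>" rule on its own is to write it as a single XPath expression. This is only a sketch of an alternative to the loop in the script above, under the assumption that it should select the same elements; the inline sample document is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# A small stand-in for one input file (hypothetical sample data)
my $input_doc = XML::LibXML->load_xml( string => <<'EOF' );
<issu-meta xmlns="ver2">
  <metadescription><num-objects> 2 </num-objects></metadescription>
  <compatibility><baseline> 6.2.1.2.43 </baseline></compatibility>
  <basictype><id> 1 </id></basictype>
  <enum><id> 1835009 </id></enum>
</issu-meta>
EOF

# Every element sibling after the first <compatibility> child of the root,
# skipping any later <compatibility> elements, as the loop does
my @objects = $input_doc->findnodes(
  '/*/*[local-name()="compatibility"][1]'
  . '/following-sibling::*[local-name()!="compatibility"]'
);

print $_->nodeName, "\n" foreach @objects;
```

The following-sibling axis plays the role of the $import_started flag: nothing before or including the first <compatibility> element is selected.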

Edit #2: Right, this should be the last piece you need:

#!/usr/bin/perl -w

use strict;
use XML::LibXML qw( :libxml );  # load LibXML support and include node type definitions

my @input_files = (
                    'abc/a.xml',
                    'xyz/b.xml',
                  );
my $output_file = 'output.xml';

my $output_doc = XML::LibXML->load_xml( string => <<EOF);  # create an empty output document
<?xml version="1.0" ?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects xml:id='total'/>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
</issu-meta> 

EOF

my $object_count = 0;

foreach (@input_files) {
  my $input_doc = XML::LibXML->load_xml( location => $_ );

  my $import_started = 0;
  foreach ($input_doc->documentElement->childNodes) {
    next unless $_->nodeType == XML_ELEMENT_NODE;  # if it's not an element, ignore it

    if ($_->localName eq 'compatibility') {  # if it's the "compatibility" element, ...
      $import_started = 1;  # ... switch on importing ...
      next;  # ... and move to the next child of the root
    }

    next unless $import_started;  # if we've not started importing, and it's
                                  #   not the "compatibility" element, simply
                                  #   ignore it and move on

    my $object = $output_doc->importNode($_, 1);  # import the object information into the output document
    $output_doc->documentElement->appendChild($object);  # append the new XML nodes to the output document root
    $object_count++;  # keep track of how many objects we've seen
  }
}

my $total = $output_doc->getElementById('total');  # find the element which will contain the object count
$total->appendChild($output_doc->createTextNode($object_count));  # append the object count to that element
$total->removeAttribute('xml:id');  # remove the XML id, as it's not wanted in the output

$output_doc->toFile($output_file, 1);  # output the final document

Once that's done, running perl combine creates the file output.xml, which contains the following:

<?xml version="1.0"?>
<issu-meta xmlns="ver2">
 <metadescription>
       <num-objects>7</num-objects>
 </metadescription>
 <compatibility>
      <baseline> 6.2.1.2.43 </baseline>
 </compatibility>
<basictype>
       <id> 1 </id>
       <name> pointer </name>
       <pointer/>
       <size> 64 </size>
</basictype><basictype>
     <id> 4 </id>
     <name> int32_t </name>
     <primitive/>
     <size> 32 </size>
 </basictype><enum>
      <id>1835009 </id>
      <name> chkpt_state_t </name>
      <label>
           <name> CHKP_STATE_PENDING </name>
      <value> 1 </value>
      </label>
  </enum><struct>
         <id> 1835010 </id>
          <name> _ipcEndpoint </name>
          <size> 64 </size>
          <elem>
              <id> 0 </id>
              <name> ep_addr </name>
              <type> uint32_t </type>
              <type-id> 8 </type-id>
              <size> 32 </size>
             <offset> 0 </offset>
         </elem>
   </struct><basictype>
      <id> 2 </id>
      <name> int8_t </name>
      <primitive/>
      <size> 8 </size>
</basictype><alias>
     <id> 1835012 </id>
     <name> Endpoint </name>
     <size> 64 </size>
     <type> _ipcEndpoint </type>
     <type-id> 1835010 </type-id>
</alias><bitmask>
      <id> 1835015 </id>
      <name> ipc_flag_t </name>
      <size> 8 </size>
      <type> uint8_t </type>
      <type-id> 6 </type-id>
      <label>
           <name> IPC_APPLICATION_REGISTER_MSG </name>
           <value> 1 </value>
      </label>
 </bitmask></issu-meta>

Final tip: although it makes practically no difference to the XML itself, it's much more human-readable once it has been run through xmltidy:

<?xml version="1.0"?>
<issu-meta xmlns="ver2">
  <metadescription>
    <num-objects>7</num-objects>
  </metadescription>
  <compatibility>
    <baseline> 6.2.1.2.43 </baseline>
  </compatibility>
  <basictype>
    <id> 1 </id>
    <name> pointer </name>
    <pointer/>
    <size> 64 </size>
  </basictype>
  <basictype>
    <id> 4 </id>
    <name> int32_t </name>
    <primitive/>
    <size> 32 </size>
  </basictype>
  <enum>
    <id>1835009 </id>
    <name> chkpt_state_t </name>
    <label>
      <name> CHKP_STATE_PENDING </name>
      <value> 1 </value>
    </label>
  </enum>
  <struct>
    <id> 1835010 </id>
    <name> _ipcEndpoint </name>
    <size> 64 </size>
    <elem>
      <id> 0 </id>
      <name> ep_addr </name>
      <type> uint32_t </type>
      <type-id> 8 </type-id>
      <size> 32 </size>
      <offset> 0 </offset>
    </elem>
  </struct>
  <basictype>
    <id> 2 </id>
    <name> int8_t </name>
    <primitive/>
    <size> 8 </size>
  </basictype>
  <alias>
    <id> 1835012 </id>
    <name> Endpoint </name>
    <size> 64 </size>
    <type> _ipcEndpoint </type>
    <type-id> 1835010 </type-id>
  </alias>
  <bitmask>
    <id> 1835015 </id>
    <name> ipc_flag_t </name>
    <size> 8 </size>
    <type> uint8_t </type>
    <type-id> 6 </type-id>
    <label>
      <name> IPC_APPLICATION_REGISTER_MSG </name>
      <value> 1 </value>
    </label>
  </bitmask>
</issu-meta>
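
If you'd rather not depend on a separate tool, XML::LibXML can produce similar indentation itself. A rough sketch, assuming the whitespace between elements is not significant; the inline sample document is made up:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Untidy input, similar in shape to the combined output (hypothetical sample data)
my $untidy = '<issu-meta xmlns="ver2"><metadescription>
   <num-objects>1</num-objects>
 </metadescription><basictype>
       <id> 1 </id>
</basictype></issu-meta>';

# no_blanks drops whitespace-only text nodes while parsing ...
my $doc = XML::LibXML->load_xml( string => $untidy, no_blanks => 1 );

# ... so that toString(1) is free to re-indent the elements
my $tidy = $doc->toString(1);

print $tidy;
```

Text inside leaf elements such as <id> is left untouched; only the whitespace between elements is rewritten.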

Good luck, and do take this further. When more questions come up, come back to this site and ask them!