Question

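How to parse this particular XML using Perl

A bit of background: I am writing a Perl script that splits one XML file (datamod.xml) into two XML files.

Example: the existing XML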

<Root>
 <Top>
  <Module name="ALU">
   <input name="po_ctrl"/>
   <bidirection name="add"/>
  </Module>
  <Module name="Po_ctrl">
   <input name="ctrl"/>
   <output name="ctrlbus"/>
   <bidirection name="add"/>
  </Module>
  <input name="add"/>
  <input name="clk"/>
  <input name="da_in"/>
  <output name="da_out"/>
  <bidirection name="ctrl"/>
 </Top>
</Root>

Below is the Perl snippet I have written so far

open(IN_FILE, "<datamod.xml") or die "Can't open input file";
open(TM1_FILE, ">tm1.xml") or die "Can't open tm1.xml";
open(TM2_FILE, ">tm2.xml") or die "Can't open tm2.xml";
my $chk = 0;
while (my $line = <IN_FILE>) {
    $line =~ s/^\s+//;
    my @xwords = split(" ", $line);
    if ($xwords[0] ne "<Module" and $xwords[0] ne "</Module>" and $chk == 0) {
        print TM1_FILE $line;
    }
    else {
        print TM2_FILE $line;
        $chk = 1;
    }
    if ($xwords[0] eq "</Module>" and $chk == 1) {
        $chk = 0;
    }
}
close IN_FILE;
close TM1_FILE;
close TM2_FILE;

The expected output is two temporary files

Temporary file 1:

   <Root>
      <Top>
       <input name="add"/>
       <input name="clk"/>
       <input name="da_in"/>
       <output name="da_out"/>
       <bidirection name="ctrl"/>
      </Top>
    </Root>

Temporary file 2:

<Root>
 <Top>
  <Module name="ALU">
   <input name="po_ctrl"/>
   <bidirection name="add"/>
  </Module>
  <Module name="Po_ctrl">
   <input name="ctrl"/>
   <output name="ctrlbus"/>
   <bidirection name="add"/>
  </Module>
 </Top>
</Root>

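Note: I am using the XML::Simple module because the rest of the Perl script is already written with it, and converting to any other XML module would be very tedious.

Any help is appreciated. Please post the rewritten snippet!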

Answer 1

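Since you haven't included any code, or shown what your data currently looks like, I'm going to suggest this simple hack. Just add the wrapping <Root> element as text before parsing the XML.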

use strict;
use warnings;

my $xml = <your xml here>;
$xml = "<Root>\n" . $xml . "</Root>\n";

Answer 2

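Don't use regular expressions for XML. XML is a recursive data structure, and although you technically can do recursion with regular expressions, it makes for messy code. In practice you end up with some very selective hackery that will one day break mysteriously, because a perfectly valid change to the XML no longer fits your regex.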
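To make that fragility concrete, here is a small sketch that re-creates the question's first-word test as a function and feeds it the same document twice: once pretty-printed, once collapsed onto a single line. The helper name split_by_lines and the shortened sample document are made up for illustration; only core Perl is used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: the question's line-based logic applied to a string.
# Returns ( $rest, $modules ) as two strings.
sub split_by_lines {
    my ($xml) = @_;
    my ( $rest, $modules ) = ( '', '' );
    my $in_module = 0;
    for my $line ( split /^/m, $xml ) {
        ( my $trimmed = $line ) =~ s/^\s+//;
        my ($first) = split ' ', $trimmed;
        $first //= '';
        if ( $first ne '<Module' and $first ne '</Module>' and !$in_module ) {
            $rest .= $trimmed;
        }
        else {
            $modules .= $trimmed;
            $in_module = 1;
        }
        $in_module = 0 if $first eq '</Module>';
    }
    return ( $rest, $modules );
}

my $pretty = <<'XML';
<Root>
 <Top>
  <Module name="ALU">
   <input name="po_ctrl"/>
  </Module>
  <input name="clk"/>
 </Top>
</Root>
XML

# The same document with all line breaks and indentation removed.
( my $one_line = $pretty ) =~ s/\n\s*//g;
$one_line .= "\n";

my ( $rest1, $mods1 ) = split_by_lines($pretty);
my ( $rest2, $mods2 ) = split_by_lines($one_line);

print "pretty:   Module ", ( $mods1 =~ /<Module/ ? "found" : "missing" ), "\n";
print "one line: Module ", ( $mods2 =~ /<Module/ ? "found" : "missing" ), "\n";
```

With the pretty-printed input the Module block lands in the second string. With the one-line input the first "word" is <Root><Top><Module, so the test never matches and everything ends up in the first string, even though a parser sees the two inputs as the same document.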

Also: don't use XML::Simple, for the same reason. (You say in your question that you are using it, but there is no sign of it in the code you posted.)

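With a proper parser, what you are trying to do becomes quite easy. I like XML::Twig; XML::LibXML is probably better, but has a steeper learning curve. Either one is less prone to future pain and shoddy code.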

What you are trying to do appears to be splitting the XML, putting the modules in one file and "everything else" in another. In XML::Twig that is done like this:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

#parse your input
my $twig = XML::Twig->new->parsefile( 'datamod.xml' ); 

#create a new 'modules' document. 
my $modules = XML::Twig->new;
#create a root
$modules->set_root( XML::Twig::Elt->new('Root') );
#create a "Top" element. (You can compound this if you want)
my $top = $modules->root->insert_new_elt('Top');
#set output format (note - this can break in specific edge cases - your XML
#doesn't seem to be one of those). 
$modules->set_pretty_print('indented_a');

#find all the "<Module>" elements. 
foreach my $module ( $twig->findnodes('//Module') ) {
    #cut from old doc
    $module->cut;
    #paste into new. last_child ensures same ordering.
    $module->paste( 'last_child', $top );
}

#print the output to a file. die rather than warn, so we never print to an unopened handle.
open ( my $output, '>', 'tm1.xml' ) or die "Cannot open tm1.xml: $!";
print {$output} $twig->sprint;
close ( $output );

open ( my $second_output, '>', 'tm2.xml' ) or die "Cannot open tm2.xml: $!";
print {$second_output} $modules->sprint;
close ( $second_output );
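For comparison, the same split could be sketched with XML::LibXML, the other parser mentioned above. This is only a sketch under a few assumptions: the helper name split_modules is made up, it works on a string rather than on datamod.xml, and it assumes Module elements are not nested inside each other (the //Module query would match nested ones too).

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical helper: takes XML text, returns ( $rest_xml, $modules_xml ).
sub split_modules {
    my ($xml) = @_;

    # no_blanks drops whitespace-only text nodes so toString(1) can re-indent.
    my $doc = XML::LibXML->load_xml( string => $xml, no_blanks => 1 );

    # Build a new document with the same Root/Top skeleton.
    my $modules = XML::LibXML::Document->new( '1.0', 'UTF-8' );
    my $root    = $modules->createElement('Root');
    $modules->setDocumentElement($root);
    my $top = $modules->createElement('Top');
    $root->appendChild($top);

    # Move every <Module> across; findnodes returns them in document order.
    for my $module ( $doc->findnodes('//Module') ) {
        $module->unbindNode;           # detach from the old document
        $top->appendChild($module);    # attach under the new <Top>
    }

    return ( $doc->toString(1), $modules->toString(1) );
}

my ( $rest, $mods ) = split_modules(
    '<Root><Top><Module name="ALU"><input name="po_ctrl"/></Module>'
  . '<input name="clk"/></Top></Root>'
);
print $rest, $mods;
```

The two returned strings can then be written to tm1.xml and tm2.xml in the same way as in the XML::Twig version.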

Note: there is more on assembling a new XML document here: Assembling XML in Perl

You may also want to consider setting the encoding and version.
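As a rough sketch of that last point, XML::Twig has set_xml_version and set_encoding methods that fill in the XML declaration of a twig. The values '1.0' and 'UTF-8' below are assumptions; use whatever matches your data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

# Build a small document from scratch, as in the code above.
my $modules = XML::Twig->new;
$modules->set_root( XML::Twig::Elt->new('Root') );
$modules->root->insert_new_elt('Top');
$modules->set_pretty_print('indented_a');

# Fill in the XML declaration (assumed values).
$modules->set_xml_version('1.0');
$modules->set_encoding('UTF-8');

print $modules->sprint;
```

Note that set_encoding only changes what the declaration says. If you write the result to a file, the file handle should be opened with a matching encoding layer.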
