How to remove comments from XML using the XML::Twig module

Date: 2011-11-15 05:59:21

Tags: xml perl comments perl-module xml-twig

I am using the XML::Twig module to remove all comments from an XML file. A sample file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Node_A>
node A content 1
<!-- One Line Comment A1-->
<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]>
<!-- Two Line Comment
Two Line Comment-->
node A content 3
<!-- Two Line Comment
Two Line Comment-->
<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]>
<!-- Two Line Comment
Two Line Comment-->
<![CDATA[
this portion is fine]]>

<Node_B> node B content
<Node_C> node c content
</Node_C>
<!-- One Line Comment -->
some data one
<!-- Multi  Line Comment
Line 3Comment
1Line Comment
2Line Comment
Line 5Comment
Line Comment-->
some data again two 
<!-- Multi  Line Comment
Line 3Comment
Line 5Comment
Line Comment-->

few more
</Node_B>

</Node_A>

I used the following script:

#!/usr/bin/perl 

use strict;
use warnings;
use XML::Twig;
my $infile = 'demo.xml';
my $twig = XML::Twig->new (comments => 'drop', pretty_print => 'indented')->parsefile($infile);
$twig->print ();

This script also removes the CDATA sections that sit between two comments, which is not what I intended. The output is:

<?xml version="1.0" encoding="UTF-8"?>
<Node_A>
node A content 1

<![CDATA[
this portion is fine]]><Node_B> node B content
<Node_C> node c content
</Node_C>

some data one

some data again two 


few more
</Node_B></Node_A>

What do I need to add so that all CDATA sections and other content are preserved and only the comments are removed?

Thanks in advance.

1 Answer:

Answer 0 (score: 4)

When I run your script against the demo.xml file you posted, I get this output:

<?xml version="1.0" encoding="UTF-8"?>
<Node_A>
node A content 1

<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]>

node A content 3

<![CDATA[this portion within the two comments is being
REMOVED which is not the intention]]><![CDATA[
this portion is fine]]><Node_B> node B content
<Node_C> node c content
</Node_C>

some data one

some data again two


few more
</Node_B></Node_A>

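That looks fine to me: the CDATA sections between the comments are preserved. I suspect you have a buggy version of XML::Twig (or of XML::Parser, which it depends on). I am using Perl 5.14.2, XML::Twig 3.35 and XML::Parser 2.41.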
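
To compare your environment with the versions listed above, you could print what is actually installed. This is only a minimal sketch: the helper name module_version is made up for illustration, and it relies on nothing beyond Perl's built-in require and the universal VERSION method.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# or undef if the module cannot be loaded (or declares no version).
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # XML::Twig -> XML/Twig.pm
    return unless eval { require $file; 1 };
    return $module->VERSION;
}

# Report the interpreter version and the two modules involved.
printf "Perl %vd\n", $^V;
for my $module (qw(XML::Twig XML::Parser)) {
    my $version = module_version($module);
    printf "%s %s\n", $module, defined $version ? $version : '(not available)';
}
```

If the numbers it reports are older than the ones above, upgrading both modules from CPAN and re-running the original script would be a reasonable first step.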