Question

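I have just entered the Perl world, and my task is to replace text in multiple XML files in a folder using Perl. I tried some Perl one-liners, but they did not help; I need Perl code that replaces text in multiple text files in a chosen folder. I also tried the Stack Overflow post "Replace values for multiple XML files in a folder using perl", but that did not help either. Please be gentle, as I am a newbie. The code I tried from that post is given here and shows errors; please take a look and suggest a solution.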

return value;
valid=false;

The code I tried, which produces errors:

my $dir = ***D:\Perl***;
my $d = opendir();
map {
    if (
        -f "$dir/$_"
        && ($_ =~ "\.xml$")
    ) {
        open (my $input_file, '<', ) or die "unable to open $input_file $!\n";

        my $input;
        {
            local $/;               #Set record separator to undefined.
            $input = <$input_file>; #This allows the whole input file to be read at once.
        }
        close $input_file;

        $input =~ s/Comment//g;

        open (my $output_file, '>', "$dir/$_") or die "unable to open $output_file $!\n";
        print {$output_file} $input;

        close $output_file or die $!;
    }
} readdir($d);
closedir($d);

The XML files are in the folder D:\Perl\. The errors are:

syntax error at hello3.pl line 10, near "=~ "\.xml$""
Global symbol "$dir" requires explicit package name at hello3.pl line 23.
Global symbol "$output_file" requires explicit package name at hello3.pl line 23.
syntax error at hello3.pl line 28, near "}"
Global symbol "$d" requires explicit package name at hello3.pl line 28.
Global symbol "$d" requires explicit package name at hello3.pl line 29.
Execution of hello3.pl aborted due to compilation errors.

The code in each XML file looks like this:

1.xml
2.xml
3.xml

Answer 1

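For someone new to Perl, I'm impressed that you've latched onto map. map is meant for transforming one list into another (a common use is building a hash from an array), which it does by evaluating a code block for each element.

However, using it purely for its side effects is rather nasty, because it creates code that is hard to follow. Why not use a for (or foreach) loop instead? The key warning sign is: 'Am I assigning the result of map to something, such as a hash (or hashref)?' If the answer is no, map is probably not a good approach.

Also: for this sort of iteration, I prefer glob over opendir.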
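As a rough sketch of that suggestion, a foreach loop over glob might look like the following. The helper name xml_files_in and the temporary directory are made up for illustration, and the sketch assumes the directory path contains no spaces (glob splits its pattern on whitespace).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the .xml files found directly in $dir.
sub xml_files_in {
    my ($dir) = @_;
    # glob expands the wildcard; grep with -f keeps only plain files.
    return grep { -f $_ } glob("$dir/*.xml");
}

# Demo on a throwaway directory so the sketch can run anywhere.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(1.xml 2.xml notes.txt)) {
    open( my $fh, '>', "$dir/$name" ) or die "Unable to create $dir/$name: $!";
    close($fh) or die "Unable to close $dir/$name: $!";
}

# An ordinary loop: no list is built just to be thrown away, as with map.
foreach my $file ( xml_files_in($dir) ) {
    print "Would process: $file\n";
}
```

Compared with opendir/readdir, glob already returns full paths and applies the *.xml filter, so the loop body only has to deal with one file at a time.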

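But most importantly:

Don't use regular expressions and line-based parsing on XML

Please, please use an XML parser to parse XML. Doing it with regular expressions is nasty: it makes for brittle, unreliable code. Many things in the XML specification allow semantically identical XML (which is therefore 'valid' from the upstream system's point of view) to fail to match your regular expression, such as unary tags, line feeds, and tags split across lines.

For example: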

<XML
><some_tag
att1="1"
att2="2"
att3="3"
></some_tag></XML>

Or:

<XML><some_tag att1="1" att2="2" att3="3"></some_tag></XML>

Or:

<XML>
  <some_tag
      att1="1"
      att2="2"
      att3="3"></some_tag>
</XML>

Or:

<XML>
  <some_tag att1="1" att2="2" att3="3"></some_tag>
</XML>

Or:

<XML>
  <some_tag att1="1" att2="2" att3="3"/>
</XML>

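All of these 'say' essentially the same thing (technically, the last example has a subtle difference between 'empty text' and 'no text'), but I hope you can clearly see that a line-based, regex-based test that covers all of them would be difficult. That is why I keep suggesting 'use a parser' every time this comes up.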
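To make the point concrete, here is a small sketch that runs one plausible-looking pattern against the five equivalent documents above. The pattern and the helper name count_matches are invented for this illustration; any pattern written against one particular layout would behave similarly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The five equivalent documents from above, as strings.
my @variants = (
    qq{<XML\n><some_tag\natt1="1"\natt2="2"\natt3="3"\n></some_tag></XML>},
    qq{<XML><some_tag att1="1" att2="2" att3="3"></some_tag></XML>},
    qq{<XML>\n  <some_tag\n      att1="1"\n      att2="2"\n      att3="3"></some_tag>\n</XML>},
    qq{<XML>\n  <some_tag att1="1" att2="2" att3="3"></some_tag>\n</XML>},
    qq{<XML>\n  <some_tag att1="1" att2="2" att3="3"/>\n</XML>},
);

# A pattern written against the one-line form of the tag.
my $pattern = qr{<some_tag att1="1" att2="2" att3="3"></some_tag>};

# Hypothetical helper: count how many documents the pattern matches.
sub count_matches {
    my ( $re, @docs ) = @_;
    return scalar grep { $_ =~ $re } @docs;
}

my $hits = count_matches( $pattern, @variants );
printf "Pattern matched %d of %d equivalent documents\n", $hits, scalar @variants;
# prints: Pattern matched 2 of 5 equivalent documents
```

Only the two layouts that keep the tag on one line with an explicit closing tag match; the split-tag and self-closing forms slip through even though they carry the same data.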

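With that in mind, you may not need to strip out comments by hand at all, because comments are part of the XML spec, and it is much better to handle them as part of the parsing process.

I like XML::Twig, and Perl. But other modules exist (for example XML::LibXML), and you may prefer one of those.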
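As a minimal sketch of what a parser buys you, assuming XML::Twig is installed: the snippet parses three of the layouts shown earlier and reads the same attributes from each. The helper name some_tag_atts is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Three of the equivalent layouts from earlier: split tag, one line, self-closing.
my @variants = (
    qq{<XML\n><some_tag\natt1="1"\natt2="2"\natt3="3"\n></some_tag></XML>},
    qq{<XML><some_tag att1="1" att2="2" att3="3"></some_tag></XML>},
    qq{<XML>\n  <some_tag att1="1" att2="2" att3="3"/>\n</XML>},
);

# Hypothetical helper: parse a document and summarise some_tag's attributes.
sub some_tag_atts {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);
    my $elt = $twig->root->first_child('some_tag');
    return join ',', map { "$_=" . $elt->att($_) } qw(att1 att2 att3);
}

my @summaries = map { some_tag_atts($_) } @variants;
print "$_\n" for @summaries;
# each line reads: att1=1,att2=2,att3=3
```

The layout differences disappear once the text has been parsed, which is why the handlers shown later can match on element names rather than on line contents.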

Oh, and there is an error in your XML, concerning this line:

<?xml version="1.0"?>

Anyway, with that in mind, to answer your question as asked:

Removing comments from some XML

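#!/usr/local/bin/perl
use strict;
use warnings;

use XML::Twig;

# Folder holding the XML files; forward slashes also work on Windows.
my $dir = 'D:/Perl';

foreach my $file ( glob("$dir/*.xml") ) {
    # Drop comments while parsing and re-indent the output.
    my $twig =
        XML::Twig->new( comments => 'drop', pretty_print => 'indented_a' );
    $twig->parsefile($file);

    # Write the result next to the original as <name>.new; skip the file on failure.
    open( my $output, ">", "$file.new" )
        or do { warn "Unable to open $file.new: $!"; next };
    print {$output} $twig->sprint;
    close($output) or warn "Unable to close $file.new: $!";
}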

This will turn your sample XML into:

<?xml version="1.0"?>
<root>
  <subtag>
    <element>This is 1.xml file</element>
  </subtag>
</root>
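If you want to try the comment handling without touching any files, a string-based sketch along these lines may help. The sample input is not reproduced in this thread, so the document below is a made-up stand-in with one comment added, and strip_comments is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: a stand-in for the real 1.xml, with one comment added.
my $xml = <<'XML';
<?xml version="1.0"?>
<root>
  <!-- Comment to be removed -->
  <subtag>
    <element>This is 1.xml file</element>
  </subtag>
</root>
XML

# Parse a string, dropping comments, and return the re-indented document.
sub strip_comments {
    my ($source) = @_;
    my $twig =
        XML::Twig->new( comments => 'drop', pretty_print => 'indented_a' );
    $twig->parse($source);
    return $twig->sprint;
}

print strip_comments($xml);
```

The only differences from the file-based version are parse in place of parsefile and returning the string instead of writing it out.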

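Removing 'non-comment' elements

If you want to remove something other than comments (bear in mind that comments are a special case) and instead want to, say, get rid of a particular element: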

XML::Twig->new( pretty_print => 'indented_a',
                twig_handlers => { 'element' => sub { $_ -> delete } } );

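Note: this will delete every tag named element. You can apply more selective conditions via an XPath-style expression (e.g. 'subtag/element'), or use a proper subroutine to do the processing and parsing: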

sub delete_element_with_file {
    my ( $twig, $element ) = @_;
    if ( $element->text =~ m/file/ ) { $element->delete }
}


my $twig = XML::Twig->new(
    pretty_print  => 'indented_a',
    twig_handlers => { 'subtag/element' => \&delete_element_with_file }
);

##etc.
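Put together as something runnable, the handler approach might look like this sketch. The input document and the wrapper filter_xml are assumptions made for the demonstration; the handler itself is the one shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: two element tags, only one of which mentions "file".
my $xml = <<'XML';
<root>
  <subtag>
    <element>This is 1.xml file</element>
    <element>Keep me</element>
  </subtag>
</root>
XML

# Handler: called for each subtag/element; deletes it if its text mentions "file".
sub delete_element_with_file {
    my ( $twig, $element ) = @_;
    if ( $element->text =~ m/file/ ) { $element->delete }
}

# Parse a string with the handler attached and return the filtered document.
sub filter_xml {
    my ($source) = @_;
    my $twig = XML::Twig->new(
        pretty_print  => 'indented_a',
        twig_handlers => { 'subtag/element' => \&delete_element_with_file },
    );
    $twig->parse($source);
    return $twig->sprint;
}

print filter_xml($xml);
```

To apply it to the files in a folder, the same XML::Twig->new call can be dropped into the foreach/glob loop from the comment-removal script, using parsefile instead of parse.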

Search and replace multiple xml files in a folder using perl
