Question

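I'm using Perl. I have many files named ????_header.xml (for example 0001_header.xml, 0002_header.xml, etc.) and many files named ????_text.xml (for example 0001_text.xml, ...). All of these files are stored in a folder named "input".

I also have a folder named "output". I need to edit some data from both kinds of input (header and text) and save the edited result in a ".txt" file, with one ".txt" file per pair (header.xml plus text.xml). For example, I need to read 0001_header.xml and make some edits, read 0001_text.xml and make some edits, print all of the edited content to a ".txt" file, and save that file in the output folder. And so on for each pair.

In other words, I need to work with two inputs at the same time and print the result to a separate third file.

I tried this: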

use strict;
use warnings;

opendir IN, 'input';
my @lines1 = grep { /header.xml$/ } readdir IN;
closedir IN;

opendir IN, 'input';
my @lines2 = grep { /text.xml$/ } readdir IN;
closedir IN;

for my $lines1 (@lines1) {
open IN, '<', "input/$lines1" || next;
open OUT, '>', "output/$lines1" || die "can't open file output/$lines1";
while(<IN>) {
#to do several modifications
}
close IN;
}

for my $lines2 (@lines2) {
open IN, '<', "input/$lines2" || next;
open OUT, '>', "output/$lines2" || die "can't open file output/$lines2";
while(<IN>) {
#to do several modifications
print OUT;
}
close OUT;
close IN;
}

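My problem is that I don't know how to manage my output. That is, how do I save the modifications made to the inputs in a .txt output file? Any suggestions?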

Answer 1

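Something like this should make it clearer.

It reads the input directory and, each time it finds a header file, builds the name of the corresponding text file (the one with the same number). If that file also exists, it opens an output file in the output directory with the same number as the input files and a .txt extension, then opens and processes both input files.

Note that all this does is copy the contents of the header and text files to the output file. You will need to do more with the data before each print.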

use strict;
use warnings;
use autodie;

opendir my $dh, 'input';

while (my $file = readdir $dh) {
  # Only header files named NNNN_header.xml; capture the four-digit number
  next unless $file =~ /\A(\d{4})_header\.xml\z/;

  my $num = $1;
  my $header_name = $file;
  my $text_name = "${num}_text.xml";
  next unless -f "input/$text_name";

  open my $output, '>', "output/$num.txt";

  open my $hdr_in, '<', "input/$header_name";
  while (<$hdr_in>) {
    print $output $_;
  }
  close $hdr_in;

  open my $txt_in, '<', "input/$text_name";
  while (<$txt_in>) {
    print $output $_;
  }
  close $txt_in;

  close $output;

  warn "Output file 'output/$num.txt' written\n";
}

closedir $dh;
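
As a minimal sketch of where the edits could go, the two copy loops above might be factored into one helper that takes an edit callback for each input. The names write_pair, $edit_header and $edit_text are hypothetical and not part of the original answer; the callbacks stand in for whatever modifications are actually needed.

```perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: write the edited lines of a header file, followed by
# the edited lines of a text file, into a single output file.
sub write_pair {
    my ($header_path, $text_path, $out_path, $edit_header, $edit_text) = @_;

    open my $out, '>', $out_path;

    # Process the header first, then the text, each with its own callback.
    for my $job ([$header_path, $edit_header], [$text_path, $edit_text]) {
        my ($path, $edit) = @$job;
        open my $in, '<', $path;
        while (my $line = <$in>) {
            print {$out} $edit->($line);
        }
        close $in;
    }

    close $out;
    return $out_path;
}
```

A call such as write_pair('input/0001_header.xml', 'input/0001_text.xml', 'output/0001.txt', \&edit_header, \&edit_text) could then replace the two loops, where edit_header and edit_text are subs you would write that take one line and return the edited line.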

Answer 2

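Here is a solution that I think stays elegant. Let me know if I haven't fully answered your question.

I use the leading number of each file name as a hash key and put both files into that hash element. That keeps the two related files paired.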

#!/usr/bin/perl

use strict;
use warnings;

my $dir = "./"; #or your directory

opendir(my $dh, $dir) or die "$!";

my %files;
while(my $file = readdir($dh)){
    if(my ($key) = $file =~ /^(.+?)_(header|text)\.xml$/){
            push @{ $files{$key} }, $file;
    }
}

for my $key (keys %files){
    for(@{ $files{$key} }){
        if(/header\.xml$/) { short_open($_, \&dostuff_header) }
        elsif(/text\.xml$/) { short_open($_, \&dostuff_text) }
    }
}

sub short_open {
    my ($filename, $sub) = @_;
    open my $fh, '<', $dir . $filename or die "$!";
    open my $out, '>', $dir . $filename . "out.xml" or die "$!"; #replace $dir and remove out.xml, I used them for my own testing

    # pass each input line through the callback and write the result
    while (my $line = <$fh>){
        print $out $sub->($line);
    }
    close $fh;
    close $out or die "$!";
}

sub dostuff_text {
    my ($text) = @_;
    #do stuff with text lines
    return $text;
}

sub dostuff_header {
    my ($header) = @_;
    #do stuff with header lines
    return $header;
}

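This uses only one loop to gather the input data and one loop to output the data.

I moved the actual work into two functions, so both kinds of file are handled in basically the same way and the open lines only have to be written once.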
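
The grouping idea can also be used to get one .txt file per pair, as the question asks. The following is only a sketch under that assumption: pair_files is a hypothetical helper, and it records which name is the header and which is the text file instead of pushing both onto an array.

```perl
use strict;
use warnings;

# Hypothetical helper: group file names by their leading key, remembering
# which name is the header file and which is the text file.
sub pair_files {
    my @names = @_;
    my %pairs;
    for my $name (@names) {
        if (my ($key, $kind) = $name =~ /\A(.+?)_(header|text)\.xml\z/) {
            $pairs{$key}{$kind} = $name;
        }
    }
    # Drop keys that are missing one half of the pair.
    for my $key (keys %pairs) {
        delete $pairs{$key}
            unless exists $pairs{$key}{header} && exists $pairs{$key}{text};
    }
    return \%pairs;
}

my $pairs = pair_files(qw(0001_header.xml 0001_text.xml 0002_header.xml notes.txt));
for my $key (sort keys %$pairs) {
    # prints: 0001: 0001_header.xml + 0001_text.xml -> 0001.txt
    print "$key: $pairs->{$key}{header} + $pairs->{$key}{text} -> $key.txt\n";
}
```

In a real script the list of names would come from readdir, and each complete pair would be opened and written to "$key.txt" in the output folder.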

Answer 3

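I'm not sure what kind of operations you perform on the files, but since they are XML, consider reading them with an XML parser such as XML::Fast. That gives you a hash of the XML contents (or two hashes, one per file). You can then perform whatever operations you need on the two hashes, combine them, convert the resulting hash back to XML, and print it to a single file.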

Perl: working with two inputs from a directory
