帮助将perl代码例程合并在一起进行文件处理

时间:2011-02-19 17:01:39

标签: perl foreach while-loop fileparsing

我需要一些perl帮助来使这些(2)进程/代码一起工作。我能够让他们单独工作进行测试,但我需要帮助将它们组合在一起,特别是使用循环结构。我不确定我是否应该使用foreach ..但是代码在下面。

此外,任何最佳实践都会很好,因为我正在学习这门语言。谢谢你的帮助。

以下是我正在寻找的流程:

  • 阅读目录
  • 寻找特定文件
  • 使用文件名去除一些关键信息以创建新处理的文件
  • 处理输入文件
  • 为每个读取的输入文件创建新处理的文件(如果我在10中读取,则创建10个新文件)

第1部分:

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    next if ($file =~ /^\.+$/);
    #Get filename attributes
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
      print "$1\n";
      print "$2\n";
      print "$3\n";
    }
    print "$file\n";
}

第2部分:

use strict;
use Digest::MD5 qw(md5_hex);
#Create new file
open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
my $data = '';
my $line1 = <>;
chomp $line1;
my @heading = split /,/, $line1;
my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
while (<>)
{
    my $digest = md5_hex($data);
    chomp;
    my (@values) = split /,/;
    my $extra = "__mykey__$sep1$digest$sep2" ;
    $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
    $data .= "$extra$eorec"; 
    print NEWFILE "$data";
}
#print $data;
close (NEWFILE);

2 个答案:

答案 0 :(得分:3)

您正在使用旧式的Perl编程。我建议你使用函数和CPAN模块(http://search.cpan.org)。 Perl伪代码:

use Modern::Perl;
# use... 

sub get_input_files {
  # return an array of files (@)
}

sub extract_file_info {
   # takes the file name and returs an array of values (filename attrs)
}

sub process_file {
   # reads the input file, takes the previous attribs and build the output file
}


my @ifiles = get_input_files;
foreach my $ifile(@ifiles) {
   my @attrs = extract_file_info($ifile);
   process_file($ifile, @attrs);
}

希望有所帮助

答案 1 :(得分:1)

我把你的两个代码片段混在一起(第二个是第一个调用每个匹配文件的sub),如果我理解你对目标的描述,那么这应该做你想要的。样式和语法的注释是内联的:

#!/usr/bin/env perl

# - Never forget these!
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
        process_file($file, $1, $2, $3);
    }
    print "$file\n";
}

sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;

    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - Better to use lexical ("my") filehandle and three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data  = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest   = md5_hex($data);
        my (@values) = split /,/;
        my $extra    = "__mykey__$sep1$digest$sep2";
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
          for (0 .. scalar(@values));
        # - Useless use of double quotes removed on next two lines
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }
    # - Moved print to output file to here (where it will print the complete
    #   output all at once) rather than within the loop (where it will print
    #   all previous lines each time a new line is read in) to prevent
    #   duplicate output records.  This could also be achieved by printing
    #   $extra inside the loop.  Printing $data at the end will be slightly
    #   faster, but requires more memory; printing $extra within the loop and
    #   getting rid of $data entirely would require less memory, so that may
    #   be the better option if you find yourself needing to read huge input
    #   files.
    print $out_fh $data;

    # - $in_fh and $out_fh will be closed automatically when it goes out of
    #   scope at the end of the block/sub, so there's no real point to
    #   explicitly closing it unless you're going to check whether the close
    #   succeeded or failed (which can happen in odd cases usually involving
    #   full or failing disks when writing; I'm not aware of any way that
    #   closing a file open for reading can fail, so that's just being left
    #   implicit)
    close $out_fh or die "Failed to close file: $!";
}

免责声明:perl -c报告此代码在语法上有效,但未经测试。