Question

How can I use Perl to split a very large file into multiple smaller files, based on a pattern that appears in the file's lines?

Example file:

CONECT  592  593  594                                                           
CONECT  595  596  597                                                           
CONECT  597  598                                                                
END                
CONECT  591  593  594                                                           
CONECT  595  596  596                                                           
CONECT  597  598                                                                
END
CONECT  592  593  594                                                           
CONECT  594  596  598                                                           
CONECT  597  598                                                                
END

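I need to create many separate files from this single file. Each output file should start with a "CONECT" line and end with an "END" line. The input is a large file (1 GB).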

Answer 1

A cleaner version using more modern Perl (three-argument open with lexical filehandles, and error checking on the open calls):

#!/usr/bin/perl

use strict;
use warnings;

my $in_file  = 'file_2b_read.txt';
my $out_file = 'newfile_2b_part_%06d.txt'; # Template for output filenames
my $counter  = 1;

open my $in_fh , '<' , $in_file or die $!;
open my $out_fh , '>' , sprintf( $out_file , $counter ) or die $!;

while( <$in_fh> ) {
  print $out_fh $_;

  if( /^END/ ) {
    close( $out_fh ) ;
    open $out_fh , '>' , sprintf( $out_file , ++$counter ) or die $!;
  }
}

# cleanup afterwards
close $out_fh ;
close $in_fh ;

Answer 2

A modification of dgw's answer so that a spurious final file is not created:

#!/usr/bin/perl

use strict;
use warnings;

my $in_file = 'file_2b_read.txt';
my $out_file_template = 'newfile_2b_part_%06d.txt';
my $counter = 1;

open my $in_fh , '<' , $in_file or die $!;
my $out_fh;

while ( <$in_fh> ) {
    if (!$out_fh) {
        open $out_fh , '>' , sprintf( $out_file_template, $counter++ ) or die $!;
    }
    print $out_fh $_;

    if ( /^END/ ) {
        close( $out_fh );
        $out_fh = undef;
    }
}

# cleanup afterwards
if ($out_fh) { close( $out_fh ) }
close $in_fh;

Answer 3

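Here is a small algorithm you can try. Let me know if you need explicit code.

while (<FD>)
{
   # add line to buffer first, so the END line stays with its own block.
   if ($_ =~ /^END/)
   {
      # save buffer in new file.
      # reset buffer.
   }
}
# after the loop: save any leftover lines that had no closing END.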
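
The buffering idea above can be sketched as runnable Perl. This is only a minimal sketch, not the answerer's code: the helper name split_on_end is hypothetical, and the output filename template is an assumption borrowed from the other answers. Each line is appended to the buffer before the END check, so the END line stays in the block it closes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): split $in_file into numbered
# files named via the sprintf template $template, one per block that ends
# with an END line. Returns the number of files written.
sub split_on_end {
    my ( $in_file, $template ) = @_;
    open my $in_fh, '<', $in_file or die "Cannot open $in_file: $!";

    my $buffer = '';
    my $count  = 0;

    # Write the buffered block to the next numbered file, then reset it.
    my $flush = sub {
        my $name = sprintf $template, ++$count;
        open my $out_fh, '>', $name or die "Cannot open $name: $!";
        print {$out_fh} $buffer;
        close $out_fh or die "Cannot close $name: $!";
        $buffer = '';
    };

    while ( my $line = <$in_fh> ) {
        $buffer .= $line;                 # add line to buffer
        $flush->() if $line =~ /^END/;    # END closes the current block
    }
    $flush->() if length $buffer;         # leftover lines without a final END

    close $in_fh;
    return $count;
}

# Run only when an input file name is given on the command line.
if (@ARGV) {
    my $n = split_on_end( $ARGV[0], 'newfile_2b_part_%06d.txt' );
    print "Wrote $n files\n";
}
```

Only one block is held in memory at a time, so the size of the input file should not matter as long as the individual blocks are small. No file is opened until a block is ready to be written, which avoids leaving an empty file behind when the input ends with END.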

Answer 4

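#!/usr/bin/perl
use strict;
use warnings;
my $file1='file_2b_read.txt';
my $File2='newfile_2b_created.txt';
# three-argument open, with error checks on both files
open(CMD, '<', $file1) or die "Cannot open $file1: $!";
open(OUTPUT, '>', $File2) or die "Cannot open $File2: $!";
my $cnt=1;
while(<CMD>) {

    print OUTPUT $_;

    /^END/ and do {
        # create new file
        $cnt++;
        close(OUTPUT);
        $File2='newfile_2b_created'.$cnt.'.txt';
        open(OUTPUT, '>', $File2) or die "Cannot open $File2: $!";
        next;
    };
}
# the last file stays empty if the input ends with an END line
close(OUTPUT);
close(CMD);

Hope this helps.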
