How can I improve the performance of a Perl script?

Asked: 2013-02-01 10:03:40

Tags: performance perl parallel-processing

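I am running the ucm2.pl script to scan a huge directory structure (the directory is a network drive mapped to my local machine). I have two Perl scripts, ucm1.pl and ucm2.pl. I run ucm2.pl in parallel with different arguments, and it is invoked from ucm1.pl.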

ucm1.pl -

    #!/usr/bin/perl
    use strict; 
    use warnings;
    use Parallel::ForkManager;

    my $filename ="intfSplitList.txt"; #(this will have list of all the input files. eg intfSplit_0....intfSplit_50)
    my $lines;
    my $buffer;
    open(FILE, $filename) or die "Can't open `$filename': $!";
    while (<FILE>) {
        $lines = $.;
    }
    close FILE;
    print "The number of lines in $filename is $lines \n";


    my $pm = Parallel::ForkManager->new($lines); #(it will set the no. of parallel processes)

    open (my $fh, '<', "intfSplitList.txt") or die $!;
    while (my $data = <$fh>) {
      chomp $data;

      my $pid = $pm->start and next;

      # Call ucm2.pl: input.txt holds the search keywords and $data is an intfSplit_*.txt file
      system ("perl ucm2.pl -iinput.txt -f$data");

      $pm->finish; # Terminates the child process
    }

ucm2.pl code -

    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;
    use Getopt::Std;
    #getting the input parameters
    getopts('i:f:');

    our($opt_i, $opt_f);
    my $searchKeyword     = $opt_i;                               #Search keyword file.
    my $intfSplit         = $opt_f;                               #split file
    my $path              = "Z:/aims/";                           #source directory
    my $searchString;                                             #search keyword

    open FH, ">log.txt";                                          #open the log file to write

    print FH "$intfSplit ". "started at ".(localtime)."\n";       #write the log file

    open (FILE,$intfSplit);                                       #open the split file to read

    while(<FILE>){

        my $intf= $_;                                             #setting the interface to intf
        chomp($intf);
        my $dir = $path.$intf;
        chomp($dir);
        print "$dir \n";
        open(INP,$searchKeyword);                                 #open the search keyword file to read

        while (<INP>){

            $searchString =$_;                                    #setting the search keyword to searchString
            chomp($searchString);
            print "$searchString \n";
            open my $out, ">", "vob$intfSplit.txt" or die $!;     #open the vobintfSplit_* file to write

            #calling subroutine printFile to find and print the path of element
            find(\&printFile,$dir);

            #the subroutine searches for the keyword and prints the path if the keyword exists in the file.
            sub printFile {
                my $element = $_;

                if(-f $element && $element =~ /\.*$/){

                    open my $in, "<", $element or die $!;
                    while(<$in>) {
                        if (/\Q$searchString\E/) {
                            my $last_update_time = (stat($element))[9];
                            my $timestamp  = localtime($last_update_time);
                            print $out "$File::Find::name". "     $timestamp". "     $searchString\n";
                            last;
                        }
                    }
                }
            }
        }
    }
    print FH "$intfSplit ". "ended at ".(localtime)."\n";         #write the log file

Everything runs fine, but even a single-keyword search takes a very long time. Can anyone suggest a better way to improve the performance?

Thanks in advance!!

1 Answer:

Answer 0 (score: 1)

Running multiple instances of Perl adds a lot of unnecessary overhead. Have you looked at my answer to your previous question, which suggested changing this?
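As a rough illustration of avoiding the extra interpreters: if the search logic of ucm2.pl were moved into a subroutine, each child process could call it directly instead of launching `perl ucm2.pl` through `system`. The sketch below uses Perl's built-in `fork` and `waitpid`. The names `run_in_children` and `$worker` are hypothetical and not part of the original scripts, and this is not necessarily what the earlier answer proposed.

```perl
use strict;
use warnings;

# Run $worker->($item) for each item in its own forked child process.
# Returns the number of children that exited with status 0.
# (run_in_children and $worker are hypothetical names for illustration.)
sub run_in_children {
    my ($worker, @items) = @_;
    my @pids;
    for my $item (@items) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: do the work inside this interpreter, no new perl is started.
            $worker->($item);
            exit 0;
        }
        push @pids, $pid;    # Parent: remember the child so we can wait for it.
    }
    my $succeeded = 0;
    for my $pid (@pids) {
        waitpid($pid, 0);
        $succeeded++ if $? == 0;
    }
    return $succeeded;
}
```

With Parallel::ForkManager the same idea would amount to calling the subroutine between `$pm->start` and `$pm->finish` in place of the `system` call. Note that on Windows Perl emulates `fork` with threads, so the savings there may differ.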

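Also, as mentioned before, you have some unnecessary repetition here: there is no reason to open and process your search keyword file multiple times. You can create a sub that opens the keyword file and puts the keywords in an array, then pass those keywords to another sub that does the search.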
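A minimal sketch of that split, assuming the keyword file holds one keyword per line. The sub names `read_keywords` and `first_keyword_in_file` are made up for this example and do not appear in the original scripts.

```perl
use strict;
use warnings;

# Read the keyword file once and return the keywords as a list.
# (read_keywords is a hypothetical helper name.)
sub read_keywords {
    my ($keyword_file) = @_;
    open my $fh, '<', $keyword_file or die "Can't open '$keyword_file': $!";
    chomp(my @keywords = <$fh>);
    close $fh;
    return grep { length } @keywords;    # skip empty lines
}

# Return the first keyword found in the file, or nothing if none occurs.
# (first_keyword_in_file is a hypothetical helper name.)
sub first_keyword_in_file {
    my ($file, @keywords) = @_;
    open my $in, '<', $file or die "Can't open '$file': $!";
    while (my $line = <$in>) {
        for my $keyword (@keywords) {
            # index() does a literal substring test, so no escaping is needed.
            return $keyword if index($line, $keyword) >= 0;
        }
    }
    return;
}
```

The caller would read the keywords once, for example `my @keywords = read_keywords('input.txt');`, and then pass `@keywords` to the search sub for every file it visits.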

You can also search for multiple keywords faster by searching for all of them at once. Do something like this to get your keywords:

    my @keywords = map {chomp;$_} <$fh>;
    my $regex = "(" . join('|', map {quotemeta} @keywords) . ")";

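Now you have a regex like this: (\Qkeyword1\E|\Qkeyword2\E). You only need to search each file once, and if you want to know which keyword matched, just check the contents of $1. This won't speed up a single keyword, but searching for many keywords becomes almost as fast as searching for one.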
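To show how the combined regex might be used when scanning a file, here is a small sketch. The sub names `build_keyword_regex` and `find_keyword` are hypothetical, and the code assumes the keyword list is not empty (an empty list would produce a pattern that matches every line).

```perl
use strict;
use warnings;

# Build one alternation that matches any of the literal keywords.
# (build_keyword_regex is a hypothetical helper name.)
sub build_keyword_regex {
    my @keywords = @_;
    my $alternation = join '|', map { quotemeta } @keywords;
    return qr/($alternation)/;
}

# Scan the file once; return the keyword that matched first, or nothing.
# (find_keyword is a hypothetical helper name.)
sub find_keyword {
    my ($file, $regex) = @_;
    open my $in, '<', $file or die "Can't open '$file': $!";
    while (my $line = <$in>) {
        return $1 if $line =~ $regex;    # $1 holds the keyword that matched
    }
    return;
}
```

Compiling the pattern once with `qr//` and reusing it for every file avoids rebuilding it inside the `File::Find` callback.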

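Ultimately, though, if you are searching a huge directory structure over a network, there may be a limit to how much you can speed things up.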

Update: Corrected the chomp. Thanks, amon.