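I'm running the ucm2.pl script to scan a huge directory structure (the directory is a network drive mapped locally). I have two Perl scripts, ucm1.pl and ucm2.pl. I run ucm2.pl in parallel with different parameters, and it is launched from ucm1.pl.
ucm1.pl: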
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my $filename = "intfSplitList.txt"; # list of all the input files, e.g. intfSplit_0 ... intfSplit_50

# Count the lines in the list file
my $lines = 0;
open(my $count_fh, '<', $filename) or die "Can't open '$filename': $!";
$lines = $. while <$count_fh>;
close $count_fh;
print "The number of lines in $filename is $lines\n";

my $pm = Parallel::ForkManager->new($lines); # sets the number of parallel processes

open(my $fh, '<', $filename) or die "Can't open '$filename': $!";
while (my $data = <$fh>) {
    chomp $data;
    my $pid = $pm->start and next;
    # Call ucm2.pl: input.txt holds the search keywords, $data is one intfSplit_*.txt file
    system("perl", "ucm2.pl", "-iinput.txt", "-f$data");
    $pm->finish; # terminates the child process
}
close $fh;
$pm->wait_all_children; # wait for every ucm2.pl run to finish
ucm2.pl:
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Getopt::Std;

# Get the input parameters: -i keyword file, -f split file
my %opts;
getopts('i:f:', \%opts) && defined $opts{i} && defined $opts{f}
    or die "usage: $0 -i keywordfile -f splitfile\n";
my $searchKeyword = $opts{i}; # search keyword file
my $intfSplit     = $opts{f}; # split file
my $path = "Z:/aims/";        # source directory
my $searchString;             # current search keyword

# Append, so parallel runs do not overwrite each other's log entries
open(my $log, '>>', "log.txt") or die "Can't open log.txt: $!";
print $log "$intfSplit started at " . localtime() . "\n";

# Open the vob<split file> output once, so results are not truncated for every keyword
open(my $out, '>', "vob$intfSplit.txt") or die "Can't open vob$intfSplit.txt: $!";

open(my $split_fh, '<', $intfSplit) or die "Can't open '$intfSplit': $!";
while (my $intf = <$split_fh>) {
    chomp $intf;
    my $dir = $path . $intf;
    print "$dir\n";
    # The keyword file is reopened and reread for every directory
    open(my $kw_fh, '<', $searchKeyword) or die "Can't open '$searchKeyword': $!";
    while ($searchString = <$kw_fh>) {
        chomp $searchString;
        print "$searchString\n";
        # Walk $dir; print the path of every file that contains the keyword.
        # An anonymous sub avoids the "will not stay shared" problem of a nested named sub.
        find(sub {
            my $element = $_;
            return unless -f $element;
            open(my $in, '<', $element) or die "Can't open $File::Find::name: $!";
            while (my $line = <$in>) {
                if ($line =~ /\Q$searchString\E/) {
                    my $timestamp = localtime((stat($element))[9]);
                    print $out "$File::Find::name $timestamp $searchString\n";
                    last;
                }
            }
            close $in;
        }, $dir);
    }
    close $kw_fh;
}
close $split_fh;
close $out;
print $log "$intfSplit ended at " . localtime() . "\n";
close $log;
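Everything works, but even a search for a single keyword takes a very long time. Can anyone suggest a better way to improve performance?
Thanks in advance!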
Answer (score: 1):
Running multiple instances of Perl adds a lot of unnecessary overhead. Have you looked at my answer to your previous question, which suggested changing this?
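A minimal sketch of the in-process alternative, assuming the search logic is moved into a sub. It swaps in core fork/waitpid instead of Parallel::ForkManager so it runs without CPAN modules; process_split, run_in_children and the split-file names are hypothetical. Each child calls the sub directly, so no new perl interpreter has to start and compile ucm2.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # autoflush STDOUT so forked children do not re-print buffered output

# Hypothetical stand-in for the per-split-file search that ucm2.pl does.
sub process_split {
    my ($split_file) = @_;
    return length $split_file;    # placeholder work
}

# Run process_split() for each job in a forked child, at most
# $max_workers at a time. Returns how many children exited non-zero.
sub run_in_children {
    my ($max_workers, @jobs) = @_;
    my %running;                  # pid => job
    my $failures = 0;

    # Block until one child exits and record its exit status.
    my $reap = sub {
        my $pid = waitpid(-1, 0);
        if ($pid <= 0) { %running = (); return; }    # no children left
        $failures++ if $? != 0;
        delete $running{$pid};
    };

    for my $job (@jobs) {
        $reap->() while scalar(keys %running) >= $max_workers;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: call the sub directly instead of system("perl ucm2.pl ...").
            process_split($job);
            exit 0;
        }
        $running{$pid} = $job;
    }
    $reap->() while %running;     # wait for the remaining children
    return $failures;
}

my @splits = map { "intfSplit_$_.txt" } 0 .. 3;    # hypothetical split files
my $failed = run_in_children(2, @splits);
print "children that failed: $failed\n";
```

With Parallel::ForkManager the same change amounts to calling such a sub between $pm->start and $pm->finish in ucm1.pl instead of running system("perl ucm2.pl ..."), which keeps the module's limit on concurrent processes.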
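Also, as mentioned before, there is some unnecessary repetition here: there's no reason to open and process your search keyword file more than once. You could write one sub that opens the keyword file and puts the keywords in an array, then pass those keywords to another sub that does the searching.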
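A sketch of that split, under the assumption that one walk of each directory should check every keyword. read_keywords, search_tree and write_file are hypothetical names, and the demo builds a throwaway directory with File::Temp. The keyword file is read once, and each file under the directory is opened once no matter how many keywords there are.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Read the keyword file once; blank lines are dropped so they cannot match everything.
sub read_keywords {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open '$file': $!";
    chomp(my @keywords = <$fh>);
    close $fh;
    return grep { length } @keywords;
}

# Walk $dir once; return "path<TAB>keyword" for each keyword found in each file.
sub search_tree {
    my ($dir, @keywords) = @_;
    my @hits;
    find(sub {
        return unless -f $_;
        open my $in, '<', $_ or do { warn "Can't open $File::Find::name: $!"; return };
        my %seen;
        while (my $line = <$in>) {
            for my $kw (@keywords) {
                next if $seen{$kw};
                if (index($line, $kw) >= 0) {    # literal substring test
                    $seen{$kw} = 1;
                    push @hits, "$File::Find::name\t$kw";
                }
            }
            last if scalar(keys %seen) == scalar(@keywords);    # all found: stop reading
        }
        close $in;
    }, $dir);
    return @hits;
}

# Hypothetical helper for the demo below.
sub write_file {
    my ($path, $text) = @_;
    open my $fh, '>', $path or die "Can't write '$path': $!";
    print {$fh} $text;
    close $fh;
}

my $tmp = tempdir(CLEANUP => 1);
mkdir "$tmp/tree" or die "mkdir failed: $!";
write_file("$tmp/keywords.txt", "foo\nbar\n\n");
write_file("$tmp/tree/a.txt", "has foo here\n");
write_file("$tmp/tree/b.txt", "nothing to see\n");

my @keywords = read_keywords("$tmp/keywords.txt");
print "$_\n" for search_tree("$tmp/tree", @keywords);
```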
You can search for multiple keywords much faster by searching for all of them at once. Do something like this to get your keywords:
my @keywords = map {chomp;$_} <$fh>;
my $regex = "(" . join('|', map {quotemeta} @keywords) . ")";
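Now you have a regex equivalent to (\Qkeyword1\E|\Qkeyword2\E). You only need to read each file once, and if you want to know which keyword matched, just check the contents of $1. This won't speed up a single-keyword search, but searching for many keywords becomes almost as fast as searching for just one.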
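A small sketch of the combined-regex idea, reading from an in-memory filehandle so it runs on its own. build_regex and matched_keywords are hypothetical names. quotemeta makes each keyword literal, and //g in list context returns the $1 capture of every match, which tells you which keyword was hit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Combine all keywords into one capturing alternation (each one quoted literally).
sub build_regex {
    my @keywords = grep { length } @_;
    die "no keywords given\n" unless @keywords;
    my $alt = join '|', map { quotemeta } @keywords;
    return qr/($alt)/;
}

# Read a filehandle once and collect every keyword captured in $1.
sub matched_keywords {
    my ($fh, $regex) = @_;
    my %found;
    while (my $line = <$fh>) {
        # In list context, //g returns the $1 capture of each match on the line.
        $found{$_} = 1 for $line =~ /$regex/g;
    }
    return sort keys %found;
}

my $regex  = build_regex('error', 'warn', 'a.b');
my $sample = "no match\nwarn: disk\nerror then a.b\naXb\n";
open my $fh, '<', \$sample or die "Can't open in-memory file: $!";
print join(',', matched_keywords($fh, $regex)), "\n";    # prints "a.b,error,warn"
close $fh;
```

One design caveat: when two keywords can match at the same position (say foo and foobar), the alternative listed first wins, so sort longer keywords first if the reported keyword must be exact.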
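Ultimately, though, if you're searching a huge directory structure over the network, there may be a limit to how much you can speed this up.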
Update: corrected the chomp. Thanks, amon.