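I have some working code that I am trying to make multithreaded by following the dreamincode tutorial: http://www.dreamincode.net/forums/topic/255487-multithreading-in-perl/
The tutorial's sample code seems to work fine, but I cannot for the life of me figure out why mine does not. Judging by the debug messages I added, every thread gets all the way to the end of its subroutine, sits there for a while, and then hits a segmentation fault and dumps core. That said, I have not managed to find the core dump file anywhere (Ubuntu 13.10).
If anyone has any suggested reading, or can spot the error in the rather messy code below, I would be eternally grateful.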
#!/usr/bin/env perl
use Email::Valid;
use LWP::Simple;
use XML::LibXML;
use Text::Trim;
use threads;
use DB_File;
use Getopt::Long;
my $sourcefile = "thislevel.csv";
my $startOffset = 0;
my $chunk = 10000;
my $num_threads = 8;
$result = GetOptions ("start=i"   => \$startOffset,  # numeric
                      "chunk=i"   => \$chunk,        # numeric
                      "file=s"    => \$sourcefile,   # string
                      "threads=i" => \$num_threads,  # numeric
                      "verbose"   => \$verbose);     # flag
$tie = tie(@filedata, "DB_File", $sourcefile, O_RDWR, 0666, $DB_RECNO)
    or die "Cannot open file $sourcefile: $!\n";
my $filenumlines = $tie->length;
if ($filenumlines>$startOffset + $chunk){
    $numlines = $startOffset + $chunk;
} else {
    $numlines = $filenumlines;
}
open (emails, '>>emails.csv');
open (errorfile, '>>errors.csv');
open (nxtlvl, '>>nextlevel.csv');
open (donefile, '>>donelines.csv');
my $line = '';
my $found = false;
my $linenum=0;
my @threads = initThreads();
foreach(@threads){
    $_ = threads->create(\&do_search);
}
foreach(@threads){
    $_->join();
}
close nxtlvl;
close emails;
close errorfile;
close donefile;
sub initThreads{
    # An array to place our threads in
    my @initThreads;
    for(my $i = 1;$i<=$num_threads;$i++){
        push(@initThreads,$i);
    }
    return @initThreads;
}
sub do_search{
    my $id = threads->tid();
    my $linenum=$startOffset-1+$id;
    my $parser = XML::LibXML->new();
    $parser->set_options({ recover => 2,
                           validation => 0,
                           suppress_errors => 1,
                           suppress_warnings => 1,
                           pedantic_parser => 0,
                           load_ext_dtd => 0, });
    while ($linenum < $numlines) {
        $found = false;
        @full_line = split ',', $filedata[$linenum-1];
        $line = trim(@full_line[1]);
        $this_url = trim(@full_line[2]);
        print "Thread $id Scanning $linenum of $filenumlines\: ";
        printf "%.3f\%\n", 100 * $linenum / $filenumlines;
        my $content = get trim($this_url);
        if (!defined($content)) {
            print errorfile "$this_url, no content\n";
        }elsif (length($content)<100) {
            print errorfile "$this_url, short\n";
        }else {
            my $doc = $parser->load_html(string => $content);
            if(defined($doc)){
                for my $anchor ( $doc->findnodes("//a[\@href]") )
                {
                    $is_email = substr $anchor->getAttribute("href") ,7;
                    if(Email::Valid->address($is_email)) {
                        printf emails "%s, %s\n", $line, $is_email;
                        $found = true;
                    } else{
                        $link = $anchor->getAttribute("href");
                        if (substr lc(trim($link)),0,4 eq "http"){
                            printf nxtlvl "%s, %s\n", $line, $link;
                        } else {
                            printf nxtlvl "%s, %s/%s\n", $line, $line, $link;
                        }
                    }
                }
            }
            if ($found=false){
                my @lines = split '\n',$content;
                foreach my $cline (@lines){
                    my @words = split ' ',$cline;
                    foreach my $word (@words) {
                        my @subwords = split '"',$word ;
                        foreach my $subword (@subwords) {
                            if(Email::Valid->address($subword)) {
                                printf emails "%s, %s\n", $line, $subword;
                            }
                        }
                    }
                }
            }
        }
        printf donefile "%s\n",$linenum;
        $linenum = $linenum + $num_threads;
    }
    threads->exit();
}
Answer 0 (score: 0)
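Aside from the various coding errors, which mean my code should never be used as an example by other visitors, DB_File is not a thread-safe module.
Annoyingly, and perhaps misleadingly, it works exactly as it should until the threads that have been successfully accessing the file throughout the code are closed.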
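One possible way around this, offered as a rough sketch rather than a tested fix for the script above, is to keep DB_File out of the threaded part altogether: read the file into an ordinary Perl array before any thread is created, then give each thread its own stride of line indices. Note that this swaps the DB_File tie for a plain file read, which is my assumption about what would suit the script. The names `load_lines`, `worker` and `run_workers` are made up for illustration, and the per-line work is only a placeholder.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;

# Hypothetical helper: slurp the file into a plain array so that no
# tied DB_File object exists when the threads are created.
sub load_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open file $path: $!\n";
    chomp(my @lines = <$fh>);
    close $fh;
    return \@lines;
}

# Each worker handles indices $offset, $offset + $step, ... and
# returns its results as a list instead of writing to shared handles.
sub worker {
    my ($lines, $offset, $step) = @_;
    my @seen;
    for (my $i = $offset; $i < @$lines; $i += $step) {
        push @seen, "$i:$lines->[$i]";   # placeholder for the real per-line work
    }
    return @seen;
}

# Start the workers, then collect whatever each one returned.
sub run_workers {
    my ($lines, $num_threads) = @_;
    my @workers;
    for my $offset (0 .. $num_threads - 1) {
        # List context at creation, so join() hands back the worker's list.
        my ($thr) = threads->create(\&worker, $lines, $offset, $num_threads);
        push @workers, $thr;
    }
    my @results;
    push @results, $_->join() for @workers;
    return @results;
}

# Example invocation: perl sketch.pl thislevel.csv 8
if (@ARGV) {
    my ($path, $num_threads) = @ARGV;
    my $lines = load_lines($path);
    print "$_\n" for run_workers($lines, $num_threads // 8);
}
```

Returning results through `join()` also means only the main thread writes to the output files, which sidesteps interleaved writes from several threads. Whether a whole-file read is acceptable depends on the size of the input, which I have not measured.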