Segmentation fault on joining threads (Perl)

Time: 2014-01-01 15:01:11

Tags: multithreading perl segmentation-fault

I have some working code that I am trying to multithread by following the dreamincode tutorial: http://www.dreamincode.net/forums/topic/255487-multithreading-in-perl/

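The tutorial's sample code seems to work fine, but I can't for the life of me work out why mine doesn't. Judging by the debug messages I put in, every thread gets all the way to the end of its subroutine, the program sits there for a while, and then it hits a segmentation fault and dumps core. That said, I haven't managed to find the core dump file anywhere (Ubuntu 13.10).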

If anyone has any suggested reading, or can spot the error in the rather messy code below, I would be eternally grateful.

#!/usr/bin/env perl

use Email::Valid;
use LWP::Simple;
use XML::LibXML;
use Text::Trim;
use threads;
use DB_File;

use Getopt::Long;

my $sourcefile   = "thislevel.csv";
my $startOffset = 0;
my $chunk = 10000;
my $num_threads = 8;

$result = GetOptions ("start=i" => \$startOffset,    # numeric
              "chunk=i" => \$chunk,    # numeric
                  "file=s"   => \$sourcefile,      # string
                  "threads=i" => \$num_threads,     #numeric
                  "verbose"  => \$verbose);  # flag


$tie = tie(@filedata, "DB_File", $sourcefile, O_RDWR, 0666, $DB_RECNO)
    or die "Cannot open file $sourcefile: $!\n";

my $filenumlines = $tie->length;

if ($filenumlines>$startOffset + $chunk){
    $numlines = $startOffset + $chunk;
} else {
    $numlines = $filenumlines;
}


open (emails, '>>emails.csv');
open (errorfile, '>>errors.csv');
open (nxtlvl, '>>nextlevel.csv');
open (donefile, '>>donelines.csv');
my $line = '';
my $found = false;

my $linenum=0;

my @threads = initThreads();



foreach(@threads){

    $_ = threads->create(\&do_search);

}


foreach(@threads){
    $_->join();
}


close nxtlvl;
close emails;
close errorfile;
close donefile;


sub initThreads{
    # An array to place our threads in
    my @initThreads;
    for(my $i = 1;$i<=$num_threads;$i++){
        push(@initThreads,$i);
    }
    return @initThreads;
}




sub do_search{
    my $id = threads->tid();

    my $linenum=$startOffset-1+$id;

    my $parser = XML::LibXML->new();
    $parser->set_options({ recover           => 2,
                           validation        => 0,
                       suppress_errors   => 1,
                       suppress_warnings => 1,
                       pedantic_parser   => 0,
                       load_ext_dtd      => 0, });


    while ($linenum < $numlines) {

        $found = false;
        @full_line = split ',', $filedata[$linenum-1];

        $line = trim(@full_line[1]);
        $this_url = trim(@full_line[2]);
        print "Thread $id Scanning $linenum of $filenumlines\: ";
        printf "%.3f\%\n", 100 * $linenum / $filenumlines;

        my $content = get trim($this_url);

        if (!defined($content)) {

            print errorfile "$this_url, no content\n";

        }elsif (length($content)<100) {

            print errorfile "$this_url, short\n";

        }else {

            my $doc = $parser->load_html(string => $content);

            if(defined($doc)){

                for my $anchor ( $doc->findnodes("//a[\@href]") )
                {
                    $is_email = substr $anchor->getAttribute("href") ,7;
                    if(Email::Valid->address($is_email)) {
                        printf emails "%s, %s\n", $line, $is_email;
                        $found = true;
                    } else{
                        $link = $anchor->getAttribute("href");
                        if (substr lc(trim($link)),0,4 eq "http"){
                            printf nxtlvl "%s, %s\n", $line, $link;
                        } else {
                            printf nxtlvl "%s, %s/%s\n", $line, $line, $link;
                        }
                    }
                } 
            }
            if ($found=false){

                my @lines = split '\n',$content;

                foreach my $cline (@lines){
                    my @words = split ' ',$cline;
                        foreach my $word (@words) { 
                        my @subwords = split '"',$word ;
                        foreach my $subword (@subwords) {

                            if(Email::Valid->address($subword)) {
                                    printf emails "%s, %s\n", $line, $subword;  
                            }
                        }
                    }
                    }
            }
        }
        printf donefile "%s\n",$linenum;
        $linenum = $linenum + $num_threads;     
    }
    threads->exit();
}

1 answer:

Answer 0 (score: 0)

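Apart from the various coding errors, which mean my code should never be used as an example by other visitors, the underlying problem is that DB_File is not a thread-safe module.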
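The answer does not list the coding errors, so the following is only a sketch of three that can be read off the question's code: the barewords `true` and `false` combined with `if ($found=false)`, the unparenthesised `substr ... eq "http"` test, and the array slice `@full_line[1]`. The helper names `needs_fallback_scan`, `is_absolute_link` and `mailto_address` are hypothetical and do not appear in the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: true when no address was found via mailto: links.
# Perl does not provide true/false barewords by default; without "use strict"
# the bareword false is the string "false", which is truthy, and
# "if ($found = false)" assigns rather than compares. Plain 1 and 0 avoid both.
sub needs_fallback_scan {
    my ($found) = @_;
    return !$found;
}

# Hypothetical helper: parenthesise substr so that eq applies to its result.
# Written as "substr lc($link),0,4 eq 'http'", the comparison binds to the 4
# and becomes the LENGTH argument of substr.
sub is_absolute_link {
    my ($link) = @_;
    return substr(lc($link), 0, 4) eq 'http';
}

# Hypothetical helper: strip "mailto:" only when it is actually present,
# instead of unconditionally dropping the first seven characters.
sub mailto_address {
    my ($href) = @_;
    return $href =~ /^mailto:(.+)/i ? $1 : undef;
}

# A single element is $fields[1]; @fields[1] is a one-element slice.
my @fields = split /,/, 'x,Example Ltd,http://example.com';
my $name   = $fields[1];
print "$name absolute: ", (is_absolute_link($fields[2]) ? 'yes' : 'no'), "\n";
```

Turning on `use strict` and `use warnings` makes Perl report the bareword and slice problems itself, which is probably the quickest way to find the rest.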

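Annoyingly, and perhaps misleadingly, everything works exactly as it should, with each thread reading the file successfully throughout the run, until the moment you shut those threads down.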
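One possible way around this, assuming the input file fits in memory, is to read it into an ordinary array before any thread is created, so that no tied DB_File object is ever cloned into a worker or destroyed when one exits. The sketch below is not the fix the author used; the names `load_lines`, `process_slice` and `run_workers` are hypothetical, and the per-line work is a placeholder for the fetching and parsing in the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use threads;

# Read every line into a plain array before any thread exists.
sub load_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open file $path: $!\n";
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Worker: handle every $stride-th line starting at index $first and
# return the results instead of printing to shared file handles.
sub process_slice {
    my ($lines, $first, $stride) = @_;
    my @done;
    for (my $i = $first; $i < @$lines; $i += $stride) {
        my @fields = split /,/, $lines->[$i];
        push @done, "$i:" . scalar(@fields);    # placeholder for the real work
    }
    return @done;
}

# Start $num_threads workers, then join them in order and collect
# everything they returned (list context is requested explicitly).
sub run_workers {
    my ($lines, $num_threads) = @_;
    my @workers = map {
        threads->create({ context => 'list' },
                        \&process_slice, $lines, $_, $num_threads)
    } 0 .. $num_threads - 1;
    return map { $_->join() } @workers;
}

# Small in-memory demonstration; with a real file this would be
# run_workers([ load_lines('thislevel.csv') ], 8).
my @demo = run_workers([ 'a,b,c', 'd,e', 'f', 'g,h' ], 2);
print "$_\n" for @demo;
```

Returning the results through `join` also means only the main thread writes the output files, so the workers do not interleave writes on shared handles.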