Question

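I have a problem with my perl script that has been bothering me for several days. To summarize, the goal is to read a large file in chunks and perform some operation on the input stream (not relevant to my question). When I first implemented it, I just looped over the file and did some work on each chunk, like this: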

while (read FILE, $buffer, $chunksize){ 
  callSomeOperation($buffer);
  # Do some other stuff
}

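Unfortunately the file is really big and the operation is fairly complex, with many function calls, so memory usage kept growing until perl could no longer allocate memory and the script failed. So I did some investigating and tried several things to reduce the memory overhead (declaring variables outside the loop, setting them to undef, and so on). That made the allocated memory grow more slowly, but in the end the script still failed. (And if I understand correctly, perl giving memory back to the OS is something that won't happen in practice.)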

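So I decided to wrap the function call and all of its definitions in a sub-thread, wait for it to finish, join it, and then start the thread again with the next chunk: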

while (read FILE, $buffer, $chunksize){
  my $thr = threads->create(\&thrWorker, $buffer);
  $thr->join();
}

sub thrWorker{
  # Do the stuff here!
}

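That could have been a solution, if the thread would join! But it actually never does. If I run it with $thr->detach(); instead, everything works fine, except that I end up with a large number of threads running at the same time, which is not a good idea, and in this case I need to run them consecutively.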

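So I looked into this join problem and found some reports that it might be an issue with perl 5.16.1, so I updated to 5.16.2, but it still never joins. Somewhere on a mailing list (I can't remember which) I read that somebody managed to get threads to join using the CPAN module Thread::Queue, but that didn't work for me either.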

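So I gave up on threads and tried to fork instead. But with fork it seems the total number of "forks" is limited? In any case, it ran fine until somewhere between the 13th and 20th iteration and then gave up with a message that it couldn't fork any more.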

my $pid = fork();
if( $pid == 0 ){
    thrWorker($buffer);
    exit 0;
}

I also tried the CPAN modules Parallel::ForkManager and Proc::Fork, but that didn't help.

So now I'm stuck and can't find a way out on my own. Maybe somebody else can! Any suggestions are greatly appreciated!

How can I get this to work with threads or child processes?
Or at least, how can I force perl to free memory so I can do this in the same process?

Some additional information about my system: OS: Windows 7 64-bit / Ubuntu Server 12.10. Perl on Windows: Strawberry Perl 5.16.2 64-bit.

This is my first post on Stack Overflow. Hope I did it right :-)

Answer 1

I recommend reading: this

I usually use Thread::Queue to manage the input to the threads. Sample code:

use threads;
use Thread::Queue;

# Assumed worker count; the original does not show where NUM_THREADS is defined
use constant NUM_THREADS => 4;

my @threads;
my $Q = Thread::Queue->new;

# Start the threads
for my $i (0 .. NUM_THREADS - 1) {
    $threads[$i] = threads->new(\&insert_1_thread, $Q);
}

# Get the list of sites and put in the work queue
# ($ref is an array ref of rows built elsewhere, not shown here)
foreach my $row ( @{$ref} ) {
    $Q->enqueue( $row->[0] );
    #sleep 1 while $Q->pending > 100;
} # foreach $row

# Signal we are done: one undef per thread
for my $i (0 .. NUM_THREADS - 1) {
    $Q->enqueue( undef );
}

my $count = 0;
# Now wait for the threads to complete before going on to the next step
for my $i (0 .. NUM_THREADS - 1) {
    $count += $threads[$i]->join();
}

And for the worker thread:

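sub insert_1_thread {
    my ( $Q ) = @_;
    my $tid = threads->tid;
    my $count = 0;
    Log("Started thread #$tid");   # Log() is not defined in this snippet

    # dequeue blocks until an item arrives; the undef sentinel ends the loop.
    # Testing with defined() keeps false values such as 0 from ending it early.
    while ( defined( my $row = $Q->dequeue ) ) {
        # PROCESS ME...
        $count++;
    } # while

    Log("Thread#$tid, done");
    return $count;

} # sub insert_1_thread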
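
To connect this pattern to the chunked read in the question, here is a minimal sketch, assuming a single long-lived worker is acceptable. The function name process_in_chunks, the counting placeholder, and the queue limit of 10 are made up for illustration and are not from the original posts. It avoids creating and joining a thread per chunk, but it does not by itself address memory growth inside the operation.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Read $path in chunks of $chunksize bytes and hand each chunk to a single
# long-lived worker thread. Returns the number of chunks the worker handled.
# (process_in_chunks is a hypothetical name used only for this sketch.)
sub process_in_chunks {
    my ($path, $chunksize) = @_;
    my $queue = Thread::Queue->new;

    # The worker is created in scalar context, so join() returns one scalar.
    my $worker = threads->create(sub {
        my $count = 0;
        while (defined(my $buffer = $queue->dequeue)) {
            # Placeholder for callSomeOperation($buffer) from the question.
            $count++;
        }
        return $count;
    });

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    while (read $fh, my $buffer, $chunksize) {
        $queue->enqueue($buffer);
        # Crude throttle (assumed limit) so the reader does not outrun the worker.
        sleep 1 while $queue->pending > 10;
    }
    close $fh;

    $queue->enqueue(undef);   # sentinel: no more input
    return $worker->join;     # wait for the worker and collect its count
}
```

It could be called as process_in_chunks('big.dat', 1024 * 1024), where the file name is only an example.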

Answer 2

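I don't know whether this is a solution for you, but you could create an array of chunk objects and process them in parallel like this: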

#!/usr/bin/perl

package Object; {
    use threads;
    use threads::shared;        

    sub new {
        my $class=shift;
        share(my %this);
        return(bless(\%this,$class));
    }

    sub set {
        my ($this,$value)=@_;
        lock($this);
#       $this->{"data"}=shared_clone($value);
        $this->{"data"}=$value;
    }

    sub get {
        my $this=shift; 
        return $this->{"data"};
    }
}


package main; {

use strict;
use warnings;

use threads;
use threads::shared;

    my @objs;
    foreach (0..2){
        my $o = Object->new();
        $o->set($_);
        push @objs, $o; 
    }

    threads->create(\&run,(\@objs))->join();

    sub run {
        my ($obj) = @_;     
        $$obj[$_]->get() foreach(0..2);        
    }
}
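
As for the fork attempt in the question: the snippet there never waits for the child, so one possible cause of the "can't fork any more" failure is that children pile up. The following is a rough sketch, assuming a Unix-like system, that runs one child per chunk and reaps it before moving on; run_in_child is a hypothetical helper name. On Windows, perl emulates fork with interpreter threads inside a single process (see perlfork), so this may not return memory there.

```perl
use strict;
use warnings;

# Run $work->($buffer) in a child process and block until it exits.
# Returns the child's exit code. (run_in_child is a hypothetical helper.)
sub run_in_child {
    my ($work, $buffer) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $work->($buffer);   # child: do the work, then exit
        exit 0;
    }
    waitpid($pid, 0);       # parent: wait for and reap this child
    return $? >> 8;         # exit code is in the high byte of $?
}
```

Inside the read loop this would be used as run_in_child(\&thrWorker, $buffer), so only one child exists at any time and its memory goes back to the OS when it exits.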

Problem with joining threads
