Question

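I'm working on a project implemented in Perl and thought it would be a good idea to use threads to distribute the work, because the tasks can be completed independently of each other and only read from shared data in memory. However, the performance is nowhere near what I expected. After some investigation I can only conclude that threads in Perl basically perform badly, but I keep wondering why performance collapses as soon as I introduce a single shared variable.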

For example, this small program shares nothing and uses 75% of the CPU (as expected):

use threads;

sub fib {
  my ( $n ) = @_;
  if ( $n < 2 ) {
    return $n;
  } else {
    return fib( $n - 1 ) + fib( $n - 2 );
  }
}

my $thr1 = threads->create( 'fib', 35 );
my $thr2 = threads->create( 'fib', 35 );
my $thr3 = threads->create( 'fib', 35 );

$thr1->join;
$thr2->join;
$thr3->join;

As soon as I introduce a shared variable $a, CPU usage drops to somewhere between 40% and 50%:

use threads;
use threads::shared;

my $a : shared;
$a = 1000;

sub fib {
  my ( $n ) = @_;
  if ( $n < 2 ) {
    return $n;
  } else {
    return $a + fib( $n - 1 ) + fib( $n - 2 ); # <-- $a was added here
  }
}

my $thr1 = threads->create( 'fib', 35 );
my $thr2 = threads->create( 'fib', 35 );
my $thr3 = threads->create( 'fib', 35 );

$thr1->join;
$thr2->join;
$thr3->join;

So $a is only read and no locking takes place, yet performance drops. I'm curious why this happens.

At the moment I'm using Perl 5.10.1 under Cygwin on Windows XP. Unfortunately I can't test this on a non-Windows machine with a (hopefully) more recent Perl.

Answer 1

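Your code is a tight loop around a synchronized structure. Optimize it by having each thread copy the shared variable (just once per thread) into a non-shared variable.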
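
A minimal sketch of that suggestion, assuming the shared value does not change while the workers run. The names $offset, $local and worker are made up for illustration, and the Fibonacci argument is reduced to 20 so the example finishes quickly. Each thread reads the shared variable once and hands the private copy down through the recursion:

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared across threads (renamed from $a, which is also Perl's sort variable).
my $offset : shared;
$offset = 1000;

# The recursion only ever touches $k, a plain non-shared lexical.
sub fib {
    my ( $n, $k ) = @_;
    return $n if $n < 2;
    return $k + fib( $n - 1, $k ) + fib( $n - 2, $k );
}

# Hypothetical worker: read the shared variable exactly once per thread.
sub worker {
    my ($n) = @_;
    my $local = $offset;    # private copy, no further shared access
    return fib( $n, $local );
}

my @thrs    = map { threads->create( \&worker, 20 ) } 1 .. 3;
my @results = map { $_->join } @thrs;
print "@results\n";
```

Whether this restores the CPU usage of the unshared version depends on the Perl build and platform, so it is worth measuring rather than assuming.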

Answer 2

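In Perl it is possible to build shared objects containing lots of data without worrying about extra copies. Spawning workers has no impact on performance, because the shared data resides in a separate thread or process, depending on whether threads are used.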

use MCE::Hobo;    # use threads okay or parallel module of your choice
use MCE::Shared;

# The module option constructs the object under the shared-manager.
# There's no trace of data inside the main process. The construction
# returns a shared reference containing an id and class name.

my $data = MCE::Shared->share( { module => 'My::Data' } );
my $b;

sub fib {
  my ( $n ) = @_;
  if ( $n < 2 ) {
    return $n;
  } else {
    return $b + fib( $n - 1 ) + fib( $n - 2 );
  }
}

my @thrs;

push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(1000), fib(35) } );
push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(2000), fib(35) } );
push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(3000), fib(35) } );

$_->join() for @thrs;

exit;

# Populate $self with data. When shared, the data resides under the
# shared-manager thread (via threads->create) or process (via fork).

package My::Data;

sub new {
  my $class = shift;
  my %self;

  %self = map { $_ => $_ } 1000 .. 5000;

  bless \%self, $class;
}

# Add any getter methods to suit the application. Supporting multiple
# keys helps reduce the number of trips via IPC. Serialization is
# handled automatically if a getter method returns a hash ref.
# MCE::Shared will use Sereal::{Encoder,Decoder} if available - faster.

sub get_keys {
  my $self = shift;
  if ( wantarray ) {
    return map { $_ => $self->{$_} } @_;
  } else {
    return $self->{$_[0]};
  }
}

1;
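
To illustrate the getter on its own, here is a small sketch that calls get_keys on a plain, non-shared My::Data object (no MCE::Shared involved, so nothing here measures IPC). It only shows how wantarray lets one call return a single value in scalar context or several key/value pairs in list context, which is what allows a caller to fetch multiple keys in one trip:

```perl
use strict;
use warnings;

package My::Data;

sub new {
    my $class = shift;
    my %self  = map { $_ => $_ } 1000 .. 5000;
    return bless \%self, $class;
}

# Same getter as above: key/value pairs in list context,
# the value of the first key in scalar context.
sub get_keys {
    my $self = shift;
    if (wantarray) {
        return map { $_ => $self->{$_} } @_;
    }
    else {
        return $self->{ $_[0] };
    }
}

package main;

my $data = My::Data->new;    # plain object, not shared

my $one  = $data->get_keys(1000);                # scalar context
my %many = $data->get_keys( 1000, 2000, 3000 );  # list context, one call

print "$one\n";
print join( ',', map { "$_=$many{$_}" } sort keys %many ), "\n";
```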

Perl thread performance with shared variables
