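This is a subroutine I'm trying to optimize. It mostly works with array references. It currently takes about 30-40 seconds on average to run, and I'd like to get that down to 10 seconds if possible. Does anything unnecessary jump out at you?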
sub compute{
# takes two params: 2 array_refs
my ($gene_exp_ref, $centroids_ref) = @_;
my ($numerator, $denominator) = (0, 0);
my ($prod_ref, $diff_x_ref, $diff_y_ref, $x_sq_ref, $y_sq_ref) = ([], [], [], [], []); # diff_y is the center_gene
my %gene_center_pcc; # diff_x is gene of interest
my $gene_exp_average = mean($gene_exp_ref);
for my $gene_exp (@{$gene_exp_ref}) {
push(@{ $diff_x_ref }, ($gene_exp - $gene_exp_average));
}
# possible bottleneck
for my $centroid_gene_exp_ref (values %{$centroids_ref}){
$diff_y_ref = []; # initialize back to empty array
for my $index (@{$centroid_gene_exp_ref}) {
push(@{ $diff_y_ref }, ($index - mean($centroid_gene_exp_ref)));
}
@{ $prod_ref } = map { @{ $diff_x_ref }[$_] * @{ $diff_y_ref }[$_] } 0..$#{ $diff_x_ref };
$numerator = sum($prod_ref);
@{ $x_sq_ref } = map {$_*$_}@$diff_x_ref;
@{ $y_sq_ref } = map {$_*$_}@$diff_y_ref;
$denominator = sqrt(sum($x_sq_ref)) * sqrt(sum($y_sq_ref));
my $r = $numerator/$denominator;
my ($center) = grep { @{$gene_centers{$_}} ~~ @$centroid_gene_exp_ref } keys %gene_centers;
$gene_center_pcc{$center} = $r;
}
#return the center with the highest PCC
return (sort {$gene_center_pcc{$b} <=> $gene_center_pcc{$a}}
keys %gene_center_pcc)[0];
}
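Every calculation and number-crunching step is necessary. The code compiles, but without the data files you won't be able to run the subroutine properly.
Answer 0 (score: 3)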
for my $index (@{$centroid_gene_exp_ref}) {
push(@{ $diff_y_ref }, ($index - mean($centroid_gene_exp_ref)));
}
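This recalculates the mean once for every item in @{$centroid_gene_exp_ref}. If that array is large, the cost grows quadratically with its length (I'm assuming mean() doesn't cache or memoize its result, so it has to loop over the whole array on every call). You can save a fair amount of time by caching the mean yourself: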
my $mean = mean($centroid_gene_exp_ref);
for my $index (@{$centroid_gene_exp_ref}) {
push(@{ $diff_y_ref }, ($index - $mean));
}
Beyond that, take a look at Devel::NYTProf to find your actual bottlenecks and target your optimizations at those points.
Answer 1 (score: 2)
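For a quick sanity check before running a full profiler, the two loops can be compared with the core Benchmark module. This is only a sketch: the question's mean() is not shown, so the mean() below is a hypothetical stand-in built on List::Util, the names diffs_uncached and diffs_cached are made up for illustration, and the data is invented.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(sum);

# Hypothetical stand-in for the question's mean(), which is not shown.
# Takes an array ref and assumes it is non-empty.
sub mean {
    my ($values_ref) = @_;
    return sum(@$values_ref) / @$values_ref;
}

# Original pattern: mean() is called again for every element.
sub diffs_uncached {
    my ($values_ref) = @_;
    my @diffs;
    for my $value (@$values_ref) {
        push @diffs, $value - mean($values_ref);
    }
    return \@diffs;
}

# Suggested pattern: compute the mean once and reuse it.
sub diffs_cached {
    my ($values_ref) = @_;
    my $mean = mean($values_ref);
    return [ map { $_ - $mean } @$values_ref ];
}

my @data = map { $_ % 17 } 1 .. 500;    # invented test data

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    uncached => sub { diffs_uncached(\@data) },
    cached   => sub { diffs_cached(\@data) },
});
```

Both variants return the same differences; only the number of mean() calls changes, so any gap in the printed rates comes from the repeated passes over the array.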
You need to look at the bigger picture. Your previous post shows that you call compute() once for every key in %$centroids_ref:
foreach my $key ( keys %HoA ) {
compute($HoA{$key}, \%HoA); # on the first iteration, this actually passes an aref to [1,3,3,3]
}
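Even after Dave Sherohman's optimization, you will still be repeating a lot of calculations (such as mean) over and over.
My suggestion is to bring the outer loop into compute(). Then you can do the calculations once for each key in the HoA, store them, and reuse the stored values for every comparison.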
sub compute{
my ($centroids_ref) = @_;
# precalculate these values once
my %means;
my %diffs;
my %sqrts;
foreach my $key (keys %$centroids_ref) {
my $mean = mean($centroids_ref->{$key});
my @diffs = map {$_ - $mean} @{$centroids_ref->{$key}};
my @squares = map {$_ * $_} @diffs;
my $sqrt = sqrt(sum(\@squares));
$means{$key} = $mean;
$diffs{$key} = \@diffs;
$sqrts{$key} = $sqrt;
}
# now do the main calculations from the 'possible bottlenecks' section
...
}
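One way the elided main calculation could be filled in is sketched below. It is not the answerer's code: the names best_matches and mean_of are hypothetical, List::Util's sum0 replaces the question's unseen sum() and mean() helpers, and the sketch assumes that each row should be matched against every other row (skipping itself) and that all rows have the same length.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Hypothetical helper: arithmetic mean of a non-empty array ref.
sub mean_of {
    my ($values_ref) = @_;
    return sum0(@$values_ref) / @$values_ref;
}

# Returns a hash ref mapping each key to the key of the other row
# with the highest Pearson correlation coefficient (PCC).
sub best_matches {
    my ($centroids_ref) = @_;

    # Precompute, once per key, the deviations from the mean and their norm.
    my (%diffs, %norms);
    for my $key (keys %$centroids_ref) {
        my $mean  = mean_of($centroids_ref->{$key});
        my @diffs = map { $_ - $mean } @{ $centroids_ref->{$key} };
        $diffs{$key} = \@diffs;
        $norms{$key} = sqrt(sum0(map { $_ * $_ } @diffs));
    }

    my %best;
    for my $gene (keys %$centroids_ref) {
        my ($best_key, $best_r);
        for my $center (keys %$centroids_ref) {
            next if $center eq $gene;       # assumption: skip self-comparison
            my $denominator = $norms{$gene} * $norms{$center};
            next if $denominator == 0;      # constant row: PCC is undefined
            my ($x, $y) = ($diffs{$gene}, $diffs{$center});
            my $numerator = sum0(map { $x->[$_] * $y->[$_] } 0 .. $#$x);
            my $r = $numerator / $denominator;
            ($best_key, $best_r) = ($center, $r)
                if !defined $best_r || $r > $best_r;
        }
        $best{$gene} = $best_key;           # undef if no valid partner exists
    }
    return \%best;
}
```

Looping over keys rather than values means the name of the best center is already in hand, so the grep with the smartmatch operator from the question is not needed to recover it. A call might look like `my $best = best_matches(\%HoA);`, after which `$best->{$key}` holds the best-correlated key for each row.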