Question

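I have a script in which I use Perl arrays. Each array contains hundreds of thousands of items.

I frequently need to add items in the middle of an array, or remove items from it, on the fly.

Since I insert and delete so often, I want to understand whether I would be better off using linked lists instead of Perl arrays.

So my questions are:

How is splice() implemented?
What is the complexity of splice() when it inserts an item x at index i of a Perl array?
Can you recommend a Perl linked-list module that you have worked with?

Thanks!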

Answer 1

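Perl arrays are stored as an array of pointers, together with a start offset, a length, and an allocated length.

So inserting into or deleting from the middle requires moving 4 or 8 bytes (one pointer) for each of the later elements in the array. Deleting from either end does not require moving anything; only the start offset or the length is adjusted. Inserting at the end usually only requires adjusting the length, but sometimes requires reallocating the whole array of pointers. When inserting at the beginning, perl does its best to arrange things so that only the start offset needs adjusting, but sometimes the whole array has to be moved or even reallocated.

In practice, the overhead of creating and managing a linked list with perl operations will, in almost every case, be much greater than the overhead of using an array.

To benchmark it, we would need to know more about your particular case: the actual size of the arrays, the kind and size of the elements (irrelevant to the cost of splice, but possibly relevant to a linked list), the relative frequency of inserts and deletes, and so on.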
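To make the distinction concrete, here is a minimal sketch of the operations described above. It only shows which built-ins touch the ends and which touch the middle; the comments about what has to move restate this answer's description rather than anything measured, and the variable names are just for illustration.

```perl
use strict;
use warnings;

my @items = (1 .. 10);

# Operations at either end: per the description above, these normally only
# adjust the start offset or the length.
my $first = shift @items;     # removes 1 from the front
my $last  = pop @items;       # removes 10 from the back
push @items, 11;              # appends at the end
unshift @items, 0;            # prepends at the front

# Operations in the middle: pointers next to the index have to be moved.
my $mid = int(@items / 2);                 # middle index of the 10 elements
splice @items, $mid, 0, 'inserted';        # insert one element before $mid
my ($removed) = splice @items, $mid, 1;    # delete that element again

print "@items\n";
```

All of these go through the same array; only the position decides how much work perl has to do.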

Answer 2

I did a quick benchmark of splice, and it appears to behave as O(N) for both removals and insertions.

Script:

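use strict;
use warnings;
use Time::HiRes;

my $length = shift or die "Usage: $0 LENGTH\n";
my $toSplice = 100;

my @list = (1 .. $length);

# Remove $toSplice elements at random positions
my $t0 = Time::HiRes::time();
for (1 .. $toSplice) {
    my $removeIdx = int(rand() * @list);
    splice @list, $removeIdx, 1;
}

# Insert $toSplice elements at random positions
my $t1 = Time::HiRes::time();
for (1 .. $toSplice) {
    my $insertIdx = int(rand() * @list);
    splice @list, $insertIdx, 0, 0;
}

printf("Took %.4fs to remove\n", $t1 - $t0);
# Measured from $t0, so this figure also includes the removal phase
printf("Took %.4fs to insert\n", Time::HiRes::time() - $t0);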

Results:

$ perl test.pl 100000
Took 0.0026s to remove
Took 0.0092s to insert
$ perl test.pl 1000000
Took 0.0296s to remove
Took 0.0617s to insert
$ perl test.pl 10000000
Took 0.2876s to remove
Took 0.6252s to insert

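So increasing the array length by a factor of 10 (the number of splices stays at 100) increases the run time by roughly a factor of 10, which is what O(N) per splice predicts.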
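As a quick sanity check on that reading, the ratios can be computed from the removal timings printed above. This is only arithmetic on the reported numbers, not a new measurement; the `@runs` table is copied by hand from the results.

```perl
use strict;
use warnings;

# Removal timings copied from the results above: [array length, seconds]
my @runs = (
    [   100_000, 0.0026 ],
    [ 1_000_000, 0.0296 ],
    [10_000_000, 0.2876 ],
);

my @ratios;
for my $k (1 .. $#runs) {
    my $size_ratio = $runs[$k][0] / $runs[$k - 1][0];
    my $time_ratio = $runs[$k][1] / $runs[$k - 1][1];
    push @ratios, $time_ratio;
    printf "length x%d -> removal time x%.1f\n", $size_ratio, $time_ratio;
}
```

Both time ratios come out close to 10, in line with linear growth in the array length.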

Answer 3

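Your benchmark of arrays versus linked lists is flawed. The array approach can be sped up as follows:

Create an array of scalars, rather than a superfluous array of hash references made to match the linked list. This makes execution about 4 times faster.
Since you only make a single pass over the list, build a new list instead of trying to splice the old one. This makes it about 10 times faster.

Of course this doubles your memory use, but using a linked list increases it by at least a factor of 5.

Below is a benchmark showing both improvements. I also simplified the linked-list function, but even with both sides improved, the array approach is still about twice as fast.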

use strict;
use warnings;

use Benchmark;

my $INSERTION_FREQUENCY = 5;

my $num_of_items = shift or die "Specify size of list\n";

timethese(10, {
    'linked_list'  => sub { linked_list($num_of_items) },
    # array_splice is commented out here; the 500000-item run below omits it
#   'array_splice' => sub { array_splice($num_of_items) },
    'array_map'    => sub { array_map($num_of_items) },
});

sub linked_list {
    my $num_of_items = shift;

    my $curr_node = my $list_head = {data => 1};

    # Creating List 
    for my $i (2 .. $num_of_items) {
        $curr_node = $curr_node->{next} = {
            data => $i,
            prev => $curr_node,
        };
    }

    # Inserting Items
    $curr_node = $list_head;
    my $i = 0;
    while ($curr_node) {
        if (++$i % $INSERTION_FREQUENCY == 0) {
            my %new_node = (
                data => "inserted",
                prev => $curr_node->{"prev"},
                next => $curr_node,
            );
            $curr_node->{"prev"}{"next"} = \%new_node if $curr_node->{"prev"};
            $curr_node->{"prev"} = \%new_node;
        }
        $curr_node = $curr_node->{"next"};
    }

    return $list_head;
}

sub array_splice {
    my $num_of_items = shift;

    # Creating Array
    my @array = (1..$num_of_items);

    # Inserting Items
    for my $i (1 .. $num_of_items) {
        if ($i % $INSERTION_FREQUENCY == 0) {
            splice(@array, $i - 1, 0, "inserted");
        }
    }

    return \@array;
}

sub array_map {
    my $num_of_items = shift;

    # Creating Array
    my @array = (1..$num_of_items);

    # Inserting Items
    my $i = 0;
    @array = map {
        ++$i % $INSERTION_FREQUENCY == 0 ? ("inserted", $_) : $_
    } @array;

    return \@array;
}

Benchmarks

$ perl arrays.pl 100000
Benchmark: timing 10 iterations of array_map, array_splice, linked_list...
 array_map:  1 wallclock secs ( 0.58 usr +  0.01 sys =  0.59 CPU) @ 16.89/s (n=10)
array_splice: 16 wallclock secs (16.21 usr +  0.00 sys = 16.21 CPU) @  0.62/s (n=10)
linked_list:  2 wallclock secs ( 1.43 usr +  0.09 sys =  1.53 CPU) @  6.54/s (n=10)

$ perl arrays.pl 200000
Benchmark: timing 10 iterations of array_map, array_splice, linked_list...
 array_map:  1 wallclock secs ( 1.20 usr +  0.05 sys =  1.25 CPU) @  8.01/s (n=10)
array_splice: 64 wallclock secs (64.10 usr +  0.03 sys = 64.13 CPU) @  0.16/s (n=10)
linked_list:  3 wallclock secs ( 2.92 usr +  0.23 sys =  3.15 CPU) @  3.17/s (n=10)

$ perl arrays.pl 500000
Benchmark: timing 10 iterations of array_map, linked_list...
 array_map:  4 wallclock secs ( 3.12 usr +  0.36 sys =  3.48 CPU) @  2.87/s (n=10)
linked_list:  8 wallclock secs ( 7.52 usr +  0.70 sys =  8.22 CPU) @  1.22/s (n=10)
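The single-pass rebuild used in array_map can be applied whenever the insert positions are known up front. The following is a rough sketch of that idea, not part of the benchmark above: `insert_batch` is a made-up helper name, and it assumes the insert positions are given as indices into the original array.

```perl
use strict;
use warnings;

# Hypothetical helper: build a new array in one pass instead of calling
# splice once per insertion. $inserts maps an index of the ORIGINAL array
# to the value that should be inserted before that element.
sub insert_batch {
    my ($array, $inserts) = @_;
    my @out;
    for my $idx (0 .. $#$array) {
        push @out, $inserts->{$idx} if exists $inserts->{$idx};
        push @out, $array->[$idx];
    }
    return \@out;
}

my $result = insert_batch([ 'a' .. 'e' ], { 1 => 'X', 4 => 'Y' });
print "@$result\n";    # prints "a X b c d Y e"
```

Each element is copied once, so the cost is proportional to the array length plus the number of insertions, at the price of holding both arrays in memory for a moment.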

Answer 4

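I also ran a benchmark and would like to share the results with you.

In my results, the linked list is far faster than the Perl array.

This is the benchmark I ran:

Created a linked list or an array with 1M items
Iterated over the list/array and made 200K insertions in place
Measured how long each scenario took.

Linked list: 2 seconds
Perl array: 1 min 55 s

Here is the code:

Run commands and results: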

> time perl_benchmark.pl list 1000000
1.876u 0.124s 0:02.01 99.0%     0+0k 0+0io 0pf+0w
> time perl_benchmark.pl array 1000000
115.159u 0.104s 1:55.27 99.9%   0+0k 0+0io 0pf+0w

Source code:

my $INSERTION_FREQUENCY = 5;

my $use_list = $ARGV[0] eq "list";
my $num_of_items = $ARGV[1];

my $list_header;
my $list_tail;

my @array;

# Creating List or Array
for (my $i = 0 ; $i < $num_of_items ; $i++) {
    my %new_node;
    $new_node{"data"} = $i;
    if ($use_list) {        
        if (! defined($list_header)) {
            $list_header = $list_tail = \%new_node;
        } else {
            $new_node{"prev"} = $list_tail;
            $list_tail->{"next"} = \%new_node;          
            $list_tail = \%new_node;
        }
    } else {
        push(@array, \%new_node);
    }
}

# Inserting Items
my $curr_node = $list_header;
for (my $i = 1 ; $i < $num_of_items ; $i++) {
    if ($i % $INSERTION_FREQUENCY == 0) {
        my %new_node;
        $new_node{"data"} = "inserted";
        if ($use_list) {
            my $prev_ptr = $curr_node->{"prev"};
            if (defined($prev_ptr)) {
                $prev_ptr->{"next"} = \%new_node;
            }
            $new_node{"prev"} = $prev_ptr;
            $new_node{"next"} = $curr_node;
            $curr_node->{"prev"} = \%new_node;
        } else {
            splice(@array, $i - 1, 0, \%new_node);
        }
    }
    if ($use_list) {
        $curr_node = $curr_node->{"next"};
    }
}

Perl: performance of array insertion using 'splice()' vs. a linked list
