Question

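I have some Perl scripts that process a file (containing lots of numbers) line by line.

File contents (sample data; the first 3 numbers are separated by spaces, then a tab separates the 3rd and 4th numbers):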

1 2 3 15
2 9 8 30
100 106 321 92
9 8 2 59
300 302 69 88
....

Script contents:

# snippet of script
open(INF, "$infile") || die "Unable to open file $infile: $!\n";
@content = <INF>;
close(INF);

foreach $line (@content) {
    # blah blah, script to handle math here
    # Now the numbers are stored in separate variables
    # $n1 stores the 1st number, i.e.: 1
    # $n2 stores the 2nd number, i.e.: 2
    # $n3 stores the 3rd number, i.e.: 3
    # $n4 stores the 4th number, i.e.: 15
    # Solution code to be inserted here
}

I want to:

Sort the variables $n1, $n2, $n3 and output them in ascending order.
At the end of the foreach, get rid of the duplicates.

My approach:

# Insert below code to foreach
$numbers{$n1} = 1;
$numbers{$n2} = 1;
$numbers{$n3} = 1;
@keys = sort { $numbers{$b} <=> $numbers{$a} } keys %numbers;
#push @numbers, "$keys[0] $keys[1] $keys[2]";
$numbers2{"$keys[0] $keys[1] $keys[2]"} = 1;

This defines two hashes: the first is used for sorting, the second for removing duplicates after sorting.

Is there a better way? Thanks,

Answer 1

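Updated with another solution - the duplicates may be whole lines, not numbers on a line.

To remove duplicate lines, it is easiest to have all the sorted three-number lines in an array and then post-process them by running them through uniq. There are (at least) two ways to do this.

Store the data in an array where each element is a reference to a sorted array holding the three numbers. Then, for comparison, construct a string from each one on the fly. This is the better choice if the numbers are processed further elsewhere, since they stay in arrays.
Build a string from each sorted line and store those strings in an array. Then they are easier to compare.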
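
The second approach can be sketched as follows. This is a minimal sketch, not the answer's code: the helper name unique_sorted_triples is hypothetical, the split stands in for the real calculations, and a %seen hash is used in place of uniq so that it runs with core Perl only.

```perl
use strict;
use warnings;

# Hypothetical helper: takes raw lines and returns the unique
# "sorted triple" strings in first-seen order.
sub unique_sorted_triples {
    my @lines = @_;
    my (%seen, @result);
    for my $line (@lines) {
        # Stand-in for the real calculations: take the first three fields
        my ($n1, $n2, $n3) = split ' ', $line;
        # One string per line, numbers in ascending order
        my $key = join ' ', sort { $a <=> $b } ($n1, $n2, $n3);
        # Keep the string only the first time it is seen
        push @result, $key unless $seen{$key}++;
    }
    return @result;
}

my @sample = ("1 2 3\t15", "2 9 8\t30", "100 106 321\t92", "9 8 2\t59", "300 302 69\t88");
print "$_\n" for unique_sorted_triples(@sample);
```

Because each line is already a string, the comparison needs no join and split round trip. The cost is that the numbers have to be split out again if they are needed later.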

Below I use the first approach, assuming there is further processing of the numbers.

use warnings;
use strict;
use feature qw(say);
use List::MoreUtils qw(uniq);

my $file = 'sort_nums.txt';
my @content = do {
    open my $fh, '<', $file  or die "Can't open $file: $!";
    <$fh>;
};

my @linerefs_all;
foreach my $line (@content) {
    # Calculations ... numbers stored in ($n1, $n2, $n3, $n4)
    my ($n1, $n2, $n3) = split '\s+', $line;   # FOR TESTING
    # Add to @linerefs_all a reference to the sorted array with first three
    push @linerefs_all, [ sort { $a <=> $b } ($n1, $n2, $n3) ];
}
# Remove dupes by comparing line-arrays as strings, then remake arrayrefs
my @linerefs = map { [ split ] } uniq map { join ' ', @$_ } @linerefs_all;
say "@$_" for @linerefs;

With the posted lines in the file sort_nums.txt, the code above prints

1 2 3
2 8 9
100 106 321
69 300 302

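An explanation of the post-processing line, read from the right.

The map on the right processes a list of arrayrefs. It dereferences each one and joins its elements with a space, forming a string for that line. It returns a list of such strings, one per line.
That list is pruned of duplicates by uniq, which itself returns a list, and that list is fed into the map on the left.
Inside that map's block, each string is split on (the default) whitespace into a list (the numbers on the line), and a reference to that list is taken by [ ]. So this map returns a list of array references, one per line, which is assigned to @linerefs.

The result is then printed. If this is too much for one statement, break the process up into steps that generate intermediate arrays. Or switch to the second approach above.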
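
One way to break the statement into steps might look like this. It is only a sketch: the sample data and the intermediate array names (@as_strings, @unique_strings) are assumptions, and a %seen hash stands in for uniq so that no non-core module is needed.

```perl
use strict;
use warnings;

# Sample data: arrayrefs of already sorted triples, as built in the loop above
my @linerefs_all = ([1, 2, 3], [2, 8, 9], [100, 106, 321], [2, 8, 9], [69, 300, 302]);

# Step 1: turn each arrayref into a string so that lines can be compared
my @as_strings = map { join ' ', @$_ } @linerefs_all;

# Step 2: drop repeated strings, keeping the first occurrence of each
# (a %seen hash is used here in place of uniq)
my %seen;
my @unique_strings = grep { !$seen{$_}++ } @as_strings;

# Step 3: split each string back into numbers and take a reference to the list
my @linerefs = map { [ split ] } @unique_strings;

print "@$_\n" for @linerefs;
```

Each intermediate array can be printed on its own, which makes it easier to see where a line was dropped.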

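Initial post, which assumed that numbers may repeat within a line

I take the goal to be: for each line, sort the three variables and keep only the unique ones.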

use List::MoreUtils qw(uniq);

foreach my $line (@content) {
    # Calculations, numbers stored in ($n1, $n2, $n3, $n4)
    my @nums = uniq sort { $a <=> $b } ($n1, $n2, $n3);
    say "@nums";
}

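Keep in mind that after this you don't know which of $n1, $n2, $n3 may have been removed.

If a non-core module is unsuitable for some reason, see perlfaq4 (the entry on removing duplicate elements from a list or array), for example,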

my %seen = ();
my @nums = sort { $a <=> $b } grep { ! $seen{$_}++ } ($n1, $n2, $n3);

Or, if you need it without an extra hash lying around

my @nums = do { my %seen; sort { $a <=> $b } grep { !$seen{$_}++ } ($n1, $n2, $n3) };
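
As a small illustration with hypothetical values (the first and third numbers are equal), the do block form behaves like this:

```perl
use strict;
use warnings;

# Hypothetical values standing in for the results of the calculations
my ($n1, $n2, $n3) = (7, 3, 7);

# %seen exists only inside the do block, so nothing is left over afterwards
my @nums = do {
    my %seen;
    sort { $a <=> $b } grep { !$seen{$_}++ } ($n1, $n2, $n3);
};

print "@nums\n";   # prints "3 7"
```

Since %seen is scoped to the block, it starts empty on every iteration when this is placed inside the foreach.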
