Question

I have a text file named "input.txt":

some field1a | field1b | field1c
...another approx 1000 lines....
fielaNa | field Nb | field Nc

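I can choose any field separator.

I need a script that, on each separate run, gets one unique (never repeated) random line from this file, until all lines have been used.

My solution: I added one column to the file, so it looks like this: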

0|some field1a | field1b | field1c
...another approx 1000 lines....
0|fielaNa | field Nb | field Nc

and I process it with the following code:

use 5.014;
use warnings;
use utf8;
use List::Util;
use open qw(:std :utf8);
my $file = "./input.txt";

#read all lines into array and shuffle them
open(my $fh, "<:utf8", $file) or die "Cannot read $file: $!";
my @lines = List::Util::shuffle map { chomp $_; $_ } <$fh>;
close $fh;

#search for the 1st line that has 0 at the start
#change the 0 to 1
#and rewrite the whole file

my $random_line;
for(my $i=0; $i<=$#lines; $i++) {
    if( $lines[$i] =~ /^0/ ) {
        $random_line = $lines[$i];
        $lines[$i] =~ s/^0/1/;
        open($fh, ">:utf8", $file) or die "Cannot write $file: $!";
        print $fh join("\n", @lines);
        close $fh;
        last;
    }
}
$random_line = "1|NO|more|lines" unless( defined $random_line && $random_line =~ /\w/ );

do_something_with_the_fields(split /\|/, $random_line);
exit;

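This is a working solution, but not a very nice one, because:

the line order changes on every script run
it is not safe for concurrent script runs.

How can I write it more efficiently and more elegantly?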

Answer 1

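What about keeping a shuffled list of line numbers in a separate file, and removing the first line number each time you use it? Some locking may be needed to make concurrent script runs safe.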
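A minimal sketch of that idea, under some assumptions: the sub name next_random_line, the side file passed as $queue_file, and the "shuffled" marker line are all made up for illustration. The marker line is only there so that an empty (brand new) queue file can be told apart from one whose numbers have all been used.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use List::Util qw(shuffle);

# Return one not-yet-used random line of $input_file, or undef once every
# line has been handed out. $queue_file is a hypothetical side file that
# stores a marker line followed by the remaining shuffled line numbers.
sub next_random_line {
    my ($input_file, $queue_file) = @_;

    open(my $in, '<', $input_file) or die "Cannot open $input_file: $!";
    chomp(my @lines = <$in>);
    close $in;

    # Create the queue file if needed and hold an exclusive lock while it
    # is read and rewritten, so two runs cannot take the same number.
    sysopen(my $q, $queue_file, O_RDWR | O_CREAT)
        or die "Cannot open $queue_file: $!";
    flock($q, LOCK_EX) or die "Cannot lock $queue_file: $!";

    chomp(my @queue = <$q>);

    # An empty file means this is the first run: write the marker and a
    # shuffled list of all line numbers.
    @queue = ('shuffled', shuffle(0 .. $#lines)) unless @queue;

    # Remove the first line number after the marker, if any is left.
    my $n = @queue > 1 ? splice(@queue, 1, 1) : undef;

    seek($q, 0, 0)  or die "Cannot rewind $queue_file: $!";
    truncate($q, 0) or die "Cannot truncate $queue_file: $!";
    print {$q} map { "$_\n" } @queue;
    close($q) or die "Cannot close $queue_file: $!";   # also releases the lock

    return defined $n ? $lines[$n] : undef;
}
```

Calling next_random_line('input.txt', 'queue.txt') once per script run leaves input.txt untouched, so its line order never changes. Note that flock is advisory: it only protects against other processes that also take the lock.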

Answer 2

From perlfaq5:

How do I select a random line from a file?

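Short of loading the file into a database or pre-indexing the lines in the file, there are a couple of things that you can do.

Here's a reservoir-sampling algorithm from the Camel Book: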
srand;
rand($.) < 1 && ($line = $_) while <>;
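This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.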
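For readers who find the one-liner dense, here is the same reservoir-sampling idea spelled out as a sub. The name pick_random_line is made up for illustration. The Nth line read replaces the current candidate with probability 1/N, so every line of the file ends up selected with equal probability after a single pass.

```perl
use strict;
use warnings;

# Pick one line uniformly at random from an open filehandle in a single
# pass, keeping only the current candidate in memory.
sub pick_random_line {
    my ($fh) = @_;
    my $count = 0;
    my $chosen;
    while (my $line = <$fh>) {
        $count++;
        # Replace the candidate with probability 1/$count.
        $chosen = $line if rand($count) < 1;
    }
    return $chosen;   # undef if the file had no lines
}
```

On its own this does not prevent the same line from being picked again on a later run; that still needs stored state.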

You can use the File::Random module which provides a function for that algorithm:
use File::Random qw/random_line/;
my $line = random_line($filename);
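Another way is to use the Tie::File module, which treats the entire file as an array. Simply access a random array element.

All Perl programmers should take the time to read the FAQ.

Update: To get a unique random line each time, you have to store state. The simplest way to store state is to remove the lines you have used from the file.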
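A minimal sketch of the "remove the used line" approach, assuming it is acceptable for the script to rewrite the file (work on a copy if the original must survive). The sub name take_random_line is made up for illustration. The file is locked while it is read and rewritten, and the remaining lines keep their original order.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Remove one random line from $file and return it, or return undef when
# the file has no lines left.
sub take_random_line {
    my ($file) = @_;

    open(my $fh, '+<', $file) or die "Cannot open $file: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $file: $!";

    chomp(my @lines = <$fh>);
    unless (@lines) {
        close $fh;            # releases the lock
        return undef;
    }

    # Take one line out at a random position; the rest stay in order.
    my ($taken) = splice(@lines, int(rand(@lines)), 1);

    seek($fh, 0, 0)  or die "Cannot rewind $file: $!";
    truncate($fh, 0) or die "Cannot truncate $file: $!";
    print {$fh} map { "$_\n" } @lines;
    close($fh) or die "Cannot close $file: $!";   # also releases the lock

    return $taken;
}
```

With about 1000 lines, reading and rewriting the whole file on each run is cheap, and no extra column or side file is needed.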

Answer 3

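This program uses the Tie::File module to open your input.txt file as well as an indices.txt file.

If indices.txt is empty, it is initialised with the indices of all the records in input.txt, in shuffled order.

On each run, the index at the end of the list is removed and the corresponding input record is displayed.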

use strict;
use warnings;

use Tie::File;
use List::Util 'shuffle';

tie my @input, 'Tie::File', 'input.txt'
        or die qq(Unable to open "input.txt": $!);

tie my @indices, 'Tie::File', 'indices.txt'
        or die qq(Unable to open "indices.txt": $!);

@indices = shuffle(0..$#input) unless @indices;

my $index = pop @indices;
print $input[$index], "\n";   # Tie::File strips the record separator

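Update

I have modified this solution so that it populates a new indices.txt file only if it doesn't already exist, and not, as before, whenever it is empty. That means a new sequence of records can be printed simply by deleting the indices.txt file.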

use strict;
use warnings;

use Tie::File;
use List::Util 'shuffle';

my ($input_file, $indices_file) = qw( input.txt indices.txt );

tie my @input, 'Tie::File', $input_file
        or die qq(Unable to open "$input_file": $!);

my $first_run = not -f $indices_file;

tie my @indices, 'Tie::File', $indices_file
        or die qq(Unable to open "$indices_file": $!);

@indices = shuffle(0..$#input) if $first_run;

@indices or die "All records have been displayed";
my $index = pop @indices;
print $input[$index], "\n";   # Tie::File strips the record separator

Get a unique random line from a text file with Perl (on each script run)
