Question

I'm new to Perl, and I've stumbled on a problem:

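My task is to create a simple way to access the lines of a large file in Perl as quickly as possible. I created a file with 5 million lines, each containing its line number. Then I wrote my main program, which needs to be able to print the contents of any given line. For this I used two methods I found on the internet: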

use Config qw( %Config );

my $off_t = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';
my $file = "testfile.err";
open(FILE, "< $file")         or die "Can't open $file for reading: $!\n";
open(INDEX, "+>$file.idx")
        or die "Can't open $file.idx for read/write: $!\n";
build_index(*FILE, *INDEX);
my $line = line_with_index(*FILE, *INDEX, 129);
print "$line";

sub build_index {
    my $data_file  = shift;
    my $index_file = shift;
    my $offset     = 0;

    while (<$data_file>) {
        print $index_file pack($off_t, $offset);
        $offset = tell($data_file);
    }
}

sub line_with_index {
    my $data_file   = shift;
    my $index_file  = shift;
    my $line_number = shift;

    my $size;               # size of an index entry
    my $i_offset;           # offset into the index of the entry
    my $entry;              # index entry
    my $d_offset;           # offset into the data file

    $size = length(pack($off_t, 0));
    $i_offset = $size * ($line_number-1);
    seek($index_file, $i_offset, 0) or return;
    read($index_file, $entry, $size);
    $d_offset = unpack($off_t, $entry);
    seek($data_file, $d_offset, 0);
    return scalar(<$data_file>);
}

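These methods sometimes work: for about one in ten of the different sets of values I try, I get the right value. Most of the time, though, I get "Use of uninitialized value $line in string at test2.pl line 10." (when looking up line 566 of the file), or a wrong number. Also, the index seems to work fine for the first two hundred or so lines, but after that I get errors. I really don't know what I'm doing wrong...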

I know you can use a basic loop that parses each line, but I really need a way to access any line of the file at any given time without re-parsing it.

Edit: I tried the small tip found here: Reading a particular line by line number in a very large file. I replaced the "N" pack template with:

my $off_t = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';

It makes things better, up until line 128: instead of getting 128, I get an empty string. For 129 I get 3, which doesn't mean much...

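Edit2: Basically, what I need is a mechanism that lets me read, for example, the next 2 lines of a file I am already reading, while keeping the read "head" at the current line (rather than 2 lines further on).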
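For the Edit2 requirement, a common Perl approach is to save the current position with tell, read ahead, and then seek back. The sketch below assumes a regular, seekable file (not a pipe or socket); peek_lines is a hypothetical helper name, not a built-in, and an in-memory file stands in for the real data file.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Return up to $n lines from $fh, then restore the original read position.
# peek_lines is a hypothetical helper; $fh must be seekable.
sub peek_lines {
    my ($fh, $n) = @_;
    my $pos = tell($fh);                  # remember the current position
    die "tell failed: $!\n" if $pos < 0;
    my @ahead;
    while (@ahead < $n) {
        my $line = readline($fh);
        last unless defined $line;        # stop early at end of file
        push @ahead, $line;
    }
    seek($fh, $pos, SEEK_SET) or die "seek failed: $!\n";   # rewind
    return @ahead;
}

# Usage: an in-memory file stands in for the real data file.
my $data = "1\n2\n3\n4\n";
open(my $fh, '<', \$data) or die "Can't open in-memory file: $!\n";
my $current = readline($fh);              # "1\n"
my @next    = peek_lines($fh, 2);         # ("2\n", "3\n")
my $after   = readline($fh);              # "2\n" again: position was restored
```

Because only the byte position is saved and restored, the same helper works on a large file without rereading it from the start.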

Thanks for your help!

Answer 1

Since you are writing binary data to the index file, you need to set the filehandle to binary mode, especially on Windows:

open(INDEX, "+>$file.idx")
    or die "Can't open $file.idx for read/write: $!\n";
binmode(INDEX);
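With that fix applied, the question's code could be sketched as follows. This is an illustration, not a drop-in replacement: it uses lexical filehandles and Fcntl's SEEK_SET, and it creates temporary data and index files with File::Temp (1000 lines instead of the question's testfile.err) so that it runs on its own.

```perl
use strict;
use warnings;
use Config qw(%Config);
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Pack template for one offset, chosen the same way as in the question.
my $off_t      = $Config{lseeksize} > $Config{ivsize} ? 'F' : 'j';
my $entry_size = length(pack($off_t, 0));   # every index entry has this size

# Write the starting byte offset of each line of $data_fh to $index_fh.
sub build_index {
    my ($data_fh, $index_fh) = @_;
    my $offset = 0;
    while (<$data_fh>) {
        print {$index_fh} pack($off_t, $offset);
        $offset = tell($data_fh);
    }
}

# Return line $line_number (1-based) via the index, or undef on failure.
sub line_with_index {
    my ($data_fh, $index_fh, $line_number) = @_;
    seek($index_fh, $entry_size * ($line_number - 1), SEEK_SET) or return;
    read($index_fh, my $entry, $entry_size) == $entry_size or return;
    seek($data_fh, unpack($off_t, $entry), SEEK_SET) or return;
    return scalar readline($data_fh);
}

# Temporary data file (line N contains N) and temporary index file.
my ($data_fh)  = tempfile(UNLINK => 1);
my ($index_fh) = tempfile(UNLINK => 1);
binmode($index_fh);   # the fix: packed bytes are written untranslated

print {$data_fh} "$_\n" for 1 .. 1000;
seek($data_fh, 0, SEEK_SET) or die "Can't rewind data file: $!\n";

build_index($data_fh, $index_fh);
my $line = line_with_index($data_fh, $index_fh, 129);
print $line;   # prints "129"
```

Because every entry has the same size, looking up line N costs one seek and one read in the index plus one seek in the data file, regardless of how many lines the file has.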

Now, when you do something like this on Windows:

print $index_file pack("j", $offset);

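Perl will convert any 0x0a's in the packed string into 0x0d0a's. Setting binmode on the filehandle ensures that line feeds are not translated into carriage return/line feed pairs.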
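To see where this bites in the question's test file, one can compute which index entry is the first to contain a 0x0a byte. This sketch assumes the data file has CRLF line endings (as a text file written on Windows would) and that each line holds only its line number; first_bad_entry is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumption: line N of the data file is "N\r\n" (CRLF line endings).
# Return the first line number whose packed offset contains a 0x0a byte,
# i.e. the first entry a text-mode index handle would corrupt, and its offset.
sub first_bad_entry {
    my ($template, $max_line) = @_;
    my $offset = 0;
    for my $n (1 .. $max_line) {
        my $entry = pack($template, $offset);
        return ($n, $offset) if index($entry, "\x0a") >= 0;
        $offset += length("$n\r\n");      # the next line starts here
    }
    return;                               # no bad entry found
}

for my $template ('j', 'N') {
    my ($line, $offset) = first_bad_entry($template, 5_000_000);
    printf "%s: line %d, offset %d (0x%X)\n", $template, $line, $offset, $offset;
}
# j: line 127, offset 522 (0x20A)
# N: line 127, offset 522 (0x20A)
```

Under those assumptions, both templates first produce a 0x0a byte at offset 522 (0x20A), the start of line 127, so every entry written after it is shifted by one byte. That is consistent with lookups going wrong from line 128 onward.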
