Question

I have a huge text file whose first five lines are:

This is fist line
This is second line
This is third line
This is fourth line
This is fifth line

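Now I want to write something at an arbitrary position in the third line of that file, replacing the characters already there with the new string I am writing. I can do that with the following code: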

use strict;
use warnings;

my @pos = (0);
open my $fh, "+<", "text.txt" or die "Can't open text.txt: $!";

# Record the byte offset at which each line starts
while (<$fh>) {
    push @pos, tell($fh);
}

seek $fh , $pos[2]+1, 0;
print $fh "HELLO";

close($fh);

However, I can't figure out how to use the same approach to delete the entire third line from the file, so that the text reads as follows:

This is fist line
This is second line
This is fourth line
This is fifth line

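I don't want to read the whole file into an array, and I don't want to use Tie::File. Is it possible to do what I want using seek and tell? A solution would be very helpful.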

Answer 1

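A file is a sequence of bytes. We can replace (overwrite) some of them, but how would we remove them? Once a file is written, its bytes cannot be 'pulled out' of the sequence or 'blanked' in any way. (The ones at the end of the file can be dropped by truncating the file as needed.)

The rest of the content has to move up, so that whatever follows the deleted text overwrites it. In other words, we have to rewrite the rest of the file. In practice it is usually far simpler to rewrite the whole file.

As a very basic example: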

use warnings 'all';
use strict;
use File::Copy qw(move);

my $file_in = '...';
my $file_out = '...';  # best use `File::Temp`

open my $fh_in,  '<', $file_in  or die "Can't open $file_in: $!";
open my $fh_out, '>', $file_out or die "Can't open $file_out: $!";

# Remove a line with $pattern
my $pattern = qr/this line goes/;

while (<$fh_in>) 
{
    print $fh_out $_  unless /$pattern/;
}
close $fh_in;
close $fh_out;

# Rename the new file into the original one, thus replacing it
move ($file_out, $file_in) or die "Can't move $file_out to $file_in: $!";

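This writes every line of the input file to the output file, unless the line matches the given pattern. The new file is then renamed, replacing the original (which does not involve copying data when both files are on the same filesystem). See the related topic in perlfaq5.

Since we really are using a temporary file, I'd recommend the core module File::Temp for it.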
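The rewrite approach can be combined with File::Temp along the following lines. This is a minimal sketch rather than a definitive implementation: the helper name `delete_line_via_tempfile` is hypothetical, and it selects the line by number (via `$.`) instead of by pattern, to match the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Hypothetical helper: copy $file to a temporary file, skipping line
# $line_no, then rename the temporary file over the original.
sub delete_line_via_tempfile {
    my ($file, $line_no) = @_;

    open my $fh_in, '<', $file or die "Can't open $file: $!";

    # Create the temp file in the same directory, so that rename() stays
    # on one filesystem and does not have to copy any data.
    my ($fh_out, $tmp_name) = tempfile(DIR => dirname($file), UNLINK => 0);

    while (my $line = <$fh_in>) {
        # $. is the line number of the handle that was read last
        print {$fh_out} $line unless $. == $line_no;
    }
    close $fh_in;
    close $fh_out or die "Can't close $tmp_name: $!";

    rename $tmp_name, $file or die "Can't rename $tmp_name to $file: $!";
    return;
}
```

Only one line is held in memory at a time, so the size of the file does not matter. Note that File::Temp creates its files with restrictive permissions, so the replaced file may end up with a different mode (and inode) than the original.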

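This can be made more efficient, though far more complicated, by opening the file in update mode ('+<') so as to overwrite only a portion of it. Iterate up to the line with the pattern, record (tell) its position and the line's length, then copy all the remaining lines into memory. Then seek back to that position minus the length of the line, and dump the copied rest of the file there, overwriting the line and everything that follows it.

Note that the data for the rest of the file is now copied twice, albeit one copy is in memory. Going to this trouble may make sense if the line to be removed is far down a very large file. If there are more lines to remove, this gets messier.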
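The seek/tell approach asked about in the question might look roughly like this. It is a sketch under stated assumptions, not a hardened implementation: the helper name `delete_line_in_place` and the default buffer size are hypothetical, and instead of holding the whole rest of the file in memory as described above, it shifts it up one chunk at a time.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Hypothetical helper: remove line $line_no from $file in place by shifting
# the rest of the file up over it, then truncating the leftover tail.
# Returns 1 if a line was removed, 0 if the file has too few lines.
sub delete_line_in_place {
    my ($file, $line_no, $chunk_size) = @_;
    $chunk_size ||= 1 << 20;    # assumed default: 1 MiB buffer

    open my $fh, '+<:raw', $file or die "Can't open $file: $!";

    # Find where the unwanted line starts ($write_pos) and ends ($read_pos).
    my ($write_pos, $read_pos) = (0, undef);
    while (my $line = <$fh>) {
        if ($. == $line_no) {
            $read_pos = tell($fh);
            last;
        }
        $write_pos = tell($fh);
    }
    if (!defined $read_pos) {
        close $fh;
        return 0;
    }

    # Move everything after the line up, one chunk at a time. The seek
    # before each read and each write also switches the handle cleanly
    # between reading and writing.
    while (1) {
        seek($fh, $read_pos, SEEK_SET) or die "Can't seek in $file: $!";
        my $got = read($fh, my $buf, $chunk_size);
        die "Can't read $file: $!" unless defined $got;
        last if $got == 0;
        $read_pos += $got;

        seek($fh, $write_pos, SEEK_SET) or die "Can't seek in $file: $!";
        print {$fh} $buf or die "Can't write to $file: $!";
        $write_pos += $got;
    }

    # Cut off the stale bytes left at the end of the file.
    truncate($fh, $write_pos) or die "Can't truncate $file: $!";
    close $fh or die "Can't close $file: $!";
    return 1;
}
```

A call such as `delete_line_in_place('text.txt', 3)` would remove the third line. Because the file is modified directly, an interruption partway through leaves it in an inconsistent state, which is one reason rewriting to a new file is usually the safer choice.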

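Writing out a new file and moving it over the original changes the file's inode number. That may be a problem for some tools or procedures; if it is, you can instead update the original file in one of two ways:

- Once the new file is written, open it for reading and open the original for writing, which clobbers the original. Then read from the new file and write to the original, copying the content back into the same inode. Remove the new file when done.
- Open the original file in read-write mode ('+<') to begin with. Once the new file is written, seek to the beginning of the original (or to the place from which to overwrite) and write the content of the new file to it. If the new file is shorter, remember to also set the end-of-file once copying is done:
```
truncate $fh, tell($fh);
```

This requires some care, and the first way is probably safer in general.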
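As an illustration of copying the new content back into the original inode, here is a rough sketch along the lines of the read-write ('+<') variant. The helper name `copy_back_keeping_inode` and the 64 KiB buffer size are assumptions made for the example, and it always overwrites from the beginning of the original.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the content of $new_file into $orig_file
# without replacing the original's inode, then remove $new_file.
sub copy_back_keeping_inode {
    my ($new_file, $orig_file) = @_;

    open my $in,  '<:raw',  $new_file  or die "Can't open $new_file: $!";
    open my $out, '+<:raw', $orig_file or die "Can't open $orig_file: $!";

    # Overwrite the original from its beginning, 64 KiB at a time
    # (the buffer size is an arbitrary choice).
    while (1) {
        my $got = read($in, my $buf, 65536);
        die "Can't read $new_file: $!" unless defined $got;
        last if $got == 0;
        print {$out} $buf or die "Can't write to $orig_file: $!";
    }

    # Drop whatever is left of the old, longer content.
    truncate($out, tell($out)) or die "Can't truncate $orig_file: $!";

    close $in;
    close $out or die "Can't close $orig_file: $!";
    unlink $new_file or die "Can't remove $new_file: $!";
    return;
}
```

The original file is opened with '+<' rather than '>', so it keeps its inode and is never empty on disk before the copy starts; the truncate call is what removes the leftover tail when the new content is shorter.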

If the file weren't huge, the new "file" could be "written" in memory instead, as an array or a string.

Answer 2

Use the sed command from the Linux command line within Perl:

my $return = `sed -i '3d' text.txt`;

Here '3d' means delete line 3, and -i makes sed edit the file in place.

Answer 3

It is useful to look at perlrun and see how perl itself modifies a file 'in place'.

Given:

$ cat text.txt
This is fist line
This is second line
This is third line
This is fourth line
This is fifth line

You can apparently 'modify in place' by invoking Perl with the -i and -p switches, for example:

$ perl -i -pe 's/This is third line\s*//' text.txt
$ cat text.txt
This is fist line
This is second line
This is fourth line
This is fifth line

But if you consult Perl Cookbook recipe 7.9 (or look at perlrun), you will see that this:

$ perl -i -pe 's/This is third line\s*//' text.txt

is equivalent to:

while (<>) {
    if ($ARGV ne $oldargv) {           # are we at the next file?
        rename($ARGV, $ARGV . '.bak');
        open(ARGVOUT, ">$ARGV");       # plus error check
        select(ARGVOUT);
        $oldargv = $ARGV;
    }
    s/This is third line\s*//;
}
continue{
    print;
}
select (STDOUT);                      # restore default output
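The same mechanism can be used from inside a script by setting the special variable `$^I`, which corresponds to the -i switch. The following is only a sketch: the helper name `delete_line_with_inplace_edit` is hypothetical, and it counts lines itself to remove a line by number. As the expansion above shows, perl still writes a new file behind the scenes.

```perl
use strict;
use warnings;

# Hypothetical helper: delete line $line_no from $file using perl's own
# in-place editing machinery (the script-level equivalent of -i).
sub delete_line_with_inplace_edit {
    my ($file, $line_no) = @_;

    # Setting $^I turns on in-place editing for the <> loop; the empty
    # string means no backup copy is kept (some platforms may require a
    # backup extension such as '.bak' here).
    local ($^I, @ARGV) = ('', $file);

    my $count = 0;
    while (my $line = <>) {
        # While in-place editing is active, a plain print goes to the
        # new version of the file rather than to STDOUT.
        print $line unless ++$count == $line_no;
    }
    return;
}
```

Because both `$^I` and `@ARGV` are localized, the rest of the program is unaffected once the helper returns.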

Deleting a line from a large file in Perl
