Question

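Given a large file (possibly 1.5 GB) with very long lines, I need to run substitutions on it. I thought about using a sliding window: read 4096 bytes off the file, append them to the previous chunk, and substitute on that. But of course that doesn't work well for substitutions whose match straddles the boundary between two chunks.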
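To make the boundary problem concrete, here is a minimal sketch (assuming a literal s/foo/bar/ and a hypothetical helper named naive_chunk_replace) that applies the substitution to each fixed-size chunk independently. It reads a string through an in-memory filehandle with $/ set to a record size, so a "foo" split across the chunks "abcf" and "oo" is never replaced.

```perl
use strict;
use warnings;

# Hypothetical helper: the naive approach, running s/foo/bar/g on each
# fixed-size chunk on its own.
sub naive_chunk_replace {
    my ( $text, $chunk_size ) = @_;
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";
    my $result = '';
    # Setting $/ to a reference to an integer makes <$in> return
    # fixed-size records instead of lines.
    local $/ = \$chunk_size;
    while ( my $chunk = <$in> ) {
        $chunk =~ s/foo/bar/g;
        $result .= $chunk;
    }
    close $in;
    return $result;
}

my $text = 'abcfoo';
print naive_chunk_replace( $text, 4 ), "\n";    # prints abcfoo (chunks "abcf" + "oo")
( my $whole = $text ) =~ s/foo/bar/g;
print $whole, "\n";                              # prints abcbar
```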

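Another idea I had: start with my first two chunks and run s/// on them. If there is no substitution, I write chunk1 to disk and grab chunk3. Then I run s/// on chunk2 and chunk3. If there is a substitution, I grab chunk4 and append it. I keep appending until there are no more substitutions, and at that point I write the accumulated chunk out to disk. Something like this: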

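read( $data, $previous_chunk, 4096 );
my $looped = 0;    # has $previous_chunk already been through s///?
while( read( $data, $this_chunk, 4096 ) > 0 ){
    my $chunk = $previous_chunk . $this_chunk;
    if( !( $chunk =~ s/foo/bar/g ) ){
        # There was nothing to substitute, so we'll append all
        # the old stuff to our file and just keep the latest chunk.
        print OUTPUT $previous_chunk;
        $previous_chunk = $this_chunk;
    }
    else {
        # There was a substitution, so keep the substituted text and
        # keep building in case a match crosses the fold.
        $previous_chunk = $chunk;
    }
    $looped = 1;
}
# A file of 4096 bytes or less never enters the loop, so substitute it here.
$previous_chunk =~ s/foo/bar/g unless $looped;
# Write out whatever is still buffered.
print OUTPUT $previous_chunk;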

Does this sound reasonable? The only problem I can see is when a substitution itself creates a new match in the running $previous_chunk. So we would probably need to somehow flush $previous_chunk up to the latest substitution and keep only the clean content after it. (For example, with s/foo/foobar/, we would turn "foo" into "foobar", then into "foobarbar", then into "foobarbarbar", and so on.) Is there a way to avoid this?
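One way to avoid both the boundary problem and the re-substitution problem is to write replaced text out immediately and carry over only raw, unscanned bytes. The sketch below swaps in that hold-back technique under a stated assumption: the search string is a non-empty literal (not a general regex), so a match crossing the fold can start at most length($search) - 1 bytes before the end of the buffer, and holding back that many trailing bytes is enough. stream_replace is a hypothetical helper, not part of any module; for a real 1.5 GB file you would open $in and $out on disk (in binary mode) instead of the in-memory scalars used here.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, replacing every occurrence of the
# literal string $search with $replace, reading $chunk_size bytes at a time.
# Assumes $search is a plain, non-empty string (not a regex).
sub stream_replace {
    my ( $in, $out, $search, $replace, $chunk_size ) = @_;
    die "search string must not be empty\n" unless length $search;
    my $len  = length $search;
    my $keep = $len - 1;    # bytes a straddling match could still need
    my $buf  = '';
    while ( read( $in, my $more, $chunk_size ) ) {
        $buf .= $more;
        my $pos  = 0;                       # start of text not yet written
        my $safe = length($buf) - $keep;    # a match starting here may be cut off
        while ( ( my $i = index( $buf, $search, $pos ) ) >= 0 ) {
            last if $i >= $safe;            # defer until more data arrives
            print {$out} substr( $buf, $pos, $i - $pos ), $replace;
            $pos = $i + $len;               # replaced text is never rescanned
        }
        if ( $pos < $safe ) {               # no match can start here, so flush it
            print {$out} substr( $buf, $pos, $safe - $pos );
            $pos = $safe;
        }
        $buf = substr( $buf, $pos );        # carry over only raw, unscanned text
    }
    # At EOF no more data can arrive, so any match left in $buf is complete.
    $buf =~ s/\Q$search\E/$replace/g;
    print {$out} $buf;
    return;
}

# Usage with in-memory handles and a deliberately tiny chunk size.
my $input = 'foofoo';
open my $in,  '<', \$input     or die $!;
open my $out, '>', \my $output or die $!;
stream_replace( $in, $out, 'foo', 'foobar', 2 );
close $out;
print "$output\n";    # prints foobarfoobar
```

Because replaced text is written out and never rescanned, s/foo/foobar/ yields exactly one "foobar" per original "foo", the same result a single s///g over the whole file would give. A general regex would need a known upper bound on match length for this hold-back trick to stay correct.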

Is there a better way to do this?

Answer 1

You asked whether there is a better way to do this. "Better" is subjective, but if I had the same problem, this is what I would do.

One option is not to use Perl at all, and instead do the search and replace in a text editor that supports huge files.

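On Windows, one such editor is EditPad Pro, which handles large files. It's not my product, so this isn't an ad. I use EPP because it is regex-centric (it comes from the same developer as RegexBuddy).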

Quoting the brochure: "Open and edit files of absolutely any size, including files larger than 4 GB, even on a 32-bit system with a modest amount of RAM."

This lets you offload the whole large-file problem to the editor.

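Other features that may or may not be relevant: you can save regular expressions, chain them together with macros, and invoke tools (such as your own Perl scripts) on the currently open file (though in that case you are once again at the mercy of the external tool's memory management).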

Regex substitution on a large file
