Question

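New to Perl!! Need some help :) I have 2 files, each close to 500 KB in size.

I need to search these files for a set of strings (about 800 strings) to check whether each string exists in file 1, in file 2, in both files, or in neither.

The only option I know of is to open file1, read it line by line and check whether the string is in it, and then do the same for file2. Repeating that whole process for nearly 800 search strings seems neither good nor efficient.

Is there a more efficient option, or a way to write a one-liner for this in Perl?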

Answer 1

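Here is an example that uses Regexp::Assemble. Assuming the strings to match do not span multiple lines, it builds a single combined regular expression for all the strings and tests each line against it. Each file therefore only needs to be read once.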

use feature qw(say);
use strict;
use warnings;
use Regexp::Assemble;

my @strings = qw(abc efg);  # <- Add more strings here

my $ra = Regexp::Assemble->new;
# Creates a regexp that matches all the strings. quotemeta escapes regex
# metacharacters so that each string is matched literally.
$ra->add( quotemeta $_ ) for @strings;
my $re = $ra->re;
my @files = qw(file1.txt file2.txt);  # <- Add more files if needed..
my @matches;
for my $file (@files) {
    push @matches, get_matches( $file, $re );
}
# Now post process the matches as you like..

sub get_matches {
    my ( $fn, $re ) = @_;

    my %matches;
    open ( my $fh, '<', $fn ) or die "Could not open file '$fn': $!";
    while (my $line = <$fh>) {
        while ( $line =~ /($re)/g ) {
            $matches{$1}++;
        }
    }
    close $fh;
    return \%matches;
}
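
The post-processing step is left open above. As one possible sketch, assuming each file's result is a hash reference of the kind get_matches returns (string => count), every search string can be classified with two hash lookups. The classify helper, the variable names, and the sample data are hypothetical and only serve as illustration:

```perl
use strict;
use warnings;
use feature qw(say);

# Hypothetical sample data standing in for the search strings and for the
# hash references that get_matches() would return for file1 and file2.
my @strings       = qw(abc efg xyz nope);
my $file1_matches = { abc => 3, xyz => 1 };
my $file2_matches = { abc => 1, efg => 2 };

# Report where a string was found: in both files, in one of them, or in neither.
sub classify {
    my ( $string, $in_file1, $in_file2 ) = @_;
    my $in1 = exists $in_file1->{$string};
    my $in2 = exists $in_file2->{$string};
    return $in1 && $in2 ? 'both'
         : $in1         ? 'file1 only'
         : $in2         ? 'file2 only'
         :                'neither';
}

for my $string (@strings) {
    say "$string: ", classify( $string, $file1_matches, $file2_matches );
}
# Prints:
# abc: both
# efg: file2 only
# xyz: file1 only
# nope: neither
```

In the script above, the two hash references would be $matches[0] and $matches[1], in the same order as @files.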
