Perl: parsing a file for multiple strings

Question

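I have been learning Perl for the past two weeks and have been writing some Perl scripts for my school project. I need to parse a text file for multiple strings. I searched the Perl forums and found some information. The function below searches a text file for a single string and returns the result, but I need the script to search the file for multiple strings.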

use strict;
use warnings;


sub find_string {
    my ($file, $string) = @_;
    open my $fh, '<', $file;
    while (<$fh>) {
        return 1 if /\Q$string/;
    }
    die "Unable to find string: $string";
}

find_string('filename', 'string');

Now suppose, for example, that the file contains several strings, some with regular-expression characters, like this:

"testing"
http://www.yahoo.com =1
http://www.google.com=2

I want the function to be able to search for multiple strings, for example:

find_string('filename', 'string1','string2','string3');

Could someone explain how I can do this? It would be very helpful.

Answer 1

Doing this quickly here:

Right now you pass the name of a file and one string. What if you pass multiple strings:

if ( find_string( $file, @strings ) ) {
    print "Found a string!\n";
}
else {
    print "No string found\n";
}


..

sub find_string {
    my $file    = shift;
    my @strings = @_;
    #
    # Let's make the strings into a regular expression
    #
    my $reg_exp = join "|" ,@strings;   # Regex is $string1|$string2|$string3...

    open my $fh, "<", $file or die qq(Can't open file...);
    while ( my $line = <$fh> ) {
       chomp $line;
       if ( $line =~ $reg_exp ) {
           return 1;     # Found the string
       }
    }
    return 0;            # String not found
}

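I'm about to head into a meeting, so I haven't tested this at all, but the idea is there. A few things:

You'll want to handle characters in your strings that may be regular-expression metacharacters. You can use the quotemeta function, or put \Q and \E around each string.
Consider using use autodie to handle files that can't be opened. Then you don't have to check your open statement (as I did above).
There are limitations. If you're searching for 1,000 different strings this will perform badly, but it should be fine for a handful of strings.
Note how I use a scalar file handle ($fh). Rather than opening your file inside the subroutine, I would pass in a scalar file handle. That lets you handle invalid-file problems in the main program. This is a big advantage of scalar file handles: they can easily be passed to subroutines and stored in class objects.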
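The first and last points above can be combined roughly as follows. This is a minimal sketch, not the answer's own code: the helper name build_regex is hypothetical, and an in-memory file handle (open on a reference to a scalar) stands in for a real file so the example runs on its own. Each string is escaped with quotemeta before the strings are joined, and find_string receives an already-open handle.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every string with quotemeta so that
# characters like . + * match literally, then join them into one
# alternation compiled with qr//.
sub build_regex {
    my @strings     = @_;
    my $alternation = join '|', map { quotemeta($_) } @strings;
    return qr/$alternation/;
}

# Takes an already-open file handle instead of a file name, so the
# caller decides how to deal with open errors.
sub find_string {
    my ( $fh, @strings ) = @_;
    my $re = build_regex(@strings);
    while ( my $line = <$fh> ) {
        return 1 if $line =~ $re;    # stop at the first matching line
    }
    return 0;
}

# Demo: an in-memory file handle standing in for a real file.
my $text = qq("testing"\nhttp://www.yahoo.com =1\nhttp://www.google.com=2\n);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $found = find_string( $fh, 'www.google.com', 'no-such-string' );
print( $found ? "Found a string!\n" : "No string found\n" );
```

Because the caller opens the handle, the same find_string can read from a real file, STDIN, or a string without changes.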

Tested program

#! /usr/bin/env perl
#

use strict;
use warnings;
use autodie;
use feature qw(say);

use constant {
    INPUT_FILE =>       'test.txt',
};


open my $fh, "<", INPUT_FILE;

my @strings = qw(foo fo+*o bar fubar);

if ( find_string ( $fh, @strings ) ) {
    print "Found a string!\n";
}
else {
    print "No string found\n";
}

sub find_string {
    my $fh    = shift;          # The file handle
    my @strings = @_;           # A list of strings to look for

    #
    # We need to go through each string to make sure there's
    # no special re characters
    for my $string ( @strings ) {
        $string = quotemeta $string;
    }

    #
    # Let's join the strings into one big regular expression
    #
    my $reg_exp = join '|', @strings;   # Regex is $string1|$string2|$string3...
    $reg_exp = qr($reg_exp);            # This is now a regular expression

    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ $reg_exp ) {
            return 1;     # Found the string
        }
    }
    return 0;            # String not found
}

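autodie handles the case where I can't open the file, so there is no need to check.
Note that my open takes three arguments. This is the preferred form.
My file handle is $fh, which lets me pass it to my find_string subroutine. Because I open the file in the main program, I can handle read errors there.
I loop through @strings and use the quotemeta function to escape special regex characters automatically.
Note that when I change $string inside the loop, it actually modifies the @strings array.
I use qr to create the regular expression.
My regex is /foo|fo\+\*o|bar|fubar/.
There are some caveats. For example, the string fooburberry will match foo. Do you want that, or do you want the strings to match whole words?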
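If whole-word matching is what you want, one option is to wrap the alternation in lookaround assertions. This is a sketch under the assumption that "whole word" means the match must not touch a word character (\w) on either side; whole_word_regex is a hypothetical name. Lookarounds are used instead of \b because \b would fail for strings such as "testing" (including the quotes) that begin or end with punctuation.

```perl
use strict;
use warnings;

# Hypothetical helper: match any of the strings only as a whole word.
# (?<!\w) means "not preceded by a word character" and (?!\w) means
# "not followed by a word character".
sub whole_word_regex {
    my @strings     = @_;
    my $alternation = join '|', map { quotemeta($_) } @strings;
    return qr/(?<!\w)(?:$alternation)(?!\w)/;
}

my $re = whole_word_regex(qw(foo fo+*o bar fubar));

for my $line ( 'fooburberry', 'a foo here', 'x fo+*o y' ) {
    my $verdict = $line =~ $re ? 'match' : 'no match';
    print "$line: $verdict\n";
}
```

With this pattern fooburberry no longer matches foo, while "a foo here" still does.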

Answer 2

I'm glad to see use strict and use warnings in your script. Here is a basic approach.

use strict;
use warnings;


sub find_string {

    my ($file, $string1, $string2, $string3) = @_;

    my $found1 = 0;
    my $found2 = 0;
    my $found3 = 0;

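    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (<$fh>) {
        # \Q...\E makes each string match literally, not as a regex
        if ( /\Q$string1\E/ ) {
            $found1 = 1;
        }
        if ( /\Q$string2\E/ ) {
            $found2 = 1;
        }
        if ( /\Q$string3\E/ ) {
            $found3 = 1;
        }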
    }

    if ( $found1 == 1 and $found2 == 1 and $found3 == 1 ) {
        return 1;
    } else {
        return 0;
    }
}

my $result = find_string('filename', 'string1', 'string2', 'string3');

if ( $result == 1 ) {
    print "Found all three strings\n";
} else {
    print "Didn't find all three\n";
}
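The code above is fixed at exactly three strings. Below is a sketch of the same "all strings must appear" logic for any number of strings. It assumes the caller passes an open file handle, as Answer 1 suggests; find_all_strings is a hypothetical name. A hash tracks the strings not yet seen, so reading stops as soon as the hash is empty.

```perl
use strict;
use warnings;

# Hypothetical generalization: returns 1 only if every string in
# @strings appears somewhere in the input read from $fh.
sub find_all_strings {
    my ( $fh, @strings ) = @_;

    # string => compiled literal pattern, for strings not yet seen
    my %pending = map { ( $_ => qr/\Q$_\E/ ) } @strings;

    while ( my $line = <$fh> ) {
        # keys returns a fresh list, so deleting inside the loop is safe
        for my $string ( keys %pending ) {
            delete $pending{$string} if $line =~ $pending{$string};
        }
        return 1 unless %pending;    # everything found, stop reading
    }
    return %pending ? 0 : 1;
}

# Demo with an in-memory file handle standing in for a real file.
my $text = "http://www.yahoo.com =1\nhttp://www.google.com=2\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $result = find_all_strings( $fh, 'yahoo.com', 'google.com' );
print( $result ? "Found all strings\n" : "Didn't find all strings\n" );
```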

Answer 3

I think you can first store the file contents in an array and then grep the array for each input string.

use strict;
use warnings;

sub find_multi_string {
    my ($file, @strings) = @_; 
    # Three-argument open with an error check
    open( my $fh, '<', $file ) or die "Cannot open $file: $!";
    #store the whole file in an array
    my @array = <$fh>;

    for my $string (@strings) {
        if (grep /\Q$string\E/, @array) {
            next;
        } else {
            die "Cannot find $string in $file";
        }   
    }   

    return 1;
}
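The function above dies at the first missing string and comes without an example call. Here is a sketch of a variant that reports every missing string instead. The name missing_strings is hypothetical, and taking an open file handle rather than a file name is an assumption borrowed from Answer 1. As in the answer, the whole file is read into memory, which is fine for small files.

```perl
use strict;
use warnings;

# Hypothetical variant: returns the list of strings that do not occur
# anywhere in the input, instead of dying at the first one.
sub missing_strings {
    my ( $fh, @strings ) = @_;
    my @lines = <$fh>;    # read the whole file into memory
    my @missing;
    for my $string (@strings) {
        # grep in boolean context is true if any line contains $string
        push @missing, $string unless grep { /\Q$string\E/ } @lines;
    }
    return @missing;
}

# Demo with an in-memory file handle standing in for a real file.
my $text = qq("testing"\nhttp://www.yahoo.com =1\nhttp://www.google.com=2\n);
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @missing = missing_strings( $fh, 'testing', 'yahoo', 'bing' );
print( @missing ? "Missing: @missing\n" : "All strings found\n" );
```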
