Question

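Unfortunately, I'm not a regex expert, so I need some help.

I'm looking for a way to grep an array of strings to get two lists of strings that do not start (1) or end (2) with a specific substring.

Suppose we have an array whose strings match the following pattern:

[speakerId]-[phrase]-[id].txt

i.e.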

10-phraseone-10.txt 11-phraseone-3.txt 1-phraseone-2.txt 2-phraseone-1.txt 3-phraseone-1.txt 4-phraseone-1.txt 5-phraseone-3.txt 6-phraseone-2.txt 7-phraseone-2.txt 8-phraseone-10.txt 9-phraseone-2.txt
10-phrasetwo-1.txt 11-phrasetwo-1.txt 1-phrasetwo-1.txt 2-phrasetwo-1.txt 3-phrasetwo-1.txt 4-phrasetwo-1.txt 5-phrasetwo-1.txt 6-phrasetwo-3.txt 7-phrasetwo-10.txt 8-phrasetwo-1.txt 9-phrasetwo-1.txt
10-phrasethree-10.txt 11-phrasethree-3.txt 1-phrasethree-1.txt 2-phrasethree-11.txt 3-phrasethree-1.txt 4-phrasethree-3.txt 5-phrasethree-1.txt 6-phrasethree-3.txt 7-phrasethree-1.txt 8-phrasethree-1.txt 9-phrasethree-1.txt

Let's introduce some variables:

$speakerId
$phrase
$id1, $id2

I want to grep the list and get an array:

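containing the elements with a specific $phrase, but excluding those that both start with a specific $speakerId and end with one of the specified ids (e.g. $id1 or $id2)
containing the elements with a specific $speakerId and $phrase, but not ending with one of the specified ids (warning: remember not to exclude ids such as 10 or 11 when the id to exclude is 1, and so on)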

Maybe someone could write a solution using the following code:

$id=1

Answer 1

Assuming a basic pattern that matches your examples:

(?:^|\b)(\d+)-(\w+)-(\d+)\.txt(?:\b|$)

Which breaks down as:

(?:^|\b)    # start of line or a word boundary
(\d+)-      # speaker id and a hyphen
(\w+)-      # phrase and a hyphen
(\d+)       # id
\.txt       # file extension
(?:\b|$)    # a word boundary or end of line
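As a rough sketch of how the pieces in this breakdown behave in Perl, the snippet below writes the same pattern with the /x flag and pulls the three fields out of every file name found in a string. The sample string and the hash keys (speaker, phrase, id) are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: several file names inside one string.
my $text = "10-phraseone-10.txt 2-phrasetwo-1.txt 6-phrasethree-3.txt";

# The same pieces as in the breakdown, written with /x so each can be commented.
my $pattern = qr{
    (?:^|\b)   # start of line or a word boundary
    (\d+)-     # speaker id and a hyphen
    (\w+)-     # phrase and a hyphen
    (\d+)      # id
    \.txt      # file extension
    (?:\b|$)   # a word boundary or end of line
}x;

# Walk over every match in the string and keep the captured fields.
my @records;
while ( $text =~ /$pattern/g ) {
    push @records, { speaker => $1, phrase => $2, id => $3 };
}

printf "%s / %s / %s\n", @{$_}{qw(speaker phrase id)} for @records;
```

The word-boundary alternatives are what let the pattern find names embedded in a longer string; for a list with one name per element, plain anchors would do.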

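You can exclude things with negative lookahead assertions. For example, to get all matches that do not contain the phrase phrasetwo, you can modify the expression above to use a negative lookahead: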

(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)

Note how I added (?!phrasetwo). Or, to find all phrasethree entries whose id ends in an even digit, use a negative lookbehind instead:

(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)

(?<![13579]) simply makes sure that the last digit of the id is even.
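For the first list in the question, one possible way to combine these ideas is a negative lookahead that is anchored on both sides, so that id 1 is not treated as a prefix of 10 or 11. This is only a sketch: the sample file names and the variable values are assumptions chosen to show the edge cases, and \Q...\E is used to quote the interpolated values.

```perl
use strict;
use warnings;

# Hypothetical sample data, picked to exercise the 1 vs. 10/11 edge case.
my @files = qw(
    1-phraseone-1.txt  1-phraseone-2.txt  1-phraseone-10.txt
    1-phraseone-11.txt 2-phraseone-1.txt  10-phraseone-1.txt
    1-phrasetwo-1.txt
);

my ( $speakerId, $phrase, $id1, $id2 ) = ( 1, 'phraseone', 1, 2 );

# The lookahead rejects names that start with "$speakerId-" AND end with
# "-$id1.txt" or "-$id2.txt". The hyphen and ".txt" on either side keep
# 1 from matching as a prefix of 10 or 11.
my $list1 = qr{
    \A
    (?! \Q$speakerId\E - .* - (?: \Q$id1\E | \Q$id2\E ) \.txt \z )
    \d+ - \Q$phrase\E - \d+ \.txt
    \z
}x;

my @kept = grep { /$list1/ } @files;
print "$_\n" for @kept;
```

With this sample data the names kept are 1-phraseone-10.txt, 1-phraseone-11.txt, 2-phraseone-1.txt and 10-phraseone-1.txt.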

Answer 2

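I like the pure-regex approach with negative lookahead and lookbehind. However, it is a bit hard to read. Maybe code like this is more self-explanatory. It uses standard Perl idioms that, in some cases, read almost like English: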

my @all_entries      = readdir(...);
my @matching_entries = ();

foreach my $entry (@all_entries) {

    # split the file name (match against $entry, not the default $_)
    next unless $entry =~ /^(\d+)-(.*?)-(\d+)\.txt$/;
    my ($sid, $phrase, $id) = ($1, $2, $3);

    # filter
    next unless $sid eq "foo";
    next unless $id == 42 or $phrase eq "bar";
    # more readable filter rules

    # match
    push @matching_entries, $entry;
}

# do something with @matching_entries

If you really want to express something complex in a grep list transformation, you could write code like this:

my @matching_entries = grep {

    /^(\d+)-(.*?)-(\d+)\.txt$/
    and $1 eq "foo"
    and ($3 == 42 or $2 eq "bar")
    # and so on

} readdir(...);
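To make the grep form concrete, here is a minimal sketch that builds both lists from the question out of an in-memory array. The file names, the helper name fields, and the chosen values (speaker 2, phrasethree, ids 1 and 3) are assumptions for illustration only. Comparing the id through a hash lookup instead of a regex sidesteps the 1 versus 10 or 11 problem.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @files = qw(
    2-phrasethree-1.txt 2-phrasethree-11.txt 2-phrasethree-2.txt
    5-phrasethree-1.txt 2-phraseone-2.txt    not-a-match.log
);

my ( $speakerId, $phrase ) = ( 2, 'phrasethree' );
my %excluded_id = map { $_ => 1 } ( 1, 3 );

# Split a name into (speaker id, phrase, id); empty list if it does not match.
sub fields { return $_[0] =~ /\A(\d+)-(.*?)-(\d+)\.txt\z/ }

# (1) has $phrase, but not ($speakerId together with an excluded id)
my @list1 = grep {
    my ( $sid, $p, $id ) = fields($_);
    defined $sid
        and $p eq $phrase
        and not( $sid == $speakerId and $excluded_id{$id} );
} @files;

# (2) has $speakerId and $phrase, and the id is not an excluded one
my @list2 = grep {
    my ( $sid, $p, $id ) = fields($_);
    defined $sid
        and $p eq $phrase
        and $sid == $speakerId
        and not $excluded_id{$id};
} @files;

print "list 1: @list1\n";
print "list 2: @list2\n";
```

Here 2-phrasethree-11.txt ends up in both lists, since its id is 11 rather than 1.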

Answer 3

It sounds a bit like you're describing a query function.

#!/usr/bin/perl -Tw

use strict;
use warnings;
use Data::Dumper;

my ( $set_a, $set_b ) = query( 2, 'phrasethree', [ 1, 3 ] );

print Dumper( { a => $set_a, b => $set_b } );

# a) fetch elements which
#    1. match $phrase
#    2. exclude $speakerId
#    3. match @ids
# b) fetch elements which
#    1. match $phrase
#    2. match $speakerId
#    3. exclude @ids
sub query {
    my ( $speakerId, $passPhrase, $id_ra ) = @_;

    my %has_id = map { ( $_ => 0 ) } @{$id_ra};

    my ( @a, @b );

    while ( my $filename = glob '*.txt' ) {

        if ( $filename =~ m{\A ( \d+ )-( .+? )-( \d+ ) [.] txt \z}xms ) {

            my ( $_speakerId, $_passPhrase, $_id ) = ( $1, $2, $3 );

            if ( $_passPhrase eq $passPhrase ) {

                if ( $_speakerId ne $speakerId
                    && exists $has_id{$_id} )
                {
                    push @a, $filename;
                }

                if ( $_speakerId eq $speakerId
                    && !exists $has_id{$_id} )
                {
                    push @b, $filename;
                }
            }
        }
    }

    return ( \@a, \@b );
}

How do I grep out strings that don't start or end with a specific substring?
