How do I filter out strings that do not start or end with a specific substring?

Time: 2012-11-19 14:26:04

Tags: regex perl

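Unfortunately, I am not a regex expert, so I need some help.

I am looking for a way to grep an array of strings and obtain two lists of strings that do not start (1) or end (2) with a specific substring.

Suppose we have an array whose strings match the following pattern: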

  

[speakerId]-[phrase]-[id].txt

  

10-phraseone-10.txt 11-phraseone-3.txt 1-phraseone-2.txt 2-phraseone-1.txt 3-phraseone-1.txt 4-phraseone-1.txt 5-phraseone-3.txt 6-phraseone-2.txt 7-phraseone-2.txt 8-phraseone-10.txt 9-phraseone-2.txt
10-phrasetwo-1.txt 11-phrasetwo-1.txt 1-phrasetwo-1.txt 2-phrasetwo-1.txt 3-phrasetwo-1.txt 4-phrasetwo-1.txt 5-phrasetwo-1.txt 6-phrasetwo-3.txt 7-phrasetwo-10.txt 8-phrasetwo-1.txt 9-phrasetwo-1.txt
10-phrasethree-10.txt 11-phrasethree-3.txt 1-phrasethree-1.txt 2-phrasethree-11.txt 3-phrasethree-1.txt 4-phrasethree-3.txt 5-phrasethree-1.txt 6-phrasethree-3.txt 7-phrasethree-1.txt 8-phrasethree-1.txt 9-phrasethree-1.txt

Let's introduce some variables:

  • $speakerId
  • $phrase
  • $id1, $id2

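I would like to grep the list and obtain an array:

  1. with the elements that contain a specific $phrase, excluding those that both start with a specific $speakerId and end with one of the specified IDs (for example $id1 or $id2)

  2. with the elements that have a specific $speakerId and $phrase but do not end with one of the specified IDs (warning: remember not to exclude 10 or 11 when $id=1, and so on)

Perhaps someone could show a solution in code.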

3 Answers:

Answer 0: (score: 3)

Assuming a basic pattern that matches your examples:

(?:^|\b)(\d+)-(\w+)-(\d+)\.txt(?:\b|$)

Broken down:

(?:^|\b)    # start of line or a word boundary
(\d+)-      # speakerId and a hyphen
(\w+)-      # phrase and a hyphen
(\d+)       # id
\.txt       # file extension
(?:\b|$)    # word boundary or end of line
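As a rough illustration of the breakdown above, the following sketch applies the pattern to a single file name and reads the three captured fields. The variable names ($name, $speaker_id and so on) are only chosen for this example.

```perl
use strict;
use warnings;

# One sample name taken from the list in the question.
my $name = '10-phraseone-10.txt';

my ( $speaker_id, $phrase, $id );
if ( $name =~ /(?:^|\b)(\d+)-(\w+)-(\d+)\.txt(?:\b|$)/ ) {
    # $1, $2 and $3 hold the speakerId, the phrase and the id.
    ( $speaker_id, $phrase, $id ) = ( $1, $2, $3 );
    print "speaker=$speaker_id phrase=$phrase id=$id\n";
}
```

Because \w does not match a hyphen, the greedy (\w+) stops at the end of the phrase and cannot run into the id.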

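You can exclude things with a negative lookahead assertion. For example, to include all matches that do not contain the phrase phrasetwo, you can modify the expression above to use a negative lookahead: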

(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)
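A minimal sketch of how that expression might be used with grep, assuming the file names are already in an array (here a hypothetical @files holding three names from the question):

```perl
use strict;
use warnings;

# Three sample names from the question, one per phrase.
my @files = qw(1-phraseone-2.txt 1-phrasetwo-1.txt 2-phrasethree-11.txt);

# Keep only the names whose phrase part does not begin with "phrasetwo".
my @not_two =
    grep { /(?:^|\b)(\d+)-(?!phrasetwo)(\w+)-(\d+)\.txt(?:\b|$)/ } @files;

print "$_\n" for @not_two;
```

Note that the lookahead only checks what follows the first hyphen, so it would also reject a longer phrase that merely begins with phrasetwo.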

Note how I added (?!phrasetwo). Or, to find all phrasethree entries that end in an even number, use a lookbehind instead:

(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)

(?<![13579]) simply makes sure that the last digit of the ID is even.
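The lookbehind variant can be tried the same way. This is only a sketch over a hypothetical @files array that holds four names from the question:

```perl
use strict;
use warnings;

# Sample names from the question: three phrasethree entries and one other.
my @files = qw(
    10-phrasethree-10.txt 2-phrasethree-11.txt
    4-phrasethree-3.txt   8-phraseone-10.txt
);

# Keep phrasethree entries whose id does not end in an odd digit.
my @even =
    grep { /(?:^|\b)(\d+)-phrasethree-(\d+)(?<![13579])\.txt(?:\b|$)/ } @files;

print "$_\n" for @even;
```

Only 10-phrasethree-10.txt survives: the ids 11 and 3 end in an odd digit, and the last name has a different phrase.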

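Answer 1: (score: 1)

I like the pure-regex approach with negative lookahead and lookbehind. However, it is a bit hard to read. Code like the following may be more self-explanatory. It uses standard Perl idioms that, in some cases, read almost like English: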

my @all_entries      = readdir(...);
my @matching_entries = ();

foreach my $entry (@all_entries) {

    # split file name
    next unless $entry =~ /^(\d+)-(.*?)-(\d+)\.txt$/;
    my ($sid, $phrase, $id) = ($1, $2, $3);

    # filter
    next unless $sid eq "foo";
    next unless $id == 42 or $phrase eq "bar";
    # more readable filter rules

    # match
    push @matching_entries, $entry;
}

# do something with @matching_entries

If you really want to express something complex in a grep list transformation, you can write code like this:

my @matching_entries = grep {

    /^(\d+)-(.*?)-(\d+)\.txt$/
    and $1 eq "foo"
    and ($3 == 42 or $2 eq "bar")
    # and so on

} readdir(...);
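To make the placeholder filters more concrete, here is a sketch of how the same loop style might produce the two lists from the question, under one reading of its requirements. The values of $speakerId and $phrase, the ID set, and the sample names are all assumptions for illustration. The names follow the question's format, but not all of them appear in its list.

```perl
use strict;
use warnings;

# Assumed sample values; the question does not fix them.
my ( $speakerId, $phrase ) = ( 1, 'phraseone' );
my %is_listed_id = map { $_ => 1 } ( 1, 3 );    # stands in for $id1, $id2

# Hypothetical names in the question's format.
my @all_entries = qw(
    1-phraseone-1.txt  1-phraseone-10.txt 1-phraseone-2.txt
    11-phraseone-1.txt 1-phrasetwo-1.txt
);

my ( @list1, @list2 );
foreach my $entry (@all_entries) {

    # split file name
    next unless $entry =~ /^(\d+)-(.*?)-(\d+)\.txt$/;
    my ( $sid, $p, $id ) = ( $1, $2, $3 );
    next unless $p eq $phrase;

    my $same_speaker = $sid == $speakerId;
    my $listed_id    = exists $is_listed_id{$id};    # whole id, so 10 is not 1

    # list 1: drop entries with this speaker AND one of the listed ids
    push @list1, $entry unless $same_speaker && $listed_id;

    # list 2: keep entries with this speaker whose id is NOT listed
    push @list2, $entry if $same_speaker && !$listed_id;
}

print "list 1: @list1\n";
print "list 2: @list2\n";
```

Comparing the whole captured ID against a hash, rather than matching a digit at the end of the name, is what keeps IDs such as 10 or 11 from being confused with ID 1.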

Answer 2: (score: 1)

It sounds a bit like you are describing a query function.

#!/usr/bin/perl -Tw

use strict;
use warnings;
use Data::Dumper;

my ( $set_a, $set_b ) = query( 2, 'phrasethree', [ 1, 3 ] );

print Dumper( { a => $set_a, b => $set_b } );

# a) fetch elements which
#    1. match $phrase
#    2. exclude $speakerId
#    3. match @ids
# b) fetch elements which
#    1. match $phrase
#    2. match $speakerId
#    3. exclude @ids
sub query {
    my ( $speakerId, $passPhrase, $id_ra ) = @_;

    my %has_id = map { ( $_ => 0 ) } @{$id_ra};

    my ( @a, @b );

    while ( my $filename = glob '*.txt' ) {

        if ( $filename =~ m{\A ( \d+ )-( .+? )-( \d+ ) [.] txt \z}xms ) {

            my ( $_speakerId, $_passPhrase, $_id ) = ( $1, $2, $3 );

            if ( $_passPhrase eq $passPhrase ) {

                if ( $_speakerId ne $speakerId
                    && exists $has_id{$_id} )
                {
                    push @a, $filename;
                }

                if ( $_speakerId eq $speakerId
                    && !exists $has_id{$_id} )
                {
                    push @b, $filename;
                }
            }
        }
    }

    return ( \@a, \@b );
}