Perl regex: match one string or another

Time: 2015-04-10 18:43:48

Tags: regex perl

My query can be phrased as follows:

"Who is King?"
"Who was King?"

Then there is an optional "the" or "a".

"Who is the King?"
"Who is a King?"
"Who was the King?"
"Who was a King?"

I'm trying to capture whatever comes after "is" or "was", allowing for the optional "the" or "a". In these examples that would be "King".

if($input =~ /[is|was]\s[the|a]?(.*)\?/g)
{
    $searchTerm = $1;
}

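This all works, except that when I add "the" I get "he King".

What seems to be happening is that my ? is catching the 't' and then moving on, since I asked for 0 or 1. But I assumed [the|a]? would match 0 or 1 occurrences of 'the' or 'a'.

Can someone help with this regex?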

2 answers:

Answer 0: (score: 4)

To match any one of two or more alternatives, separate the alternatives with the alternation symbol, |.

the|a

To mark where the alternation begins and ends, enclose the alternatives in parentheses.

(the|a)

Anything inside parentheses also becomes a capture group. To make the group non-capturing, add ?: right after the opening parenthesis.

(?:the|a)
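To make the difference between the two kinds of group concrete, here is a minimal sketch. The helper names are made up for illustration; the sketch relies on the fact that a successful match in list context returns the captured substrings.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.
# In list context, a successful match returns the list of captured substrings.
sub captures_with_capturing_group {
    my ($text) = @_;
    return $text =~ /(the|a)\s+(\w+)/;      # the article is captured as $1
}

sub captures_with_non_capturing_group {
    my ($text) = @_;
    return $text =~ /(?:the|a)\s+(\w+)/;    # the article is grouped but not captured
}

my @with    = captures_with_capturing_group('is the King');
my @without = captures_with_non_capturing_group('is the King');

print scalar(@with),    " captures: @with\n";
print scalar(@without), " capture: @without\n";
```

With the capturing group the word you actually want lands in $2; with the non-capturing group it is in $1, which keeps the numbering stable as the pattern grows.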

Besides "the" and "a", you most likely also want the word "an".

(?:the|a|an)

Because the word is optional (it can appear once or not at all), you need to put ? after the group.

(?:the|a|an)?

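In addition, because the word is optional, the whitespace before it should be optional too (but if the word is present, at least one whitespace character must precede it).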

(\s+(?:the|a|an))?

Finally, the new group we just created to account for the leading whitespace should also be non-capturing.

(?:\s+(?:the|a|an))?

Here is a script that successfully parses your examples (along with some examples of my own that use the optional word "an"):

#!/usr/bin/env perl

use strict;
use warnings;

while (my $input = <DATA>) {
  chomp $input;
  if ( my($subject) = $input =~ /\s+(?:is|was)(?:\s+(?:the|a|an))?\s+(.+)\?/ ) {
    print "$input: [$subject]\n";
  }
}

__DATA__
Who was King?
Who is King?
Who is the King?
Who is a King?
Who was the King?
Who was a King?
Who is Ace?
Who is the Ace?
Who is an Ace?
Who was Ace?
Who was the Ace?
Who was an Ace?

Output:

Who was King?: [King]
Who is King?: [King]
Who is the King?: [King]
Who is a King?: [King]
Who was the King?: [King]
Who was a King?: [King]
Who is Ace?: [Ace]
Who is the Ace?: [Ace]
Who is an Ace?: [Ace]
Who was Ace?: [Ace]
Who was the Ace?: [Ace]
Who was an Ace?: [Ace]
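The pattern from the script can also be written with the x modifier so that each piece carries its own comment. The following is only a sketch of that layout; extract_subject is a hypothetical name and the last test sentence is my own, neither is part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the subject of the question, or undef if no match.
sub extract_subject {
    my ($input) = @_;
    my ($subject) = $input =~ /
        \s+ (?:is|was)              # the verb, preceded by whitespace
        (?: \s+ (?:the|a|an) )?     # optional article with its leading whitespace
        \s+                         # whitespace required before the subject
        (.+)                        # the subject itself, captured
        \?                          # literal question mark
    /x;
    return $subject;
}

for my $question ('Who is King?', 'Who was the King?', 'Who is an Ace?', 'Who is another Ace?') {
    my $subject = extract_subject($question);
    print "$question: [", (defined $subject ? $subject : 'no match'), "]\n";
}
```

Because whitespace is required after the optional article group, a subject that merely starts with "a" or "an" (such as "another Ace") is not split: the engine backtracks and treats the article as absent.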

Answer 1: (score: 2)

Your alternation is wrong. You want (is|was), not [is|was].
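The reason is that square brackets define a character class: [the|a]? matches at most one character from the set t, h, e, |, a, not one of the words. A small sketch comparing the two forms follows; both helper names are hypothetical, and the second pattern is just one possible grouping.

```perl
use strict;
use warnings;

# Hypothetical helper using the bracketed pattern from the question (g flag dropped).
sub capture_with_brackets {
    my ($input) = @_;
    return $input =~ /[is|was]\s[the|a]?(.*)\?/ ? $1 : undef;
}

# Hypothetical helper using non-capturing groups for the alternations.
sub capture_with_parens {
    my ($input) = @_;
    return $input =~ /(?:is|was)(?:\s+(?:the|a))?\s+(.*)\?/ ? $1 : undef;
}

for my $question ('Who is the King?', 'Who was a King?') {
    printf "%-18s brackets: [%s]  parens: [%s]\n",
        $question, capture_with_brackets($question), capture_with_parens($question);
}
```

For "Who is the King?" the bracketed version consumes only the "t" and captures "he King", which is exactly the symptom described in the question.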

You also don't need /g, since you're not matching in a loop.
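Leaving the g flag on a one-off test can also cause surprises. In scalar context a successful g match records where it ended (see pos), and the next g match against the same variable resumes from that position. A minimal sketch of the effect, with match_twice as a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: run the same test twice against one variable
# and report whether each attempt matched.
sub match_twice {
    my ($use_g) = @_;
    my $input = 'Who is King?';
    my @results;
    for my $attempt (1 .. 2) {
        my $matched = $use_g ? scalar($input =~ /\bis\b/g)
                             : scalar($input =~ /\bis\b/);
        push @results, $matched ? 1 : 0;
    }
    return @results;
}

print "with g:    @{[ match_twice(1) ]}\n";
print "without g: @{[ match_twice(0) ]}\n";
```

With g the second attempt starts after the first "is" and fails; without g both attempts succeed.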

Your regex should look like this:

if ( $input =~ /"(.+)\s+(is|was)\s+(the|a)\s+(.+)\?"/ ) {
    my $pronoun = $1;
    my $is_was  = $2;
    my $the_a   = $3;
    my $what    = $4;
}
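Note that this pattern expects literal double quotes around the input and requires an article, so it would not match an unquoted Who is King?. Below is a sketch of a variant, assuming unquoted input and an optional article; parse_question and the hash keys are hypothetical names chosen to mirror the variables above.

```perl
use strict;
use warnings;

# Hypothetical helper: splits a question into its parts,
# returning a hash reference, or nothing if the input does not match.
sub parse_question {
    my ($input) = @_;
    if ( $input =~ /^(.+?)\s+(is|was)(?:\s+(the|a))?\s+(.+)\?/ ) {
        return {
            pronoun => $1,
            is_was  => $2,
            the_a   => $3,    # undef when no article is present
            what    => $4,
        };
    }
    return;
}

for my $question ('Who is King?', 'Who was the King?') {
    my $parts = parse_question($question) or next;
    my $article = defined $parts->{the_a} ? $parts->{the_a} : '(none)';
    print "$question => $parts->{is_was} / $article / $parts->{what}\n";
}
```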