Regex fails to capture all matches

Time: 2016-06-26 12:26:07

Tags: java regex

Here is an example:

The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.

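Now, say I have two ranks, "Officer" and "Senior Officer", and I want to replace the names that follow them with the generic token "PERSON". As you can see, three names follow the ranks: Stuart, Jess, and George. I don't understand why my regex solution fails to capture all of them. Here is my code: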

    public static void main(String[] args) {
        String input = "The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.";
        ArrayList<String> ranks = new ArrayList<String>();
        ranks.add("Senior Officer");
        ranks.add("Officer");
        for (String rank : ranks) {
            Pattern pattern = Pattern.compile(".*" + rank + " ([a-zA-Z]*?) .*");
            Matcher m = pattern.matcher(input);
            if (m.find()) {
                System.out.println(rank);
                System.out.println(m.group(1));
            }
        }
    }

Here is its output:

Senior Officer
Stuart
Officer
Stuart
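
Stuart is captured twice (once via "Senior Officer" and once via "Officer"), but Jess and George are ignored. I would like this output instead: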

Senior Officer
Stuart
Officer
Stuart
Officer
Jess
Officer
George

2 answers:

Answer 0 (score: 2)

This should be enough:

for (String rank : ranks) {
    Pattern pattern = Pattern.compile("\\b" + rank + "\\s+([a-zA-Z]*)");
    Matcher m = pattern.matcher(input);
    while (m.find()) {
        System.out.println(rank);
        System.out.println(m.group(1));
    }
}

Ideone Demo

Regex breakdown (as requested in the comments)

Officer # Match "Officer" literally
 ( # Capturing group
  (?: # Non-capturing group
    \s # Match one whitespace character
     (?!(?:Senior\s+)?Officer) # Negative lookahead: the next word must not be "Officer", optionally preceded by "Senior"
    [A-Z][a-zA-Z]* # Match a capital letter followed by any mix of upper- and lower-case letters
  )* # Repeat until a word does not start with a capital letter or a rank is found
 )
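
The breakdown above describes a different pattern from the one in the loop, and the answer never shows it joined into a single string. As a rough sketch, assuming the pieces are concatenated in the order listed, it could be used like this. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RankNameLookahead {
    // Assumption: the commented pieces of the breakdown joined into one pattern.
    static final Pattern OFFICER_NAMES = Pattern.compile(
            "Officer((?:\\s(?!(?:Senior\\s+)?Officer)[A-Z][a-zA-Z]*)*)");

    // Collects the capitalized words that follow each "Officer".
    static List<String> namesAfterOfficer(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = OFFICER_NAMES.matcher(input);
        while (m.find()) {
            // Group 1 starts with the whitespace before the name, so trim it.
            String name = m.group(1).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String input = "The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.";
        System.out.println(namesAfterOfficer(input));
    }
}
```

Because the pattern anchors on "Officer" alone, "Senior Officer Stuart" yields Stuart only once. The lookahead matters for input such as "Officer John Smith Officer Jane", where it stops the first name at "Smith" instead of swallowing the next rank.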

Answer 1 (score: 0)

The if you are using finds only the first match for each rank. To begin with, you need a while loop inside the for.
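
The loop this answer describes might look like the sketch below. The pattern is an assumption, not the answer's own code: it drops the surrounding ".*" from the question's regex, because a pattern wrapped in ".*" matches the whole line on the first find() and leaves nothing for later iterations. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllMatches {
    // Returns every name that directly follows the given rank.
    static List<String> namesFor(String rank, String input) {
        // Assumed pattern: rank, whitespace, then a run of letters.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(rank) + "\\s+([a-zA-Z]+)");
        Matcher m = pattern.matcher(input);
        List<String> names = new ArrayList<>();
        // Each find() resumes where the previous match ended,
        // so the loop visits every occurrence instead of only the first.
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String input = "The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.";
        for (String rank : new String[] {"Senior Officer", "Officer"}) {
            for (String name : namesFor(rank, input)) {
                System.out.println(rank);
                System.out.println(name);
            }
        }
    }
}
```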

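However, this does not solve the problem of finding the "Senior Officer" rank twice: once when you search for "Senior Officer" and once when you search for "Officer". I'm not sure how you want to handle this. If you want Stuart to appear twice, this code is enough. If you want to detect Stuart only once, you will need to work on your regex.

P.S. Test your regex with an online tool before coding. It saves a lot of time.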
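
If each name should be detected only once, one possible approach (not taken from either answer) is to put both ranks into a single alternation, so that each position in the text is matched by at most one rank. The sketch below also performs the replacement with "PERSON" that the question asks for. The class and method names are hypothetical, and listing the longer rank first is a precaution rather than a requirement for this input.

```java
import java.util.regex.Pattern;

public class ReplaceNames {
    // Group 1 is the rank; the name after it is matched but not captured.
    static final Pattern RANKED_NAME =
            Pattern.compile("\\b(Senior Officer|Officer)\\s+[a-zA-Z]+");

    // Replaces the name after each rank with the generic token PERSON.
    static String maskNames(String input) {
        return RANKED_NAME.matcher(input).replaceAll("$1 PERSON");
    }

    public static void main(String[] args) {
        String input = "The two (Senior Officer Stuart & Officer Jess) were intercepted by Officer George.";
        System.out.println(maskNames(input));
    }
}
```

Since the matcher scans from left to right and continues after each match, "Senior Officer Stuart" is consumed as a whole and the inner "Officer Stuart" is never matched a second time.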