Matching regex groups into a Java list (Hearst Pattern)

Time: 2013-12-02 23:50:13

Tags: java regex regex-group

I am trying to match Hearst patterns with Java regular expressions. This is my regex:

<np>(\w+)<\/np> such as (?:(?:, | or | and )?<np>(\w+)<\/np>)*

If I have an annotated sentence such as:

I have a <np>car</np> such as <np>BMW</np>, <np>Audi</np> or <np>Mercedes</np> and this can drive fast.

I would like to get these groups:

1. car
2. [BMW, Audi, Mercedes]

Update: this is my current Java code:

Pattern pattern = Pattern.compile("<np>(\\w+)<\\/np> such as (?:(?:, | or | and )?<np>(\\w+)<\\/np>)*");
Matcher matcher = pattern.matcher("I have a <np>car</np> such as <np>BMW</np>, <np>Audi</np> or <np>Mercedes</np> and this can drive fast.");

while (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}

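But the second group contains only Mercedes. How can I get all matches of the second group (maybe as an array)? Is this possible with Java's Pattern and Matcher? If so, what is my mistake?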

1 answer:

Answer 0 (score: 2)

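If you want to make sure the results are contiguous, you can use the \G anchor, which forces each match to be adjacent to the previous one: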

Pattern p = Pattern.compile("<np>(\\w+)</np> such as|\\G(?:,| or| and)? <np>(\\w+)</np>");

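Note: the \G anchor matches at the end of the previous match or at the start of the string. To avoid matching at the start of the string, you can add the lookbehind (?<!^) after \G.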
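
As a rough sketch of how the \G pattern could be used to collect the groups into a list: each call to find() returns either the hypernym (group 1) or a single hyponym (group 2), so the list has to be built up in the loop. The class name HearstExtractor, the method extract, and the choice of a map from hypernym to hyponym list are assumptions made for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class HearstExtractor {
    // Alternative 1 captures the hypernym before "such as".
    // Alternative 2 captures one hyponym that starts exactly where the
    // previous match ended (\G), but never at the start of the input ((?<!^)).
    private static final Pattern HEARST = Pattern.compile(
        "<np>(\\w+)</np> such as|\\G(?<!^)(?:,| or| and)? <np>(\\w+)</np>");

    static Map<String, List<String>> extract(String sentence) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        List<String> current = null;
        Matcher m = HEARST.matcher(sentence);
        while (m.find()) {
            if (m.group(1) != null) {
                // A new "X such as" head: start (or reuse) its hyponym list.
                current = result.computeIfAbsent(m.group(1), k -> new ArrayList<>());
            } else if (current != null) {
                // A hyponym adjacent to the previous match.
                current.add(m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "I have a <np>car</np> such as <np>BMW</np>, <np>Audi</np> "
                 + "or <np>Mercedes</np> and this can drive fast.";
        System.out.println(extract(s)); // prints {car=[BMW, Audi, Mercedes]}
    }
}
```

The original regex returns only Mercedes because a repeated capturing group keeps just the text of its last iteration; matching one item per find() call sidesteps that.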