How to replace each group individually using Matcher?

Date: 2014-02-25 13:59:50

Tags: java regex

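All the examples I have found cover cases where people search with some regular expression and need to replace every group found with some fixed value, or where the searched string has a known number of groups.

In my case, however, I need to change each group based on the value that was found. How can I change the resulting value for each one separately?

This is what I have / have tried: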

    Pattern pattern = Pattern.compile(DEFINITION_WITH_OR);
    Matcher matcher = pattern.matcher(s);
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        String ss = matcher.group();

        /* Some string manipulation */

        // matcher.appendReplacement(sb, bestMatchedDefinition);
        // matcher.appendReplacement(sb, Matcher.quoteReplacement(ss));
        // s = s.replace(s.substring(matcher.start(), matcher.end()), ss);
    }

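What I want to do is iterate over all the groups found, perform some operation on each one, and edit only that group in place, leaving the rest of the content untouched. The number of groups is not known until run time.

So far all my attempts have either changed everything or changed nothing at all. Any suggestions?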

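The string manipulation I do is: split on "|", take the shortest part, then remove the brackets.

Note: the input string below is a simplification to show what my end result should be; the full string has more annoying characters, which I clean up with the DEFINITION_WITH_OR pattern.

Example input string: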

 a commissioned general officer in the United States Army,
 [[United States Marine Corps|Marine Corps]],
 or [[United States Air Force|Air Force]] superior to a lieutenant general.
 A general is equal in rank or grade to a four star admiral. In the US Army,
 a general is junior to a general of the army. In the US Marine Corps,
 a general is the highest rank of commissioned officer. In the US Air Force,
 a general is junior to a general of the air force.

The output should be:

 a commissioned general officer in the United States Army,
 Marine Corps,
 or Air Force superior to a lieutenant general.
 A general is equal in rank or grade to a four star admiral. In the US Army,
 a general is junior to a general of the army. In the US Marine Corps,
 a general is the highest rank of commissioned officer. In the US Air Force,
 a general is junior to a general of the air force.

Note the Marine Corps and Air Force parts.
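A compact sketch of the transformation described above (split each [[...]] on "|", keep the shortest part, drop the brackets). It assumes Java 9 or later, because it uses Matcher.replaceAll(Function<MatchResult, String>), which calls the function once per match and splices its result back in; it also assumes the links look exactly like [[a|b]], so the extra cleanup done by DEFINITION_WITH_OR is not reproduced. The names ShortestLinkText, shortestAlternative and simplify are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShortestLinkText {
    // Matches [[...]] and captures the text between the brackets.
    // The reluctant .*? stops each match at the first "]]".
    private static final Pattern LINK = Pattern.compile("\\[\\[(.*?)\\]\\]");

    // Returns the shortest "|"-separated alternative of one link body
    // (the first one wins on a tie).
    static String shortestAlternative(String body) {
        return Arrays.stream(body.split("\\|"))
                .min(Comparator.comparingInt(String::length))
                .orElse(body);
    }

    // Replaces every [[...]] with its shortest alternative and leaves
    // all other text untouched.
    static String simplify(String text) {
        Matcher m = LINK.matcher(text);
        // The function's result is treated like an appendReplacement string,
        // so quoteReplacement keeps any '$' or '\' in it literal.
        return m.replaceAll(r -> Matcher.quoteReplacement(shortestAlternative(r.group(1))));
    }

    public static void main(String[] args) {
        String input = "a commissioned general officer in the United States Army, "
                + "[[United States Marine Corps|Marine Corps]], "
                + "or [[United States Air Force|Air Force]] superior to a lieutenant general.";
        System.out.println(simplify(input));
    }
}
```

On Java 8 the same per-match logic fits into the find()/appendReplacement()/appendTail() loop instead of replaceAll(Function).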

2 Answers:

Answer 0 (score: 1)

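    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class ShortestTermReplacer {
        public static void main(String[] args) {
            String source = "a commissioned general officer in the United States Army, "
                    + "[[United States Marine Corps|Marine Corps]], "
                    + "or [[United States Air Force|Air Force]] superior to a lieutenant general.";
            // group(1) is the text between "[[" and "]]"
            Pattern pattern = Pattern.compile("\\[\\[(.*?)\\]\\]");
            Matcher m = pattern.matcher(source);
            StringBuffer sb = new StringBuffer();
            while (m.find()) {
                // Split the link body on "|" and keep the shortest alternative
                String[] terms = m.group(1).split("\\|");
                String shortestTerm = null;
                for (String term : terms) {
                    if (shortestTerm == null || term.length() < shortestTerm.length()) {
                        shortestTerm = term;
                    }
                }
                // Copy the text before this match, then the replacement (quoted so it is literal)
                m.appendReplacement(sb, Matcher.quoteReplacement(shortestTerm));
            }
            // Copy whatever follows the last match
            m.appendTail(sb);
            String target = sb.toString();
            System.out.println(target);
        }
    }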

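Mind the doubled backslashes: "[" and "]" are regex metacharacters and must be escaped, and each backslash must itself be escaped inside a Java string literal. ".*?" is a reluctant quantifier: it matches the shortest possible sequence, so each match ends at the first "]]".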
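To see why the reluctant "?" matters, the sketch below runs the greedy and the reluctant version of the same pattern over a string containing two links. The class name GreedyVsReluctant and the helper captures are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Collects group(1) of every match of regex in text, in order.
    static List<String> captures(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "[[A|B]], or [[C|D]]";
        // Greedy: .* runs to the end, then backs off only as far as the LAST "]]",
        // so both links end up in a single match.
        System.out.println(captures("\\[\\[(.*)\\]\\]", text));   // [A|B]], or [[C|D]
        // Reluctant: .*? stops at the FIRST "]]", giving one match per link.
        System.out.println(captures("\\[\\[(.*?)\\]\\]", text));  // [A|B, C|D]
    }
}
```

With the greedy pattern the shortest-term logic would be applied to "A|B]], or [[C|D" as one match, which is not what the question asks for.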

Answer 1 (score: 0)

OK, thanks to Joop's answer I realized I had not added the following code:

    matcher.appendTail(sb);
    s = sb.toString();

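after the while loop. With that in place, the line matcher.appendReplacement(sb, Matcher.quoteReplacement(ss)); did the job. For some reason matcher.appendReplacement(sb, ss); also works, but it is much slower. If anyone knows why, a comment would be great.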
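Speed aside, the two calls are not equivalent: appendReplacement treats "$" and "\" in its replacement argument specially ("$1" is a back-reference to capturing group 1, and a reference to a group that does not exist throws an exception). Passing ss unquoted is therefore only safe when ss contains neither character. A small sketch of the difference, using a hypothetical replaceLinks helper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementDemo {
    // Replaces every [[...]] in text with replacement via the
    // find/appendReplacement/appendTail loop, optionally quoting it first.
    static String replaceLinks(String text, String replacement, boolean quote) {
        Matcher m = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, quote ? Matcher.quoteReplacement(replacement) : replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "price: [[x]]";
        // Unquoted: "$1" is read as group 1, so the captured "x" is inserted instead.
        System.out.println(replaceLinks(text, "US$1", false)); // price: USx
        // Quoted: the replacement is copied literally.
        System.out.println(replaceLinks(text, "US$1", true));  // price: US$1
    }
}
```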