Unusual regex bug in Java

Time: 2016-06-29 07:09:27

Tags: java regex

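I'm having a problem with a regex. What I want to do is add a character at the start and at the end of each matched group, but I'm getting a strange bug.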

Here is the console log:

    Start: 0
    End: 25
    Start: 24
    End: 80
    C"Hey Topaz how are you?C"D Nathan continues to take a seat and proceeds with, "DI hate that guy"

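As you can see, the first match has Start 0 and End 25. The next one, however, starts at 24, which is not what I want: it should be the next quoted group. Anyway, here is the code: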

    StringBuilder message = new StringBuilder("\"Hey Topaz how are you?\" Nathan continues to take a seat and proceeds with, \"I hate that guy\"");

    Matcher matcher = quotationPattern.matcher(message);

    while (matcher.find())
    {
        int startIndex = matcher.start();
        int endIndex = matcher.end() + 1;

        System.out.println("Start: " + startIndex);
        System.out.println("End: " + endIndex);
        message.insert(startIndex, 'C');
        message.insert(endIndex, 'D');

    }

    System.out.println(message);
I almost forgot. Here is the pattern:

    protected static Pattern quotationPattern = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");
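For reference, this pattern matches a double-quoted string in which a backslash escapes the following character, so an escaped quote inside the string does not end the match. Below is a minimal sketch that only lists the matches without modifying anything; the class name QuotationPatternDemo, the helper quotedParts and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotationPatternDemo {
    // Same pattern as in the question: an opening quote, then any number of
    // either "backslash + any character" or "a character that is neither a
    // quote nor a backslash", then a closing quote.
    static final Pattern QUOTATION = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Hypothetical helper: returns every quoted part of the text, quotes included.
    static List<String> quotedParts(String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = QUOTATION.matcher(text);
        while (m.find()) {
            parts.add(m.group());
        }
        return parts;
    }

    public static void main(String[] args) {
        // Java literal for the text: He said "a \"b\" c" and "d"
        String text = "He said \"a \\\"b\\\" c\" and \"d\"";
        // Prints ["a \"b\" c", "d"]: the escaped quotes do not end the first match
        System.out.println(quotedParts(text));
    }
}
```

Because the matcher here runs over a String that is never modified, each find() simply continues right after the previous match.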

Any help is appreciated.

2 answers:

Answer 0 (score: 1)

    final Pattern p = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");
    StringBuilder message = new StringBuilder("\"Hey Topaz how are you?\" Nathan continues to take a seat and proceeds with, \"I hate that guy\"");
    final Matcher m = p.matcher(message);
    // As long as message is not modified inside the loop, both quoted parts are found
    while (m.find()) {
        System.out.println(m.group(0));
    }

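So you can see that this works. The problem with your code is that you are modifying the string that is being searched: the Matcher reads the StringBuilder directly, so when m.find() searches again after your inserts, it resumes at an index that now points into the shifted text!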

This works for your case:

    final Pattern p = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");
    String testString = "\"Hey Topaz how are you?\" Nathan continues to take a seat and proceeds with, \"I hate that guy\"";
    StringBuilder message = new StringBuilder(testString);
    // Search the immutable testString, but insert into the separate StringBuilder
    final Matcher m = p.matcher(testString);
    int replaceCount = 0; // number of C/D pairs inserted so far
    while (m.find()) {
        int startIndex = m.start();
        int endIndex = m.end() + 1; // +1 because the 'C' inserted first shifts the closing quote right
        System.out.println("Start: " + startIndex);
        System.out.println("End: " + endIndex);
        // Every earlier match added two characters, so shift both positions by that amount
        message.insert(startIndex + replaceCount * 2, 'C');
        message.insert(endIndex + replaceCount * 2, 'D');
        replaceCount += 1;
    }
    System.out.println(message);


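Because you are inserting characters into the string you print (message), the indices returned by the Matcher, which searches the unmodified testString, no longer line up with positions in message. So keep count of how many C/D pairs you have inserted and shift startIndex and endIndex by two characters per pair.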
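An alternative that avoids the counter altogether (a sketch, not taken from either answer; the class ReverseInsertDemo and the helper wrapQuotes are hypothetical): first collect all match positions from the unmodified string, then insert the markers starting from the last match. An insertion only shifts the characters to its right, so the recorded positions of earlier matches stay valid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReverseInsertDemo {
    static final Pattern QUOTATION = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Hypothetical helper: wraps every quoted part of text in open ... close.
    static String wrapQuotes(String text, char open, char close) {
        // 1. Run the matcher over the immutable String and record [start, end) pairs.
        List<int[]> spans = new ArrayList<>();
        Matcher m = QUOTATION.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        // 2. Insert from the last match backwards, so no recorded offset is shifted
        //    before it is used.
        StringBuilder sb = new StringBuilder(text);
        for (int i = spans.size() - 1; i >= 0; i--) {
            int[] span = spans.get(i);
            sb.insert(span[1], close); // closing marker first (end() is exclusive)...
            sb.insert(span[0], open);  // ...so the start position is still untouched
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = "\"Hey Topaz how are you?\" Nathan continues to take a seat and proceeds with, \"I hate that guy\"";
        System.out.println(wrapQuotes(message, 'C', 'D'));
    }
}
```

For the message from the question this prints C"Hey Topaz how are you?"D Nathan continues to take a seat and proceeds with, C"I hate that guy"D.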

Answer 1 (score: 1)

The problem comes from these two lines:

    message.insert(startIndex, 'C');
    message.insert(endIndex, 'D');

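They change the contents of message and therefore affect the result of Matcher#find(). If you remove these statements, the matches come out as expected.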

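The preferred approach is to give the matcher an immutable String as input. You can use StringBuilder#toString to get a new String and pass that to the matcher, so that changes to message have no side effects on the matching: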

    Matcher matcher = quotationPattern.matcher(message.toString());
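Even with the toString() snapshot, the positions returned by the matcher refer to the original text, so inserting into message inside the loop would still need offset bookkeeping like the counter in the other answer. If the goal is only to wrap every match, Matcher#replaceAll can build the result in one pass, and $0 in the replacement string refers to the whole match. A minimal sketch (the class ReplaceAllDemo and the helper wrapQuotes are hypothetical):

```java
import java.util.regex.Pattern;

public class ReplaceAllDemo {
    static final Pattern QUOTATION = Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Hypothetical helper: returns a new String with every quoted part wrapped in C ... D.
    static String wrapQuotes(CharSequence message) {
        // The matcher gets a String snapshot, and replaceAll builds a brand-new
        // String, so the input is never modified while it is being searched.
        // In the replacement, $0 stands for the entire match (group 0).
        return QUOTATION.matcher(message.toString()).replaceAll("C$0D");
    }

    public static void main(String[] args) {
        StringBuilder message = new StringBuilder("\"Hey Topaz how are you?\" Nathan continues to take a seat and proceeds with, \"I hate that guy\"");
        System.out.println(wrapQuotes(message));
    }
}
```

If the markers could ever contain $ or \, pass them through Matcher.quoteReplacement first, because those characters are special in replacement strings.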