Question

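I have a Java regex pattern and a sentence I want to match completely, but for some sentences the match wrongly fails. Why is that? (For simplicity I won't use my complex regex here, just ".*".)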

System.out.println(Pattern.matches(".*", "asdf"));
System.out.println(Pattern.matches(".*", "[11:04:34] <@Aimbotter> 1 more thing"));
System.out.println(Pattern.matches(".*", "[11:04:35] <@Aimbotter> Dialogue: 0,0:00:00.00,0:00:00.00,Default,{Orginal LV,0000,0000,0000,,[???]??????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????} "));
System.out.println(Pattern.matches(".*", "[11:04:35] <@Aimbotter> Dialogue: 0,0:00:00.00,0:00:00.00,Default,{Orginal LV,0000,0000,0000,,[???]????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????} "));

Output:

true
true
true
false

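Note that the fourth sentence contains 10 Unicode control characters between the question marks, which a normal font does not display. The third and fourth sentences actually contain the same number of characters!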

Answer 1

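Use

Pattern.compile(".*",Pattern.DOTALL)

if you want . to match those control characters too. By default, . matches any character except line terminators, and \u0085 counts as one.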

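From the JavaDoc:

"In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.

Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)"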
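To make the two ways of enabling dotall mode concrete, here is a minimal sketch. The class name, method names, and sample string are made up for illustration; only Pattern.matches, Pattern.compile, and Pattern.DOTALL come from the JDK.

```java
import java.util.regex.Pattern;

public class DotallDemo {
    // Default mode: '.' does not match line terminators.
    static boolean matchesDefault(String input) {
        return Pattern.matches(".*", input);
    }

    // Dotall mode enabled through the compile flag.
    static boolean matchesWithFlag(String input) {
        return Pattern.compile(".*", Pattern.DOTALL).matcher(input).matches();
    }

    // Dotall mode enabled through the embedded flag expression (?s).
    static boolean matchesWithEmbeddedFlag(String input) {
        return Pattern.matches("(?s).*", input);
    }

    public static void main(String[] args) {
        // Hypothetical sample: U+0085 (NEL) sits between the two words.
        String input = "before\u0085after";
        System.out.println(matchesDefault(input));          // false
        System.out.println(matchesWithFlag(input));         // true
        System.out.println(matchesWithEmbeddedFlag(input)); // true
    }
}
```

The embedded form is handy when you can only pass a regex string, as with Pattern.matches or String.matches.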

The code in Pattern (which includes your \u0085):

/**
 * Implements the Unicode category ALL and the dot metacharacter when
 * in dotall mode.
 */
static final class All extends CharProperty {
    boolean isSatisfiedBy(int ch) {
        return true;
    }
}

/**
 * Node class for the dot metacharacter when dotall is not enabled.
 */
static final class Dot extends CharProperty {
    boolean isSatisfiedBy(int ch) {
        return (ch != '\n' && ch != '\r'
                && (ch|1) != '\u2029'
                && ch != '\u0085');
    }
}
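The Dot check above excludes five characters: \n, \r, \u0085, and both \u2028 and \u2029 (the (ch|1) trick sets the lowest bit, so the two separators are compared as one value). A rough way to confirm this from the outside, without relying on JDK internals, is to probe a lone dot against single characters. The class and helper names below are hypothetical.

```java
import java.util.regex.Pattern;

public class DotExclusions {
    // Returns true if a lone '.' with default flags matches the given code point.
    static boolean dotMatches(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return Pattern.matches(".", s);
    }

    public static void main(String[] args) {
        // The first five are line terminators; the last three are not.
        int[] candidates = {'\n', '\r', 0x0085, 0x2028, 0x2029, '\t', 0x0001, 'a'};
        for (int cp : candidates) {
            System.out.printf("U+%04X -> %b%n", cp, dotMatches(cp));
        }
    }
}
```

The tab and U+0001 rows should report true, which shows that the default dot does accept ordinary control characters and rejects only the line terminators.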

Answer 2

The answer lies in the 10 Unicode control characters, \u0085.

That control character is not matched by .*, just like \n.
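Since these characters are invisible in most fonts, it can help to dump where they sit in the input before blaming the regex. This is only a sketch: the class and method names are invented, and Character.isISOControl covers U+0000 to U+001F and U+007F to U+009F, so it would catch \u0085 and \n but not \u2028 or \u2029.

```java
public class HiddenChars {
    // Lists each ISO control character in the input as "index:U+XXXX".
    static String describeControls(String input) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isISOControl(c)) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(i).append(":U+").append(String.format("%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample with NEL at index 2 and a line feed at index 5.
        System.out.println(describeControls("ab\u0085cd\n")); // 2:U+0085 5:U+000A
    }
}
```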

Answer 3

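Unicode \u0085 is a newline character (NEL, "next line"), so you have to either add (?s), which makes the dot match everything, to the beginning of the regex, or pass the flag when compiling the regex.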

Pattern.matches("(?s).*", "blahDeBlah\u0085Blah")

Answer 4

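I think the problem is that \u0085 represents a newline. If you need . to match across line terminators, you need Pattern.DOTALL (Pattern.MULTILINE only changes how ^ and $ behave, so on its own it does not help here). This has nothing to do with the character being Unicode; '\n' would fail too.

Use it like this: Pattern.compile(regex, Pattern.DOTALL).matcher(input).matches()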
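A quick comparison of the flags on a plain '\n' input, as a sketch with made-up class and helper names, shows which one actually affects the dot:

```java
import java.util.regex.Pattern;

public class FlagComparison {
    // Full-string match with the given compile flags (0 means no flags).
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "line one\nline two"; // hypothetical two-line sample
        System.out.println(fullMatch(".*", 0, input));                 // false
        System.out.println(fullMatch(".*", Pattern.MULTILINE, input)); // false
        System.out.println(fullMatch(".*", Pattern.DOTALL, input));    // true
    }
}
```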
