Question

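Say I want to match a string that should consist solely of parts adhering to a specific (regular expression) pattern, and retrieve the elements in a loop. Matcher.find() seems to have been invented for this. However, find will match anywhere in the string, not only directly after the previous match, so intermediate characters are skipped.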
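To make the problem concrete, here is a minimal sketch of what a plain find() loop does. The class and method names (FindSkipsDemo, findAll) are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindSkipsDemo {
    // Collects every pair of hex digits, wherever find() happens to locate it.
    // (Hypothetical helper, only here to show the skipping behaviour.)
    static List<String> findAll(String input) {
        Matcher m = Pattern.compile("\\p{XDigit}{2}").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("aabb"));   // [aa, bb]
        // The underscore is silently skipped, so the invalid input
        // yields the same result as the valid one.
        System.out.println(findAll("aa_bb"));  // [aa, bb]
    }
}
```

Both calls return the same list, which is exactly the behaviour I want to avoid.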

So, for instance, I want to match \\p{XDigit}{2} (two hexadecimal digits) in such a way that:

aabb matches;
_aabb does not match;
aa_bb does not match;
aabb_ does not match.

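This should be done using find (or any other iterative call to the regex), so that I can directly process each byte in the array. In other words, I want to process aa and bb separately after matching.

OK, that's it. The most elegant way of doing this wins the accept.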

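Notes:

the hexadecimal parsing is just an example of a simple repeating pattern;
preferably I would like to keep the regex to the minimum needed to match a single element;
yes, I know about using (\\p{XDigit}{2})*, but I don't want to scan the string twice (as it should be usable on huge input strings).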
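For reference, the two-pass approach I would like to avoid looks roughly like the sketch below. The names TwoPassDemo and pairs are hypothetical, and throwing IllegalArgumentException on invalid input is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TwoPassDemo {
    private static final Pattern WHOLE = Pattern.compile("(?:\\p{XDigit}{2})*");
    private static final Pattern ELEMENT = Pattern.compile("\\p{XDigit}{2}");

    // Validates the whole input first, then extracts the pairs in a second scan.
    static List<String> pairs(String input) {
        // First pass: the entire string must consist of hex pairs only.
        if (!WHOLE.matcher(input).matches()) {
            throw new IllegalArgumentException("Not a sequence of hex pairs: " + input);
        }
        // Second pass: walk the same string again to retrieve each element.
        List<String> result = new ArrayList<>();
        Matcher m = ELEMENT.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pairs("aabb")); // [aa, bb]
    }
}
```

It works, but every character is visited twice, which is what I would like to avoid for huge inputs.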

Answer 1

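It seems you want to get all (multiple) matches that appear at the start of the string or directly after a successful match. You may combine the \G operator with a lookahead that ensures the whole string consists only of repetitions of the pattern.

Use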

(?:\G(?!^)|^(?=(?:\p{XDigit}{2})*$))\p{XDigit}{2}

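See the regex demo.

Details

(?: - start of a non-capturing group with 2 alternatives:
- \G(?!^) - the end of the previous successful match (\G also matches at the start of the string, so (?!^) excludes that position)
- | - or
- ^(?=(?:\p{XDigit}{2})*$) - the start of the string (^), followed by 0 or more occurrences of the \p{XDigit}{2} pattern up to the end of the string ($)
) - end of the non-capturing group
\p{XDigit}{2} - 2 hex digit characters.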

Java demo：

String regex = "(?:\\G(?!^)|^(?=(?:[0-9a-fA-F]{2})*$))[0-9a-fA-F]{2}";
String[] strings = {"aabb","_aabb","aa_bb", "aabb_"}; 
Pattern pattern = Pattern.compile(regex);
for (String s : strings) {
    System.out.println("Checking " + s);
    Matcher matcher = pattern.matcher(s);
    List<String> res = new ArrayList<>();
    while (matcher.find()) {
        res.add(matcher.group(0));
    }
    if (res.size() > 0) {
        System.out.println(res);
    } else {
        System.out.println("No match!");
    }
}

Output:

Checking aabb
[aa, bb]
Checking _aabb
No match!
Checking aa_bb
No match!
Checking aabb_
No match!
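Since the question is about processing each byte, the same pattern could be used to build a byte array directly. This is only a sketch: the class name SinglePatternBytes and the method toBytes are assumptions, not part of the original demo.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePatternBytes {
    // Same regex as above, written with \p{XDigit} instead of [0-9a-fA-F].
    private static final Pattern HEX_PAIRS = Pattern.compile(
            "(?:\\G(?!^)|^(?=(?:\\p{XDigit}{2})*$))\\p{XDigit}{2}");

    // Converts each matched pair to one byte; returns an empty array
    // when nothing matches.
    static byte[] toBytes(String hex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Matcher m = HEX_PAIRS.matcher(hex);
        while (m.find()) {
            // write(int) keeps only the low 8 bits, which is the byte value.
            out.write(Integer.parseInt(m.group(), 16));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(toBytes("aabb")));  // [-86, -69]
        System.out.println(Arrays.toString(toBytes("aa_bb"))); // []
    }
}
```

Note that with this approach an invalid string and an empty string both produce an empty result, so the caller cannot tell them apart without an extra check.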

Answer 2

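OK, I may finally have had a brainstorm: the idea is to remove the find() call from the condition of the while loop. Instead, I simply keep a variable holding the current location and only stop parsing once the whole string has been processed. The location can also be used to produce a more informative error message.

The location starts at zero and is updated to the end of each match. Each time a new match is found, the start of the match is compared with the location, i.e. the end of the last match. An error occurs if:

the pattern is not found;

the pattern is found, but not at the end of the last match.

Code: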

private static byte[] parseHex(String hex) {
    byte[] bytes = new byte[hex.length() / 2];
    int off = 0;

    // the pattern is normally a constant
    Pattern hexByte = Pattern.compile("\\p{XDigit}{2}");
    Matcher hexByteMatcher = hexByte.matcher(hex);
    int loc = 0;

    // so here we would normally do the while (hexByteMatcher.find()) ...
    while (loc < hex.length()) {
        // optimization in case we have a maximum size of the pattern;
        // the end is clamped so that an odd-length input does not make region() throw
        hexByteMatcher.region(loc, Math.min(loc + 2, hex.length()));
        // instead we try and find the pattern, and produce an error if not found at the right location
        if (!hexByteMatcher.find() || hexByteMatcher.start() != loc) {
            // only a single throw, message includes location
            throw new IllegalArgumentException("Hex string invalid at offset " + loc);
        }
        // the processing of the pattern, in this case a double hex digit representing a byte value
        bytes[off++] = (byte) Integer.parseInt(hexByteMatcher.group(), 16);
        // set the next location to the end of the match
        loc = hexByteMatcher.end();
    }
    return bytes;
}

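The method may be improved by adding \\G (the end of the last match) to the regular expression: \\G\\p{XDigit}{2}. This way the regular expression fails immediately if the pattern cannot be found starting at the end of the last match or at the start of the string.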
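A minimal sketch of that \\G variant could look like the following. It goes back to a plain while (find()) loop and checks afterwards that the last match ended at the end of the input; the class name ContiguousHexParser is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContiguousHexParser {
    // \G anchors each match to the end of the previous one (or the start of the input).
    private static final Pattern HEX_BYTE = Pattern.compile("\\G\\p{XDigit}{2}");

    static byte[] parseHex(String hex) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Matcher m = HEX_BYTE.matcher(hex);
        int loc = 0;
        // find() stops as soon as the next pair does not directly follow the last match
        while (m.find()) {
            out.write(Integer.parseInt(m.group(), 16));
            loc = m.end();
        }
        // anything left over means the input was not made up of hex pairs only
        if (loc != hex.length()) {
            throw new IllegalArgumentException("Hex string invalid at offset " + loc);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseHex("aabb"))); // [-86, -69]
    }
}
```

For aa_bb this throws with offset 2, and for _aabb with offset 0, so the error message still reports where parsing stopped.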

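For a regular expression with an expected maximum match size (2 in this case), it is of course also possible to adjust the end of the region that needs to be matched.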

Finding and retrieving consecutive matches
