Question

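When using the Matcher find() method, a partial match returns false, but the matcher's position still moves. Subsequent calls to find() skip the partially matched characters.

Example of a partial match: the pattern "[0-9]+:[0-9]" against the input "a3;9". The pattern does not match any part of the input, so find() returns false, but the subpattern "[0-9]+" matches "3". If we change the pattern at this point and call find() again, the characters to the left of the new match, including the partial match, are not tested.

Note that the pattern "[0-9]:[0-9]" (without the quantifier) does not produce this effect.

Is this normal behaviour?

Example: in the first for loop, the third pattern [0-9] matches the character "9", and "3" is not reported as a match. In the second loop, the pattern [0-9] matches the character "3".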

import java.util.regex.*;

public class Test {
    public static void main(String[] args) {
        final String INPUT = "a3;9";
        String[] patterns = {"a", "[0-9]+:[0-9]", "[0-9]"};

        Matcher matcher = Pattern.compile(".*").matcher(INPUT);

        System.out.printf("Input: %s%n", INPUT);
        matcher.reset();
        for (String s: patterns)
            testPattern(matcher, s);

        System.out.println("=======================================");

        patterns = new String[] {"a", "[0-9]:[0-9]", "[0-9]"};
        matcher.reset();
        for (String s: patterns)
            testPattern(matcher, s);
    }

    static void testPattern(Matcher m, String re) {     
        m.usePattern(Pattern.compile(re));
        System.out.printf("Using regex: %s%n", m.pattern().toString());

        // Testing for pattern
        if(m.find())
            System.out.printf("Found %s, end-pos: %d%n", m.group(), m.end());
    }
}

Answer 1

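Matcher offers three different kinds of match operation (see the javadoc):
- matches requires the entire input to match
- find scans forward through the input, skipping whatever does not match
- lookingAt matches a prefix, starting at the beginning of the sequence

When lookingAt finds a pattern, call matcher.region(matcher.end(), matcher.regionEnd()) so that the next of the consecutive patterns is tried from where the previous one ended.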
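The lookingAt/region approach could be sketched roughly as follows. The class name LookingAtDemo, the helper matchConsecutive, and the particular list of regexes are illustrative assumptions, not part of the original answer; the sketch only relies on the public Matcher methods usePattern, lookingAt, region, end and regionEnd.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: matches a sequence of patterns back to back.
class LookingAtDemo {

    // Returns the tokens matched by each regex in turn; stops at the first
    // regex that does not match at the current region start.
    static List<String> matchConsecutive(String input, String... regexes) {
        Matcher m = Pattern.compile("").matcher(input);
        List<String> tokens = new ArrayList<>();
        for (String re : regexes) {
            m.usePattern(Pattern.compile(re));
            // lookingAt is anchored at the region start, so nothing is skipped.
            if (!m.lookingAt()) {
                break;
            }
            tokens.add(m.group());
            // Shrink the region so the next pattern starts where this match ended.
            m.region(m.end(), m.regionEnd());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [a, 3, ;, 9]
        System.out.println(matchConsecutive("a3;9", "a", "[0-9]+", ";", "[0-9]"));
    }
}
```

Because lookingAt never scans ahead, a pattern that fails leaves the region untouched, so the next pattern is still tried at the same position.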

(Most of the credit goes to the OP.)

Answer 2

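According to the Javadoc of Matcher#usePattern:

This method causes this matcher to lose information about the groups of the last match that occurred. The matcher's position in the input is maintained and its last append position is unaffected.

So, according to this documentation, usePattern only guarantees that information about the groups of the last match is lost. None of the other state in the Matcher class is reset by this method.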
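A minimal sketch of what "position is maintained" means in practice, using only the public API. The class name UsePatternPositionDemo, the helper findAfterSwap, and the pattern "[a-z0-9]" are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: usePattern keeps the search position, reset() clears it.
class UsePatternPositionDemo {

    // Finds "a", swaps in a pattern that could also match "a", and returns
    // what the next find() reports. With resetFirst the search restarts at 0.
    static String findAfterSwap(boolean resetFirst) {
        Matcher m = Pattern.compile("a").matcher("a3;9");
        if (!m.find()) {
            throw new IllegalStateException("expected to find \"a\"");
        }
        m.usePattern(Pattern.compile("[a-z0-9]"));
        if (resetFirst) {
            m.reset(); // discards the position left by the previous find()
        }
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findAfterSwap(false)); // prints 3: search resumes after "a"
        System.out.println(findAfterSwap(true));  // prints a: search restarts at index 0
    }
}
```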

Here is the actual code of the usePattern method, which shows that it only reinitializes the groups:

public Matcher usePattern(Pattern newPattern) {
    if (newPattern == null)
        throw new IllegalArgumentException("Pattern cannot be null");
    parentPattern = newPattern;

    // Reallocate state storage
    int parentGroupCount = Math.max(newPattern.capturingGroupCount, 10);
    groups = new int[parentGroupCount * 2];
    locals = new int[newPattern.localCount];
    for (int i = 0; i < groups.length; i++)
        groups[i] = -1;
    for (int i = 0; i < locals.length; i++)
        locals[i] = -1;
    return this;
}

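Note that the Matcher class has private fields first and last that are not directly exposed through any public method. Using the reflection API, we can see evidence of what goes wrong here.

Consider this code block: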

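import java.lang.reflect.Field;
import java.util.regex.*;

// Note: on JDK 16 and later, reading Matcher's private fields via reflection
// requires running with --add-opens java.base/java.util.regex=ALL-UNNAMED.
public class UseMatcher {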
    final static String INPUT = "a3#9";
    static Matcher m = Pattern.compile("").matcher("");

    public static void main(String[] args) throws Exception {
        executePatterns(new String[] {"a", "[0-9]+:[0-9]", "[0-9]"});
        executePatterns(new String[] {"a", "[0-9]:[0-9]", "[0-9]"});
    }

    static void executePatterns(String[] patterns) throws Exception {
        System.out.printf("================= \"%s\" ======================%n", INPUT);
        m.reset(INPUT);

        boolean found = false;
        for (String re: patterns) {
            m.usePattern(Pattern.compile(re));
            System.out.printf("first/last: %s/%s, Using regex: \"%s\"%n",
                   matcherField("first"), matcherField("last"), m.pattern());

            found = m.find();
            if (found) {
                System.out.printf("Found %s, end-pos: %d%n", m.group(), m.end());
            }
        }
    }

    static Object matcherField(String fieldName) throws Exception {
        Field field = m.getClass().getDeclaredField(fieldName);    
        field.setAccessible(true);
        return field.get(m);
    }
}

Output:

================= "a3#9" ======================
first/last: -1/0, Using regex: "a"
Found a, end-pos: 1
first/last: 0/1, Using regex: "[0-9]+:[0-9]"
first/last: -1/2, Using regex: "[0-9]"
Found 9, end-pos: 4
================= "a3#9" ======================
first/last: -1/0, Using regex: "a"
Found a, end-pos: 1
first/last: 0/1, Using regex: "[0-9]:[0-9]"
first/last: -1/1, Using regex: "[0-9]"
Found 3, end-pos: 2

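Check the difference in the first/last positions after applying the patterns "[0-9]+:[0-9]" and "[0-9]:[0-9]". In the first case last becomes 2, whereas in the second case last stays at 1. Hence the next call to find() yields a different match, i.e. 9 vs 3.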

FIX

Since it is evident that the matcher does not reset the last position on each call to usePattern, we can call the overloaded find(int start) method and supply the end position of the last successful find call.

static void executePatterns(String[] patterns) throws Exception {
    System.out.printf("================= \"%s\" ======================%n", INPUT);
    m.reset(INPUT);

    boolean found = false;
    int nextStart = 0;
    for (String re: patterns) {
        m.usePattern(Pattern.compile(re));
        System.out.printf("first/last: %s/%s, Using regex: \"%s\"%n",
               matcherField("first"), matcherField("last"), m.pattern());

        found = m.find(nextStart);
        if (found) {
            System.out.printf("Found %s, end-pos: %d%n", m.group(), m.end());
            nextStart = m.end();
        }
    }
}

When we call it from the same main method shown above, we get the following output:

================= "a3#9" ======================
first/last: -1/0, Using regex: "a"
Found a, end-pos: 1
first/last: 0/1, Using regex: "[0-9]+:[0-9]"
first/last: -1/2, Using regex: "[0-9]"
Found 3, end-pos: 2
================= "a3#9" ======================
first/last: -1/0, Using regex: "a"
Found a, end-pos: 1
first/last: 0/1, Using regex: "[0-9]:[0-9]"
first/last: -1/0, Using regex: "[0-9]"
Found 3, end-pos: 2

Even though this output still shows last advanced to 2 after the failed "[0-9]+:[0-9]" attempt, just as in the previous output, both sets of patterns now find the correct substring, thanks to the find(int start) method.
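For readers who cannot or do not want to use reflection, the same fix can be sketched with the public API only. The class name FindFromDemo and the helper findInSequence are hypothetical names introduced for this sketch; note that find(int) resets the matcher before searching, so any region set earlier would be discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: track the end of the last successful match ourselves
// instead of relying on the matcher's internal position.
class FindFromDemo {

    // Applies each regex in turn, always searching from the end of the
    // last successful match; failed patterns do not move the start index.
    static List<String> findInSequence(String input, String... regexes) {
        Matcher m = Pattern.compile("").matcher(input);
        List<String> found = new ArrayList<>();
        int nextStart = 0;
        for (String re : regexes) {
            m.usePattern(Pattern.compile(re));
            if (m.find(nextStart)) {
                found.add(m.group());
                nextStart = m.end();
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Both pattern sets print [a, 3]
        System.out.println(findInSequence("a3#9", "a", "[0-9]+:[0-9]", "[0-9]"));
        System.out.println(findInSequence("a3#9", "a", "[0-9]:[0-9]", "[0-9]"));
    }
}
```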

Here is a code demo of the working fix.

Partial match changes the matcher's position
