Question

I was using Jsoup to crawl some web pages when I ran into this weird problem.

The regular expressions used:

// Sets the prefix for all pages to prevent navigating to unwanted pages.
String prefix = "https://handbook.unimelb.edu.au/%d/subjects";
// Postfix for search page
String searchPostfix = "(\\?page=\\d+)?$";
// Postfix for subject page
String subjectPostfix = "\\/(\\w+)(\\/.+)?$";

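// %d requires an integer argument; passing the String "2019" would throw IllegalFormatConversionException.
String root = String.format(prefix, 2019);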
String pattern = root.replace("/", "\\/").replace(".", "\\.");
Pattern reg1 = Pattern.compile("^" + pattern + searchPostfix);
Pattern reg2 = Pattern.compile("^" + pattern + subjectPostfix);

With these regex patterns, I ran the string

String s1 = "https://handbook.unimelb.edu.au/2019/subjects/undergraduate";

through this method:

private String getSubjectCode(String link) {
    System.out.println(link);
    if (isSubjectPage(link)) {
        Matcher subjectMatcher = subjectPattern.matcher(link);
        System.out.println(link);
        // System.out.println(subjectMatcher.matches());   // Exception if commented
        System.out.println(subjectMatcher.group(0));
        System.out.println(subjectMatcher.group(1));


        return subjectMatcher.group(1);
    }
    return null;
}

What happens is that if I leave that line uncommented, the program runs fine.

However, if I comment that line out:

Exception in thread "main" java.lang.IllegalStateException: No match found
    at java.base/java.util.regex.Matcher.group(Matcher.java:645)
    at Page.Pages.getSubjectCode(Pages.java:54)
    at Page.Pages.enqueue(Pages.java:85)
    at Crawler.Crawler.parsePage(Crawler.java:41)
    at Crawler.Crawler.crawl(Crawler.java:51)
    at Main.main(Main.java:9)

The exception above is thrown. Why does a print statement affect how the program runs?

Also, with the line uncommented:

System.out.println(subjectMatcher.matches());   // Exception if commented
// out -> true

Answer 1

The difference is not caused by System.out.println, but by the side effect of calling the matches() method.

This is explained in the JavaDocs of Matcher:

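A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.

The lookingAt method attempts to match the input sequence, starting at the beginning, against the pattern.

The find method scans the input sequence looking for the next subsequence that matches the pattern.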

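and

The explicit state of a matcher is initially undefined; attempting to query any part of it before a successful match will cause an IllegalStateException to be thrown. The explicit state of a matcher is recomputed by every match operation.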
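
To make the quoted behaviour concrete, here is a minimal sketch of how a fresh Matcher behaves and how the three match operations differ. The class name MatcherStateDemo, the helper groupThrows, and the sample input are made up for illustration; only the Pattern and Matcher calls come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherStateDemo {
    // Hypothetical helper: true if group() throws because the matcher
    // has no successful match to report.
    static boolean groupThrows(Matcher m) {
        try {
            m.group();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        Matcher m = digits.matcher("2019 subjects, 42 pages");

        // No match operation has run yet, so group() is not allowed.
        System.out.println(groupThrows(m));   // true

        // matches() needs the WHOLE input to be digits, so it fails
        // and the matcher still has no match state.
        System.out.println(m.matches());      // false
        System.out.println(groupThrows(m));   // true

        // lookingAt() only needs a match at the start of the input.
        System.out.println(m.lookingAt());    // true
        System.out.println(m.group());        // 2019

        // find() scans for each next match; reset() restarts from the beginning.
        m.reset();
        while (m.find()) {
            System.out.println(m.group());    // 2019, then 42
        }
    }
}
```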

You need to call matches, lookingAt, or find before you can make further queries such as group(0).
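
One way to apply this to the method from the question is to let matches() itself be the check, rather than relying on a print statement to trigger it. This is only a sketch: the class name SubjectCodeExample is made up, and SUBJECT_PATTERN is an assumed stand-in for the question's subjectPattern, written out for the 2019 handbook URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubjectCodeExample {
    // Assumed equivalent of the question's subject-page pattern (reg2).
    static final Pattern SUBJECT_PATTERN = Pattern.compile(
            "^https://handbook\\.unimelb\\.edu\\.au/2019/subjects/(\\w+)(/.+)?$");

    // Returns the subject code, or null if the link is not a subject page.
    static String getSubjectCode(String link) {
        Matcher subjectMatcher = SUBJECT_PATTERN.matcher(link);
        // matches() runs the match and populates the groups;
        // group(1) is only queried after it has succeeded.
        if (subjectMatcher.matches()) {
            return subjectMatcher.group(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // prints "undergraduate"
        System.out.println(getSubjectCode(
                "https://handbook.unimelb.edu.au/2019/subjects/undergraduate"));
        // prints "null": a search page, not a subject page
        System.out.println(getSubjectCode(
                "https://handbook.unimelb.edu.au/2019/subjects?page=2"));
    }
}
```

Using the result of matches() as the condition also removes the need for a separate isSubjectPage check on the same pattern, since a failed match simply falls through to return null.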
