Question

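I'm trying to use the Java Scanner hasNext method, but I'm getting strange results. Maybe the problem is obvious, but why doesn't the simple regular expression "[a-zA-Z']+" work for words like these: "points. anything, supervisor,"? I also tried "[\\w']+".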

public HashMap<String, Integer> getDocumentWordStructureFromPath(File file) {
    HashMap<String, Integer> dictionary = new HashMap<>();
    try {
        Scanner lineScanner = new Scanner(file);
        while (lineScanner.hasNextLine()) {
            Scanner scanner = new Scanner(lineScanner.nextLine());
            while (scanner.hasNext("[\\w']+")) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    int count = dictionary.containsKey(word) ? dictionary.get(word).intValue() + 1 : 1;
                    dictionary.put(word, new Integer(count));
                }
            }
            scanner.close();
        }
        //scanner.useDelimiter(DELIMITER);
        lineScanner.close();

        return dictionary;

    } catch (FileNotFoundException e) { 
        e.printStackTrace();
        return null;
    }   
}

Answer 1

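Your regular expression should look like [^a-zA-Z]+ and be used as the delimiter, because you need to split on everything that is not a letter: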

// previous code...
Scanner scanner = new Scanner(lineScanner.nextLine()).useDelimiter("[^a-zA-Z]+");
while (scanner.hasNext()) {
    String word = scanner.next().toLowerCase();
    // ...your other code
}
// ... after code
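
To make the delimiter approach concrete, here is a minimal, self-contained sketch. The class name WordCountSketch and the method countWords are made up for this illustration, and it reads from a String instead of a File to keep it short. If you want to keep apostrophes inside words, as in the pattern from the question, you could add ' to the character class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class WordCountSketch {

    // Hypothetical helper: counts words longer than two characters,
    // treating every run of non-letters as a delimiter.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> dictionary = new HashMap<>();
        try (Scanner scanner = new Scanner(text).useDelimiter("[^a-zA-Z]+")) {
            while (scanner.hasNext()) {
                String word = scanner.next().toLowerCase();
                if (word.length() > 2) {
                    // Insert 1 for a new word, otherwise add 1 to the old count.
                    dictionary.merge(word, 1, Integer::sum);
                }
            }
        }
        return dictionary;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, so the printed order may vary.
        System.out.println(countWords("Points. Anything, supervisor, points"));
    }
}
```

Here the punctuation never reaches the tokens, so "Points." and "points" end up under the same key.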

Edit - why doesn't it work with the hasNext(String) method?

Take this line:

Scanner scanner = new Scanner(lineScanner.nextLine());

What it really does is set up a whitespace pattern as the delimiter for you, so for the test line "Hello World. A test, ok." it gives you these tokens:

Hello
World.
A
test,
ok.
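
If you want to check the tokens yourself, a small sketch like the following should do. The class name DefaultTokensSketch and the helper tokens are invented for this example; it only relies on the default whitespace delimiter of Scanner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class DefaultTokensSketch {

    // Hypothetical helper: collects the tokens produced by the default
    // (whitespace) delimiter, punctuation included.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Hello, World., A, test,, ok.]
        System.out.println(tokens("Hello World. A test, ok."));
    }
}
```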

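Then, if you call scanner.hasNext("[a-zA-Z]+"), you are asking the scanner whether the next token matches your pattern. For this example it returns true for the first token:

Hello (because it is the first token and it matches the pattern you specified)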

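The next token (World.) doesn't match the pattern, so scanner.hasNext("[a-zA-Z]+") returns false and the loop simply stops there. It will never get past any token that contains a character that is not a letter, such as trailing punctuation. Does that make sense?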
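
The behaviour described above can be reproduced with a short sketch. The class name HasNextPatternSketch and the helper matchedTokens are hypothetical. The pattern is written as [a-zA-Z]+, since a range like a-Z is rejected by the regex compiler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class HasNextPatternSketch {

    // Hypothetical helper: reads tokens for as long as the next
    // whitespace-delimited token matches the regex as a whole.
    static List<String> matchedTokens(String line, String regex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(line)) {
            // hasNext(regex) does not skip a non-matching token; it just
            // returns false, so the loop ends at the first mismatch.
            while (scanner.hasNext(regex)) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [Hello]
        System.out.println(matchedTokens("Hello World. A test, ok.", "[a-zA-Z]+"));
    }
}
```

Everything after "Hello" is left unread, which is why whole lines seem to disappear from the word count.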

Hope this helps.

Java Scanner hasNext(String) method sometimes doesn't match
