Question

I am reading stop words from a file and storing them in a HashSet. I then compare the HashSet against a String to check for stop words.

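If I put a single word such as "the" in the String variable, the output is "Yes". But if I enter something like "Apple is it" or "it is an apple", the output is "No", even though both Strings contain stop words.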

Here is the whole program. It has two methods: one reads the file, the other removes the stop words:

private static HashSet<String> readFile(){
    Scanner x = null;
    HashSet<String> hset = new HashSet<String>();

    try {
        x = new Scanner(new File("StopWordsEnglish"));
        while(x.hasNext()){
            hset.add(x.next());
        }
    } catch(Exception e) {
        e.printStackTrace();
    } finally {
        if (x != null) x.close(); // x is still null if the file could not be opened
    }
    return hset;
}

public static void removeStopWords(){
    HashSet<String> hset = readFile();
    System.out.println(hset.size());
    System.out.println("Enter a word to search for: ");
    String search = "is";
    String s = search.toLowerCase();
    System.out.println(s);

    if (hset.contains(s)) {
        System.out.println("Yes");
    } else {
        System.out.println("No");
    }
}

Answer 1

I have a feeling I haven't read your question correctly, but here goes.

Suppose:

String search = "it is an apple";

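Then you should split the string and check each word individually. HashSet.contains looks for an element equal to the whole string, so a sentence never matches a single stop word.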

String[] split = search.split(" ");
boolean found = false;
for (String s : split) {
    if (hset.contains(s.toLowerCase())) {
        found = true;
        break; // no need to continue once a stop word is found
    }
}
// print once, after the loop, instead of "No" for every non-stop word
System.out.println(found ? "Yes" : "No");
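
In case a complete, runnable version helps, here is a minimal sketch of the same idea. The class name StopWordDemo, the two helper methods, and the small in-memory stop word set are my own assumptions standing in for your StopWordsEnglish file; I also split on "\\s+" rather than " " so repeated spaces do not produce empty words. The second helper drops the stop words, which I assume is what removeStopWords was ultimately meant to do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordDemo {

    // Returns true if at least one whitespace-separated word is a stop word.
    public static boolean containsStopWord(String text, Set<String> stopWords) {
        for (String word : text.trim().split("\\s+")) {
            if (stopWords.contains(word.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    // Returns the text with every stop word dropped, keeping the original order.
    public static String removeStopWords(String text, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty() && !stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the contents of the StopWordsEnglish file.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "it", "an"));

        // Whole-string lookup: the set holds single words, so this prints false.
        System.out.println(stopWords.contains("Apple is it"));
        // Per-word lookup: "is" is in the set, so this prints true.
        System.out.println(containsStopWord("Apple is it", stopWords));
        // Drops "It", "is" and "an", so this prints: apple
        System.out.println(removeStopWords("It is an apple", stopWords));
    }
}
```

Note that this only splits on whitespace, so a word with punctuation attached (for example "is,") would not match; you would need to strip punctuation first if your input contains any.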
