I have to read some stop words from a txt file and remove them from a text. I use this method to read the stop words from the file, store them in a String array, and return it:
public String[] loadStopwords(File targetFile, String[] stopWords) throws IOException {
    File fileTo = new File(targetFile.toString());
    BufferedReader br;
    List<String> lines = new ArrayList<String>();
    try {
        br = new BufferedReader(new FileReader(fileTo));
        String st;
        while ((st = br.readLine()) != null) {
            lines.add(st);
        }
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    stopWords = lines.toArray(new String[]{});
    return stopWords;
}
Then I pass the stopWords[] array and the text I want to update:
public void removeStopWords(String targetText, String[] stopwords) {
    targetText = targetText.toLowerCase().trim();
    ArrayList<String> wordList = new ArrayList<>();
    wordList.addAll(Arrays.asList(targetText.split(" ")));
    List<String> stopWordsList = new ArrayList<>();
    stopWordsList.addAll(Arrays.asList(stopwords));
    wordList.removeAll(stopWordsList);
}
But nothing is removed from wordList. Why?
Answer 0 (score: 1)
Try storing the stop words in lowercase as well:
// Needs java.io (BufferedReader, FileReader, FileNotFoundException, IOException)
// and java.util (ArrayList, Arrays, List).
public String[] loadStopwords(String targetFile) throws IOException {
    List<String> lines = new ArrayList<>();
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(targetFile))) {
        String st;
        while ((st = br.readLine()) != null) {
            // Store each stop word in lowercase, without leading or trailing blanks
            lines.add(st.toLowerCase().trim());
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    return lines.toArray(new String[0]);
}

public ArrayList<String> removeStopWords(String targetText, String[] stopwords) {
    // Lowercase the text as well, so it matches the lowercased stop words
    targetText = targetText.toLowerCase().trim();
    ArrayList<String> wordList = new ArrayList<>(Arrays.asList(targetText.split(" ")));
    wordList.removeAll(Arrays.asList(stopwords));
    // Return the filtered list so the caller can use the result
    return wordList;
}
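To see why the normalization matters: `removeAll` compares elements with `String.equals`, which is sensitive to both case and whitespace. Below is a minimal sketch of that effect. The class name `CaseMismatchDemo`, the helper `filter`, and the sample data are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CaseMismatchDemo {
    // Hypothetical helper: lowercases the text, then removes the stop words.
    // When normalizeStopWords is false, the stop words are used exactly as given.
    static List<String> filter(String text, List<String> stopWords, boolean normalizeStopWords) {
        List<String> words = new ArrayList<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        List<String> stops = new ArrayList<>();
        for (String s : stopWords) {
            stops.add(normalizeStopWords ? s.toLowerCase(Locale.ROOT).trim() : s);
        }
        // removeAll relies on equals(), so "The" never matches "the"
        words.removeAll(stops);
        return words;
    }

    public static void main(String[] args) {
        // Assumed sample data: one capitalized entry, one with stray blanks
        List<String> stopWords = Arrays.asList("The", " a ");
        System.out.println(filter("The cat saw a dog", stopWords, false)); // [the, cat, saw, a, dog]
        System.out.println(filter("The cat saw a dog", stopWords, true));  // [cat, saw, dog]
    }
}
```

If the stop-word file contains capitalized entries or trailing spaces, this is probably the mismatch that leaves wordList unchanged.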
Answer 1 (score: 0)
Edoardo
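That does work for me. A few remarks, though:
Judging by your comment, I suspect the difference lies in the stop-word text file. I put each stop word on its own line, whereas you most likely have all the stop words on a single line and never split them apart.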
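If the file layout is indeed the cause, one way to make the loader independent of it is to split the file contents into tokens instead of treating each line as one stop word. The following is only a sketch under that assumption. `StopWordParser` and `parse` are hypothetical names, and splitting on whitespace and commas is a guess at how a single-line file might be delimited.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class StopWordParser {
    // Hypothetical helper: takes the whole file contents as one string and
    // returns one lowercased stop word per token, whatever the line layout.
    static List<String> parse(String fileContents) {
        List<String> stopWords = new ArrayList<>();
        // Assumption: stop words are separated by whitespace (including newlines) or commas
        for (String token : fileContents.trim().split("[\\s,]+")) {
            if (!token.isEmpty()) {
                stopWords.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return stopWords;
    }

    public static void main(String[] args) {
        System.out.println(parse("the\na\nan")); // one word per line: [the, a, an]
        System.out.println(parse("The a, an"));  // all on one line:   [the, a, an]
    }
}
```

The contents could be read with, for example, `new String(Files.readAllBytes(path), StandardCharsets.UTF_8)` before being passed to `parse`, and the resulting list can go straight into `removeAll`.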