I have a list of strings, and I want to remove some stop words from it:
for (int i = 0; i < simple_title.getItemCount(); i++) {
// split the phrase into the words
String str = simple_title.getItem(i);
String[] title_parts = str.split(" ");
ArrayList<String> list = new ArrayList<>(Arrays.asList(title_parts));
for (int k = 0; k < list.size(); k++) {
for (int l = 0; l < StopWords.stopwordslist.length; l++) {
// stopwordslist is a Static Variable in class StopWords
list.remove(StopWords.stopwordslist[l]);
}
}
title_parts = list.toArray(new String[0]);
for (String title_part : title_parts) {
// and here I want to print the string
System.out.println(title_part);
}
Arrays.fill(title_parts, null);
}
The problem is that after removing the stop words, I only get the first element of title_parts. For example, if I have a list of strings like:
list of strings
i am a list
is remove stop there list...
after removing the stop words I only get:
list
list
remove
But what I should get is:
list strings
list
remove stop list
I've been working on this for a while, but now I'm confused. Can someone tell me what I'm doing wrong?
Answer 0 (score: 1)
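You are removing the items from your List at the index defined by the iteration over your StopWords array!
So the removal is arbitrary, to say the least, and will ultimately depend on the size of your stop-word list.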
Here's a self-contained example of what you probably want to do:
// requires java.util.ArrayList, java.util.Arrays and java.util.List
// defining the list of words (i.e. from your split)
List<String> listOfWords = new ArrayList<>();
// adding some examples here (still comes from split in your case)
listOfWords.addAll(Arrays.asList("list", "of", "strings", "i", "am", "a", "list", "is", "remove", "stop", "there", "list"));
// defining an array of stop words (you probably want that as a constant somewhere else)
final String[] stopWords = {"of", "i", "am", "a", "is"};
// printing un-processed list
System.out.printf("Dirty: %s%n", listOfWords);
// invoking removeAll to remove all stop words
listOfWords.removeAll(Arrays.asList(stopWords));
// printing "clean" list
System.out.printf("Clean: %s%n", listOfWords);
Output
Dirty: [list, of, strings, i, am, a, list, is, remove, stop, there, list]
Clean: [list, strings, list, remove, stop, there, list]
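To apply the same removeAll idea to each title separately, as in the loop from the question, something along these lines might work. This is only a sketch: the class name StopWordFilter, the clean helper and the STOP_WORDS set are made up for illustration and stand in for StopWords.stopwordslist, and "there" is assumed to be a stop word only because the expected output in the question drops it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StopWordFilter {
    // Hypothetical stop-word set; stands in for StopWords.stopwordslist.
    // "there" is included as an assumption based on the question's expected output.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("of", "i", "am", "a", "is", "there"));

    // Splits one title on whitespace and removes every occurrence of every stop word.
    static List<String> clean(String title) {
        List<String> words = new ArrayList<>(Arrays.asList(title.trim().split("\\s+")));
        words.removeAll(STOP_WORDS);
        return words;
    }

    public static void main(String[] args) {
        // Sample titles taken from the question
        String[] titles = {"list of strings", "i am a list", "is remove stop there list"};
        for (String title : titles) {
            // Print the remaining words of each title on one line
            System.out.println(String.join(" ", clean(title)));
        }
    }
}
```

Holding the stop words in a Set keeps each membership check inside removeAll cheap, and a single removeAll call replaces both nested loops from the question.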