So I have a .txt file that I am loading with
String[] data = loadStrings("data/data.txt");
The file is already sorted and basically looks like this:
Animal
Animal
Cat
Cat
Cat
Dog
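I want to write an algorithm in Java that counts the words in this sorted list, without using any library such as Multisets and without using Maps / HashMaps. So far I have managed to print the most frequently occurring word: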
ArrayList<String> words = new ArrayList();
int[] occurrence = new int[2000];
Arrays.sort(data);
for (int i = 0; i < data.length; i ++ ) {
words.add(data[i]); //Put each word into the words ArrayList
}
for(int i =0; i<data.length; i++) {
occurrence[i] =0;
for(int j=i+1; j<data.length; j++) {
if(data[i].equals(data[j])) {
occurrence[i] = occurrence[i]+1;
}
}
}
int max = 0;
String most_talked ="";
for(int i =0;i<data.length;i++) {
if(occurrence[i]>max) {
max = occurrence[i];
most_talked = data[i];
}
}
println("The most talked keyword is " + most_talked + " occuring " + max + " times.");
I want more than just the top word, maybe the top 5 or top 10. I hope that is clear enough. Thanks for reading.
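Answer 0 (score: 1)
If you can't use Guava's Multiset, you can implement an equivalent yourself. Basically, you just need to create a Map<String, Integer>, which keeps track of the count (value) of each word (key). That means changing this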
ArrayList<String> words = new ArrayList<String>();
// ...
for (int i = 0; i < data.length; i ++ ) {
words.add(data[i]); //Put each word into the words ArrayList
}
into this:
Map<String, Integer> words = new HashMap<String, Integer>();
// ...
for (String word : data) {
Integer count = words.get(word); // null if the word has not been seen yet
words.put(word, (count != null ? count.intValue() + 1 : 1));
}
After you have filled the map, just sort it by the values.
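Sorting a map by its values could look roughly like the sketch below. It assumes a plain HashMap (which has no ordering of its own), so the entries are copied into a list and that list is sorted. The class name MapTopWords and the helper topN are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapTopWords {
    // Hypothetical helper: returns up to n entries with the highest counts.
    static List<Map.Entry<String, Integer>> topN(String[] data, int n) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : data) {
            Integer count = counts.get(word);
            counts.put(word, count != null ? count + 1 : 1);
        }
        // A HashMap cannot be sorted in place, so sort a list of its entries instead.
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(counts.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            @Override
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                return b.getValue().compareTo(a.getValue()); // descending by count
            }
        });
        // Keep only the first n entries (or fewer if there are not enough words).
        return new ArrayList<Map.Entry<String, Integer>>(
                entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        String[] data = {"Animal", "Animal", "Cat", "Cat", "Cat", "Dog"};
        for (Map.Entry<String, Integer> e : topN(data, 2)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

With the sample data from the question, this prints "Cat: 3" followed by "Animal: 2". Words with equal counts come out in whatever order the HashMap happened to yield them.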
If you can't use a Map either, you can do the following:
First, create a wrapper class for your word counts:
public class WordCount implements Comparable<WordCount> {
private String word;
private int count;
public WordCount(String w, int c) {
this.word = w;
this.count = c;
}
public String getWord() {
return word;
}
public int getCount() {
return count;
}
public void incrementCount() {
count++;
}
@Override
public int compareTo(WordCount other) {
return this.count - other.count;
}
}
Then, change your code to store WordCount instances (instead of Strings) in the list:
ArrayList<WordCount> words = new ArrayList<WordCount>();
// ...
for (String word : data) {
WordCount wc = new WordCount(word, 1);
boolean wordFound = false;
for (WordCount existing : words) {
if (existing.getWord().equals(wc.getWord())) {
existing.incrementCount();
wordFound = true;
break;
}
}
if (!wordFound) {
words.add(wc);
}
}
Finally, once the List is populated, just sort it with Collections.sort(). This is easy because the value objects implement Comparable:
Collections.sort(words, Collections.reverseOrder());
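To get the top 5 or top 10 from the sorted list, one option is to take the first few elements after sorting. The sketch below bundles a compact copy of WordCount so that it compiles on its own; the class name WordCountDemo and the helper topN are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WordCountDemo {
    // Compact stand-in for the WordCount class, repeated here so the sketch is self-contained.
    static class WordCount implements Comparable<WordCount> {
        final String word;
        int count;

        WordCount(String word, int count) {
            this.word = word;
            this.count = count;
        }

        @Override
        public int compareTo(WordCount other) {
            return Integer.compare(count, other.count); // ascending by count
        }
    }

    // Hypothetical helper: counts words with a linear search, then keeps the n most frequent.
    static List<WordCount> topN(String[] data, int n) {
        List<WordCount> words = new ArrayList<WordCount>();
        for (String word : data) {
            boolean found = false;
            for (WordCount existing : words) {
                if (existing.word.equals(word)) {
                    existing.count++;
                    found = true;
                    break;
                }
            }
            if (!found) {
                words.add(new WordCount(word, 1));
            }
        }
        // reverseOrder() flips compareTo, so the largest counts come first.
        Collections.sort(words, Collections.reverseOrder());
        return new ArrayList<WordCount>(words.subList(0, Math.min(n, words.size())));
    }

    public static void main(String[] args) {
        String[] data = {"Animal", "Animal", "Cat", "Cat", "Cat", "Dog"};
        for (WordCount wc : topN(data, 5)) {
            System.out.println(wc.word + ": " + wc.count);
        }
    }
}
```

Math.min guards against asking for more words than the file contains, so requesting the top 5 of three distinct words simply returns all three.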
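Answer 1 (score: 1)
Since you said you don't want to use certain data structures, I think you could do something like this, although it is not efficient. I usually prefer to store indices rather than values.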
ArrayList<String> words = new ArrayList<String>();
int[] occurrence = new int[2000];
Arrays.sort(data);
int nwords = 0;
occurrence[nwords]=1;
words.add(data[0]);
for (int i = 1; i < data.length; i ++ ) {
if(!data[i].equals(data[i-1])){ //if a new word is found
words.add(data[i]); //put it into the words ArrayList
nwords++; //increment the index
occurrence[nwords]=0; //initialize its occurrence counter
}
occurrence[nwords]++; //increment the occurrence counter
}
int max;
for(int k=0; k<5 && k<words.size(); k++){ //loop up to 5 times, each time finding the most talked word
max = 0; //index of the most talked word
for(int i = 1; i<words.size(); i++) { //for every word
if(occurrence[i]>occurrence[max]) { //if it is more talked than max
max = i; //then it is the new most talked
}
}
println("The most talked keyword is " + words.get(max) + " occurring " + occurrence[max] + " times.");
occurrence[max]=0; //zero it so the next pass finds the next most talked word
}
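Each time I find the word with the highest occurrence count, I set its occurrence counter to 0 and scan the array again; this is repeated 5 times.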
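The same idea can be sketched with plain arrays only, sized from the input instead of a fixed 2000. Because equal words are adjacent after sorting, one pass is enough to count each run. The class name SortedRunCounter and the helper topK are hypothetical names for this illustration, and the "word: count" output format is just one possible choice.

```java
import java.util.Arrays;

class SortedRunCounter {
    // Hypothetical helper: returns lines like "Cat: 3" for the k most frequent words.
    static String[] topK(String[] input, int k) {
        if (input.length == 0 || k <= 0) {
            return new String[0];
        }
        String[] data = input.clone();
        Arrays.sort(data); // harmless if already sorted; makes equal words adjacent

        String[] words = new String[data.length]; // at most data.length distinct words
        int[] occurrence = new int[data.length];
        int nwords = 0;
        words[0] = data[0];
        occurrence[0] = 1;
        for (int i = 1; i < data.length; i++) {
            if (!data[i].equals(data[i - 1])) { // a new run starts here
                nwords++;
                words[nwords] = data[i];
            }
            occurrence[nwords]++;
        }
        int distinct = nwords + 1;

        int limit = Math.min(k, distinct);
        String[] result = new String[limit];
        for (int r = 0; r < limit; r++) {
            int max = 0; // index of the most frequent remaining word
            for (int i = 1; i < distinct; i++) {
                if (occurrence[i] > occurrence[max]) {
                    max = i;
                }
            }
            result[r] = words[max] + ": " + occurrence[max];
            occurrence[max] = 0; // exclude this word from later rounds
        }
        return result;
    }

    public static void main(String[] args) {
        String[] data = {"Animal", "Animal", "Cat", "Cat", "Cat", "Dog"};
        for (String line : topK(data, 5)) {
            System.out.println(line);
        }
    }
}
```

Each of the k rounds rescans all distinct words, so the selection costs roughly k times the number of distinct words, which is fine for a top 5 or top 10.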
Answer 2 (score: 0)
You could try something simple like this...
for( int i = 0; i < words.size(); i++ ){
int count = 0; // reset for each word; otherwise the totals accumulate across words
System.out.printf("%s: ", words.get( i ));
for( int j = 0; j < words.size(); j++ ) {
if( words.get( i ).equals( words.get( j ) ) )
count++;
}
System.out.printf( "%d\n", count );
}
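If the list still contains repeated words, the nested loop above prints a line for every repetition. A minimal sketch that prints each distinct word only once, using nothing but the array itself, might look like this. The class name NestedLoopCounter and the helpers countOf and seenBefore are hypothetical.

```java
class NestedLoopCounter {
    // Hypothetical helper: counts how often words[i] appears in the whole array.
    static int countOf(String[] words, int i) {
        int count = 0;
        for (int j = 0; j < words.length; j++) {
            if (words[i].equals(words[j])) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical helper: true if words[i] already appeared at an earlier index.
    static boolean seenBefore(String[] words, int i) {
        for (int j = 0; j < i; j++) {
            if (words[i].equals(words[j])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] words = {"Animal", "Animal", "Cat", "Cat", "Cat", "Dog"};
        for (int i = 0; i < words.length; i++) {
            if (seenBefore(words, i)) {
                continue; // this word was already reported
            }
            System.out.printf("%s: %d%n", words[i], countOf(words, i));
        }
    }
}
```

This stays quadratic in the number of words, so it only suits small inputs, but it needs no extra data structure at all.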