I have the following code, and it takes quite a while to run. Any suggestions on how to optimize it to make it better and faster?
for (int tIndex = 0; tIndex < numTopics; tIndex++) {
    double beta0 = sumTopicWordCount[tIndex] + betaSum;
    int m0 = 0;
    double expectWT = 1;
    // getting the number of total words (or word w) in sentence i
    List<String> sentenceStat = new ArrayList<String>();
    for (int wIndex = 0; wIndex < sentence.size(); wIndex++) {
        sentenceStat.add(id2WordVocabulary.get(document.get(sIndex).get(wIndex)));
    }
    Set<String> unique = new HashSet<String>(sentenceStat);
    for (String key : unique) {
        int cnt = Collections.frequency(sentenceStat, key);
        double betaw = topicWordCount[tIndex][word2IdVocabulary.get(key)] + beta;
        for (int m = 0; m < cnt; m++) {
            expectWT *= (betaw + m) / (beta0 + m0);
            m0++;
        }
    }
    multiPros[tIndex] = (docTopicCount[sIndex][tIndex] + alpha) * expectWT;
}
Answer 0 (score: 0)
The problem is that you rescan the data repeatedly inside the loop: Collections.frequency walks the entire list again for every unique word, which makes that section quadratic in the sentence length. Instead of only collecting the unique elements, you can count all occurrences in a single pass over the list. I assume Java 5-7 below; in Java 8 it would be shorter, and perhaps faster.
// count each word's occurrences in one pass
Map<String, Integer> unique = new HashMap<String, Integer>();
for (String s : sentenceStat) {
    Integer cnt = unique.get(s);
    if (cnt == null) {
        unique.put(s, 1);
    } else {
        unique.put(s, cnt + 1);
    }
}
for (Map.Entry<String, Integer> entry : unique.entrySet()) {
    String key = entry.getKey();
    int cnt = entry.getValue();
    double betaw = topicWordCount[tIndex][word2IdVocabulary.get(key)] + beta;
    for (int m = 0; m < cnt; m++) {
        expectWT *= (betaw + m) / (beta0 + m0);
        m0++;
    }
}
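To illustrate the "shorter in Java 8" remark: the whole counting loop collapses to a single Map.merge call per word. This is a minimal self-contained sketch, with a hypothetical helper name countWords standing in for the counting step above; it is not part of the original code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCount {
    // One-pass word counting using Map.merge (Java 8+):
    // inserts 1 for a new key, otherwise adds 1 to the existing count.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = countWords(Arrays.asList("a", "b", "a", "a"));
        System.out.println(c.get("a") + " " + c.get("b")); // prints "3 1"
    }
}
```

The resulting map can then be iterated with entrySet() exactly as in the loop above, so the rest of the computation is unchanged.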