Question

I have the following code, and it takes quite a while to run. Any suggestions on how to optimize it so that it is cleaner and faster?

                for (int tIndex = 0; tIndex < numTopics; tIndex++) {
                    double beta0 = sumTopicWordCount[tIndex] + betaSum;
                    int m0 = 0;
                    double expectWT = 1;
                    // getting the number of total words (or word w) in sentence i
                    List<String> sentenceStat = new ArrayList<String>();
                    for(int wIndex=0 ; wIndex<sentence.size() ; wIndex++){
                        sentenceStat.add(id2WordVocabulary.get(document.get(sIndex).get(wIndex)));
                    }
                    Set<String> unique = new HashSet<String>(sentenceStat);
                    for(String key : unique){
                        int cnt = Collections.frequency(sentenceStat, key);
                        double betaw = topicWordCount[tIndex][word2IdVocabulary.get(key)] + beta;
                        for (int m = 0; m < cnt; m++) {
                            expectWT *= (betaw + m) / (beta0 + m0);
                            m0++;
                        }
                    }
                    multiPros[tIndex] = (docTopicCount[sIndex][tIndex] + alpha) * expectWT;
                }

Answer 1

The problem is that you scan the data repeatedly inside the loop: Collections.frequency walks the entire list again for every unique word.

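You can count all the words in a single pass instead of only collecting the unique elements. I assume Java 5-7 below; in Java 8 it would be shorter and maybe faster.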

   Map<String, Integer> unique = new HashMap<String, Integer>();
   for(String s: sentenceStat) {
       Integer cnt = unique.get(s);
       if (cnt == null) {
           unique.put(s, 1);
       } else {
           unique.put(s, cnt + 1);
       }
   }
   for(Map.Entry<String, Integer> entry : unique.entrySet()){
       String key = entry.getKey();
       int cnt = entry.getValue();
       double betaw = topicWordCount[tIndex][word2IdVocabulary.get(key)] + beta;
       for (int m = 0; m < cnt; m++) {
           expectWT *= (betaw + m) / (beta0 + m0);
           m0++;
       }
   }
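
For comparison, here is a minimal sketch of what the Java 8 variant of the counting step might look like, using Map.merge. The class name WordCountSketch, the method name countWords, and the sample sentence are made up for illustration and do not come from the original code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class WordCountSketch {

    // Count how often each word occurs, in a single pass over the list.
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) {
            // Java 8+: store 1 if the word is new, otherwise add 1 to its count.
            counts.merge(w, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for sentenceStat.
        List<String> sentenceStat = Arrays.asList("a", "b", "a", "c", "a");
        Map<String, Integer> counts = countWords(sentenceStat);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Also note that, as far as the posted code shows, the word counts depend only on sIndex and not on tIndex, so the map could probably be built once before the topic loop and reused for every topic instead of being rebuilt numTopics times.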
