Question

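The following code tries to check whether all the words in searchWords appear in newsPaperWords. Both lists can contain duplicates. If a word appears n times in searchWords, it must appear at least n times in newsPaperWords for the method to return true. I thought the time complexity was 2*O(n) + O(m), but the interviewer told me it is 2*O(n log n) + O(m log m).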

/**
 * @param searchWords The words we're looking for. Can contain duplicates
 * @param newsPaperWords  The list to look into
 */
public boolean wordMatch(List<String> searchWords, List<String> newsPaperWords) {
    Map<String, Integer> searchWordCount = getWordCountMap(searchWords);
    Map<String, Integer> newspaperWordCount = getWordCountMap(newsPaperWords);
    for (Map.Entry<String, Integer> searchEntry : searchWordCount.entrySet()) {
        Integer occurrencesInNewspaper = newspaperWordCount.get(searchEntry.getKey());
        if (occurrencesInNewspaper == null || occurrencesInNewspaper < searchEntry.getValue()) {
            return false;
        }
    }
    return true;
}

private Map<String, Integer> getWordCountMap(List<String> words) {
    Map<String, Integer> result = new HashMap<>();
    for (String word : words) {
        Integer occurrencesThisWord = result.get(word);
        if (occurrencesThisWord == null) {
            result.put(word, 1);
        } else {
            result.put(word, occurrencesThisWord + 1);
        }
    }
    return result;
}

As I see it, the time complexity of the method is 2*O(n) + O(m), with n the number of elements in searchWords and m the number of elements in newsPaperWords:

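The method getWordCountMap() has a complexity of O(n), n being the number of elements in the given list. It loops over the list once, assuming that the calls to result.get(word) and result.put() are O(1).
Then, the iteration over searchWordCount.entrySet() is O(n) in the worst case, again assuming that the calls to HashMap.get() are O(1).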

So, simply adding these up: O(n) + O(m) to build the two maps, plus O(n) for the final loop.
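Written out as a single sum, and still assuming O(1) for every HashMap.get() and HashMap.put() call, the estimate collapses to a linear bound, since constant factors such as the 2 are dropped in big-O notation:

```latex
T(n, m)
  = \underbrace{O(n)}_{\text{count searchWords}}
  + \underbrace{O(m)}_{\text{count newsPaperWords}}
  + \underbrace{O(n)}_{\text{final loop}}
  = O(n + m)
```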

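After reading this answer, which gives O(n) as the worst-case complexity of HashMap.get(), I can see that the complexity of getWordCountMap() could go up to O(n*2n) and that of the final loop to O(n*n), which would give a total complexity of O(n*2n) + O(m*2m) + O(n*n).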

But how is it 2*O(n log n) + O(m log m)?

Answer 1

Due to JEP 180: Handle Frequent HashMap Collisions with Balanced Trees, the worst case of the HashMap.get() operation is O(log n). Quoting JEP 180:

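The principal idea is that once the number of items in a hash bucket grows beyond a certain threshold, that bucket will switch from using a linked list of entries to a balanced tree. In the case of high hash collisions, this will improve worst-case performance from O(n) to O(log n).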

This would make the getWordCountMap() method O(n log n) in the worst case.
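To make the collision scenario concrete, here is a minimal sketch, assuming Java 8 or later (where JEP 180 applies). CollisionDemo and BadKey are hypothetical names invented for this illustration. The key's hashCode() is deliberately constant, so every entry lands in the same bucket, which is the worst case the JEP addresses. The sketch only shows that lookups stay correct under full collision; it does not measure timing.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical demo class, not part of the original question. */
class CollisionDemo {

    /** A key whose hash code is deliberately constant, so all keys share one bucket. */
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // forces every key to collide
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        // Being Comparable lets a tree-based bucket order the colliding keys.
        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    /** Inserts `size` colliding keys, then counts how many are found again. */
    static int insertAndLookup(int size) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < size; i++) {
            map.put(new BadKey(i), i);
        }
        int found = 0;
        for (int i = 0; i < size; i++) {
            Integer value = map.get(new BadKey(i));
            if (value != null && value == i) {
                found++;
            }
        }
        return found;
    }
}
```

String also implements Comparable, so the same reasoning carries over to the word maps in the question: with n fully colliding words, each get() or put() costs O(log n), and the n iterations of the loop add up to O(n log n).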

Answer 2

Assuming the hash map uses a proper hash function, the complexity you derived is correct. This algorithm looks like O(m + n) to me.

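I guess your interviewer was describing the complexity of a different approach to this problem, one that takes more time but ends up using less space.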
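One approach that would produce such a bound is to sort both lists and walk them in step. This is only a guess at what the interviewer may have had in mind, not something stated in the question, and SortedWordMatch is a hypothetical class name. A minimal sketch:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sort-based variant, not the code from the question. */
class SortedWordMatch {

    static boolean wordMatch(List<String> searchWords, List<String> newsPaperWords) {
        // Copies are made only so the caller's lists are left untouched.
        List<String> search = new ArrayList<>(searchWords);
        List<String> paper = new ArrayList<>(newsPaperWords);
        Collections.sort(search); // O(n log n) comparisons
        Collections.sort(paper);  // O(m log m) comparisons

        int j = 0; // position in the sorted newspaper list
        for (String word : search) {
            // Skip newspaper words that sort before the current search word.
            while (j < paper.size() && paper.get(j).compareTo(word) < 0) {
                j++;
            }
            // Either the newspaper is exhausted or the word is missing.
            if (j == paper.size() || !paper.get(j).equals(word)) {
                return false;
            }
            j++; // consume this occurrence so duplicates need their own match
        }
        return true;
    }
}
```

The two sorts dominate, giving O(n log n + m log m); the merge-style walk afterwards is only O(n + m). Sorting the input lists in place instead of copying them would avoid the extra lists, at the cost of reordering the caller's data.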
