Question

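We have a string of length N and a number X.

How can we find the most frequent substring of length X in the string of length N in average O(N) time?

I think this is a similar question: https://stackoverflow.com/questions/1597025?tab=votes#tab-top

I would like to ask how to prove that the number of hash functions used is only a constant.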

Answer 1

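A suffix tree should give you this in O(n) worst-case time, using O(n) space.

In particular, check the Functionality section of the suffix tree wiki page, String properties subsection, which mentions:

"Find the most frequently occurring substrings of a minimum length in Θ(n) time."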
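
To make the idea concrete, here is a minimal sketch that uses a suffix automaton rather than a suffix tree (a different but closely related linear-size suffix structure, swapped in because it is shorter to write). The names State and mostFrequentBySuffixAutomaton are hypothetical, and transitions are stored in std::map, so this version runs in O(n log σ) for alphabet size σ; a fixed-size array per state would be needed for a true O(n) bound.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct State
{
    int len;        // length of the longest substring in this state
    int link;       // suffix link (-1 for the root)
    int firstEnd;   // end index of the first occurrence
    long long cnt;  // number of occurrences, filled in after construction
    std::map<char, int> next;
};

// Returns (count, start index) of a most frequent length-x substring of s,
// or (0, -1) if x is not in [1, s.size()]. Ties go to the smallest start.
std::pair<long long, int> mostFrequentBySuffixAutomaton(const std::string& s, int x)
{
    const int n = static_cast<int>(s.size());
    if (x <= 0 || x > n) return std::make_pair(0LL, -1);

    std::vector<State> st;
    st.reserve(2 * n);
    st.push_back(State{0, -1, -1, 0, {}});
    int last = 0;
    for (int i = 0; i < n; ++i)
    {
        const char c = s[i];
        const int cur = static_cast<int>(st.size());
        st.push_back(State{st[last].len + 1, -1, i, 1, {}});
        int p = last;
        while (p != -1 && !st[p].next.count(c)) { st[p].next[c] = cur; p = st[p].link; }
        if (p == -1)
            st[cur].link = 0;
        else
        {
            const int q = st[p].next[c];
            if (st[q].len == st[p].len + 1)
                st[cur].link = q;
            else
            {
                // Split q: the clone keeps q's transitions and first occurrence.
                const int clone = static_cast<int>(st.size());
                State copy = st[q];
                copy.len = st[p].len + 1;
                copy.cnt = 0;
                st.push_back(copy);
                while (p != -1 && st[p].next[c] == q) { st[p].next[c] = clone; p = st[p].link; }
                st[q].link = clone;
                st[cur].link = clone;
            }
        }
        last = cur;
    }

    // Counting sort by len, then push counts up the suffix links, longest first.
    const int m = static_cast<int>(st.size());
    std::vector<int> bucket(n + 1, 0), order(m);
    for (int v = 0; v < m; ++v) ++bucket[st[v].len];
    for (int l = 1; l <= n; ++l) bucket[l] += bucket[l - 1];
    for (int v = m - 1; v >= 0; --v) order[--bucket[st[v].len]] = v;
    for (int k = m - 1; k > 0; --k)
        st[st[order[k]].link].cnt += st[order[k]].cnt;

    // A state holds the substring of length x iff len(link) < x <= len.
    std::pair<long long, int> best(0LL, -1);
    for (int v = 1; v < m; ++v)
    {
        if (st[st[v].link].len < x && x <= st[v].len)
        {
            const int start = st[v].firstEnd - x + 1;
            if (st[v].cnt > best.first || (st[v].cnt == best.first && start < best.second))
                best = std::make_pair(st[v].cnt, start);
        }
    }
    return best;
}
```

For example, mostFrequentBySuffixAutomaton("abababab", 2) gives count 4 and start 0, i.e. "ab".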

Answer 2

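I suggest using the following kind of hash function. Treat each string as a number written in base 256 (instead of our usual base 10). Then, for each substring of length X, we can compute its numeric value incrementally like this: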

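#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <utility>

int main()
{
    std::string s;
    int x;
    std::cin >> s >> x;

    // Reject window lengths that the loops below cannot handle.
    if (!std::cin || x <= 0 || static_cast<std::size_t>(x) > s.size())
    {
        std::cerr << "x must be between 1 and the length of the string" << std::endl;
        return 1;
    }
    const std::size_t len = static_cast<std::size_t>(x);

    const unsigned long long base = 256;

    // base^x: the weight of the outgoing character after the window is
    // multiplied by base. Unsigned arithmetic wraps modulo 2^64, so window
    // values are unique only for x <= 8; for larger x only the last 8
    // characters of a window contribute and distinct substrings can collide.
    unsigned long long xPowOfBase = 1;
    for (std::size_t i = 0; i < len; ++i)
        xPowOfBase *= base;

    // Value of the first window s[0..x), read as a base-256 number.
    unsigned long long firstXLengthSubString = 0;
    for (std::size_t i = 0; i < len; ++i)
    {
        firstXLengthSubString *= base;
        firstXLengthSubString += static_cast<unsigned char>(s[i]);
    }

    unsigned long long nextXLengthSubstring = firstXLengthSubString;

    // window value -> (occurrence count, start index of the first occurrence)
    typedef std::map<unsigned long long, std::pair<int, int> > HashTable;
    HashTable hashTable;
    for (std::size_t i = len; i <= s.size(); ++i)
    {
        // nextXLengthSubstring holds the window that starts at i - x.
        HashTable::iterator found = hashTable.find(nextXLengthSubstring);
        if (found != hashTable.end())
            ++found->second.first;
        else
            hashTable.insert(std::make_pair(nextXLengthSubstring,
                                            std::make_pair(1, static_cast<int>(i - len))));

        if (i != s.size())
        {
            // Slide the window: shift left one digit, append s[i], drop s[i - x].
            nextXLengthSubstring *= base;
            nextXLengthSubstring += static_cast<unsigned char>(s[i]);
            nextXLengthSubstring -= static_cast<unsigned char>(s[i - len]) * xPowOfBase;
        }
    }

    // Pick the entry with the highest count.
    std::pair<int, int> maxCountAndFirstPosition = std::make_pair(0, -1);
    for (HashTable::iterator it = hashTable.begin(); it != hashTable.end(); ++it)
    {
        if (maxCountAndFirstPosition.first < it->second.first)
            maxCountAndFirstPosition = it->second;
    }

    std::cout << maxCountAndFirstPosition.first << std::endl;
    std::cout << s.substr(maxCountAndFirstPosition.second, x) << std::endl;
    return 0;
}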
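
For readers who want the sliding step in the code above spelled out, it can be written as the recurrence below. The symbols are notation introduced only for this note: V_i is the value of the window starting at index i, B is the base (256 here), and s_j is the character code at index j.

```latex
V_i = \sum_{k=0}^{X-1} s_{i+k}\, B^{X-1-k},
\qquad
V_{i+1} = B \cdot V_i + s_{i+X} - s_i \cdot B^{X}
```

As a quick check with the ASCII string "abc" and X = 2: V_0 = 97 * 256 + 98 = 24930 for "ab", and the recurrence gives V_1 = 256 * 24930 + 99 - 97 * 65536 = 25187, which equals 98 * 256 + 99 for "bc".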

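This runs in O(n * log(n)) because std::map is a balanced tree. To make it expected O(n), just replace std::map with a hash table such as std::unordered_map. Note that the 64-bit value wraps around, so it identifies a substring uniquely only when X is at most 8; for longer substrings, distinct substrings can share the same value.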
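
As a rough illustration of the hash-table variant, here is a sketch that also works for X larger than 8 by hashing each window modulo two primes and packing both residues into one 64-bit key. The function name mostFrequent, the base 257 and the two moduli are assumptions chosen for this example, not part of the answer. It is a Monte Carlo approach: matches are not verified against the text, so a hash collision could in principle merge two different substrings.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Returns (count, start index) of a most frequent length-x substring of s,
// or (0, std::string::npos) if x is 0 or longer than s.
// Expected O(n) time; substrings are identified by hash only (no verification).
std::pair<std::size_t, std::size_t> mostFrequent(const std::string& s, std::size_t x)
{
    typedef std::pair<std::size_t, std::size_t> Result;
    if (x == 0 || x > s.size()) return Result(0, std::string::npos);

    const std::uint64_t mod1 = 1000000007ULL, mod2 = 998244353ULL, base = 257ULL;

    // base^(x-1) modulo each prime: the weight of the window's leading character.
    std::uint64_t pow1 = 1, pow2 = 1;
    for (std::size_t i = 1; i < x; ++i)
    {
        pow1 = pow1 * base % mod1;
        pow2 = pow2 * base % mod2;
    }

    // packed hash -> (occurrence count, start index of the first occurrence)
    std::unordered_map<std::uint64_t, Result> seen;
    seen.reserve(s.size() - x + 1);

    std::uint64_t h1 = 0, h2 = 0;
    Result best(0, std::string::npos);
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        if (i >= x)
        {
            // Drop the character that leaves the window.
            const std::uint64_t out = static_cast<unsigned char>(s[i - x]);
            h1 = (h1 + mod1 - out * pow1 % mod1) % mod1;
            h2 = (h2 + mod2 - out * pow2 % mod2) % mod2;
        }
        // Append the character that enters the window.
        const std::uint64_t in = static_cast<unsigned char>(s[i]);
        h1 = (h1 * base + in) % mod1;
        h2 = (h2 * base + in) % mod2;

        if (i + 1 >= x)
        {
            const std::size_t start = i + 1 - x;
            const std::uint64_t key = (h1 << 32) | h2;
            Result& entry = seen.emplace(key, std::make_pair(std::size_t(0), start)).first->second;
            ++entry.first;
            if (entry.first > best.first) best = entry;  // first to reach the top count wins ties
        }
    }
    return best;
}
```

For example, mostFrequent("abababab", 2) gives count 4 and start 0, i.e. "ab". If exact answers are required, each hash hit would have to be checked against the text, which costs extra time per comparison.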
