Find All Anagrams in a String: O(n) solution

Time: 2017-01-19 22:22:20

Tags: algorithm data-structures

The problem is:

Given a string s and a non-empty string p, find all the start indices of p's anagrams in s.

Input: s: "cbaebabacd" p: "abc"
Output: [0, 6]
Input: s: "abab" p: "ab"
Output: [0, 1, 2]
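To pin down what counts as a match, here is a deliberately naive reference: a window of s is an anagram of p exactly when both sort to the same string. This is a sketch for checking faster versions, not something to submit. The name findAnagramsNaive is hypothetical, and its cost is O(|s| * |p| log |p|).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical naive reference: sort p once, then sort every window of s
// of length |p| and compare. Matches are recorded by start index.
std::vector<int> findAnagramsNaive(const std::string& s, const std::string& p) {
    std::vector<int> result;
    if (p.empty() || s.size() < p.size()) return result;
    std::string key = p;
    std::sort(key.begin(), key.end());
    for (std::size_t i = 0; i + p.size() <= s.size(); ++i) {
        std::string window = s.substr(i, p.size());
        std::sort(window.begin(), window.end());
        if (window == key) result.push_back(static_cast<int>(i));
    }
    return result;
}
```

On the two examples above it returns [0, 6] and [0, 1, 2].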

Here is my solution:

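#include <string>
#include <vector>
using namespace std;

// Assumes s and p contain only lowercase letters 'a'..'z'.
vector<int> findAnagrams(string s, string p) {
    vector<int> res, s_map(26,0), p_map(26,0);  // letter counts
    int s_len = s.size();
    int p_len = p.size();
    if (s_len < p_len) return res;
    // Count the letters of p and of the first window s[0 .. p_len-1].
    for (int i = 0; i < p_len; i++) {
        ++s_map[s[i] - 'a'];
        ++p_map[p[i] - 'a'];
    }
    if (s_map == p_map)
        res.push_back(0);
    // Slide the window forward one character at a time.
    for (int i = p_len; i < s_len; i++) {
        ++s_map[s[i] - 'a'];          // character entering the window
        --s_map[s[i - p_len] - 'a'];  // character leaving the window
        if (s_map == p_map)
            res.push_back(i - p_len + 1);
    }
    return res;
}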

However, I think it is an O(n^2) solution, because I have to compare the vectors s_map and p_map at every step. Does an O(n) solution exist for this problem?

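3 answers:

Answer 0 (score: 1)

Your solution is an O(n) solution. The size of the s_map and p_map vectors is constant (26) and does not depend on n. So comparing s_map and p_map takes constant time, no matter how large n is.

Your solution performs at most 26 * n integer comparisons, which is O(n).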
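If you also want to drop the factor of 26 from each step, one common variant (not part of this answer) keeps a running count of how many letters currently have equal counts in the window and in p. Each slide then updates only two letters in O(1), and a window is an anagram exactly when all 26 letters agree. A minimal sketch, assuming lowercase 'a'..'z' input; the names findAnagramsCounter and bump are hypothetical.

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical variant: diff[c] = (count of c in window) - (count of c in p),
// and `zeros` tracks how many of the 26 letters have diff == 0.
std::vector<int> findAnagramsCounter(const std::string& s, const std::string& p) {
    std::vector<int> res;
    const int n = static_cast<int>(s.size());
    const int m = static_cast<int>(p.size());
    if (m == 0 || n < m) return res;
    std::array<int, 26> diff{};
    for (char c : p) --diff[c - 'a'];
    for (int i = 0; i < m; ++i) ++diff[s[i] - 'a'];
    int zeros = 0;
    for (int d : diff) zeros += (d == 0);  // one-time O(26) setup
    // Change one letter's diff by delta while keeping `zeros` in sync.
    auto bump = [&](char c, int delta) {
        int& d = diff[c - 'a'];
        if (d == 0) --zeros;
        d += delta;
        if (d == 0) ++zeros;
    };
    if (zeros == 26) res.push_back(0);
    for (int i = m; i < n; ++i) {
        bump(s[i], +1);      // character entering the window
        bump(s[i - m], -1);  // character leaving the window
        if (zeros == 26) res.push_back(i - m + 1);
    }
    return res;
}
```

Asymptotically this is still O(n), same as the original; it only trades the 26-element comparison per step for two constant-time updates.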

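Answer 1 (score: 1)

Let's say the size of p is n.

Suppose you have an array A of size 26 holding the number of a's, b's, c's, ... that p contains.

Then you create a new array B of size 26, filled with 0.

Let's call the given (large) string s.

First, you initialize B with the counts of a, b, c, ... in the first n characters of s.

Then you iterate over every n-sized word of s, always updating B to match the current n-sized word.

Whenever B matches A, you have found an index where an anagram starts.

To move B from one n-sized word to the next, note that you only need to remove the first character of the previous word from B and add the new last character of the next word.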

Take a look at an example:

Input
s: "cbaebabacd" 
p: "abc"          n = 3 (size of p)

A = {1, 1, 1, 0, 0, 0, ... }  // p contains exactly one a, one b and one c.

B = {1, 1, 1, 0, 0, 0, ... }  // initially, the first n-sized word contains this.

compare(A,B)

for i = n; i < size of s; i++ {
    B[ s[i-n] ]--;
    B[ s[ i ] ]++;
    compare(A,B)
}

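And assume compare(A,B) prints the start index of the current word (i - n + 1 inside the loop, 0 before it) whenever A matches B.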
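For reference, the pseudocode above can be sketched as runnable C++, assuming s and p contain only lowercase 'a'..'z'. The function name findAnagramsSliding is hypothetical; compare(A,B) becomes an equality test on two std::array<int, 26> counters, and a match records the start index instead of printing it.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical translation of the pseudocode: A counts p, B counts the
// current n-sized word of s.
std::vector<int> findAnagramsSliding(const std::string& s, const std::string& p) {
    std::vector<int> result;
    const std::size_t n = p.size();
    if (n == 0 || s.size() < n) return result;
    std::array<int, 26> A{}, B{};
    for (char c : p) ++A[c - 'a'];                        // first fill of A
    for (std::size_t i = 0; i < n; ++i) ++B[s[i] - 'a'];  // first fill of B
    if (A == B) result.push_back(0);                      // first comparison
    for (std::size_t i = n; i < s.size(); ++i) {
        --B[s[i - n] - 'a'];  // drop the first character of the previous word
        ++B[s[i] - 'a'];      // add the last character of the next word
        if (A == B) result.push_back(static_cast<int>(i - n + 1));
    }
    return result;
}
```

For s = "cbaebabacd" and p = "abc" it yields [0, 6], matching the expected output.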

The total complexity will be:

first fill of A  = O(size of p)
first fill of B  = O(size of p)   (only the first n characters of s are read)
first comparison = O(26)
for-loop         = (|s| - n) * (2 + 26) = O(size of s)
____________________________________________________________________
O(size of s) + 2 * O(size of p) + O(26)

which is O(size of s), since p can be no longer than s (otherwise there are no anagrams to find).

Answer 2 (score: 1)

// In papers on string searching algorithms, the alphabet is often
// called Sigma, and it is often not considered a constant. Your
// algorithm works in O(Sigma * n) time, where n is the length of the
// longer string. Below is an algorithm that works in O(n) time even
// when Sigma is too large to make an array of size Sigma, as long as
// values from Sigma are a constant number of "machine words".

// This solution works in O(n) time "with high probability", meaning
// that for all c > 2 the probability that the algorithm finishes
// within c*n time is 1-o(n^-c). This is a looser bound than O(n)
// worst-case because it uses hash tables, which depend on randomness
// (std::hash is not required to be randomized, so the bound assumes
// a hash function that behaves randomly on the input).
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <string>
#include <type_traits>
#include <unordered_map>
#include <vector>

using namespace std;
// Finding a needle in a haystack. This works for any iterable type
// whose members can be stored as keys of an unordered_map.
template <typename T>
vector<size_t> AnagramLocations(const T& needle, const T& haystack) {
  // Think of a contiguous region of an ordered container as
  // representing a function f with the domain being the type of item
  // stored in the container and the codomain being the natural
  // numbers. We say that f(x) = n when there are n x's in the
  // contiguous region.
  //
  // Then two contiguous regions are anagrams when they have the same
  // function. We can track how close they are to being anagrams by
  // subtracting one function from the other, pointwise. When that
  // difference is uniformly 0, then the regions are anagrams.
  unordered_map<remove_const_t<remove_reference_t<decltype(*needle.begin())>>,
                intmax_t> difference;
  // As we iterate through the haystack, we track the lead (part
  // closest to the end) and lag (part closest to the beginning) of a
  // contiguous region in the haystack. When we move the region
  // forward by one, one part of the function f is increased by +1 and
  // one part is decreased by -1, so the same is true of difference.
  auto lag = haystack.begin(), lead = haystack.begin();
  // To compare difference to the uniformly-zero function in O(1)
  // time, we make sure it does not contain any points that map to
  // 0. Then the property of being uniformly zero is the same as the
  // property of having an empty difference.
  const auto find = [&](const auto& x) {
    difference[x]++;
    if (0 == difference[x]) difference.erase(x);
  };
  const auto lose = [&](const auto& x) {
    difference[x]--;
    if (0 == difference[x]) difference.erase(x);
  };
  vector<size_t> result;
  // First we initialize the difference with the first needle.size()
  // items from both needle and haystack. If the haystack is shorter
  // than the needle, there are no anagrams to find.
  for (const auto& x : needle) {
    if (lead == haystack.end()) return result;
    lose(x);
    find(*lead);
    ++lead;
  }
  if (difference.empty()) result.push_back(0);
  // Now we slide the region forward with lead and lag, updating
  // difference in O(1) time at each spot. After each move, i is the
  // position of lag, i.e. the start of the new region.
  size_t i = 1;
  for (; lead != haystack.end(); ++lead, ++lag, ++i) {
    find(*lead);
    lose(*lag);
    if (difference.empty()) result.push_back(i);
  }
  return result;
}
int main() {
  string needle, haystack;
  cin >> needle >> haystack;
  const auto result = AnagramLocations(needle, haystack);
  for (auto x : result) cout << x << ' ';
}