Question

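Unfortunately, I don't know the name of the following problem, but I'm sure it is a well-known one. I'd like to find an efficient algorithm that solves it.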

Let S be an input string and K a number (1 <= K <= 26).

The problem is to find the longest substring of S that has exactly K distinct characters. What is the best algorithm for this?

Some examples:

1) S = aaaaabcdef, K = 3, answer = aaaaabc

2) S = acaaba, K = 2, answer = acaa or aaba

3) S = abcde, K = 5, answer = abcde

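I have a sketch of a solution, but it seems too complicated to me and it also has quadratic complexity. In a single linear pass I can compute the runs of consecutive identical characters and their lengths. The next step is to use a set that holds only K characters. The usage is something like: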

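std::string max_string;
for (std::size_t i = 0; i < s.size(); ++i)
{
   std::set<char> my_set;            // distinct characters of s[i..j]
   std::string possible_solution;
   for (std::size_t j = i; j < s.size(); ++j)
   {
       // filling set and possible_solution:
       // stop as soon as s[j] would be a (K+1)-th distinct character
       if (my_set.size() == K && my_set.count(s[j]) == 0)
           break;
       my_set.insert(s[j]);
       possible_solution += s[j];
   }
   if (my_set.size() == K && possible_solution.size() > max_string.size())
      max_string = possible_solution;
}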

Answer 1

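Notation:
  s = input string, zero-based indexing
  [start, end) = substring of the input from start to end, including start but excluding end
  k-substring = a substring that contains at most k distinct characters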

Algorithm: linear complexity O(n)

start = 0
result = empty string
find max(end): [start, end) is a k-substring
LOOP:
  // please note in every loop iteration, [start, end) is a k-substring
  update result=[start, end) if (end-start) > length(result)
  if end >= length(s) then DONE! EXIT
  increase start until [start, end) is a (k-1)-substring
  increase end while [start, end] is a k-substring
ENDLOOP

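To check whether increasing start or end decreases or increases the size of the character pool (the k property), we can use a count[] array, where count[c] = the number of occurrences of c in the current substring [start, end).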

C++ implementation: http://ideone.com/i2JPCq
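For readers who cannot open the link, here is a minimal sketch of the same sliding-window idea. It is not the linked code: the function name `longest_k_substring` is made up for illustration, and the loop is arranged slightly differently from the pseudocode above (the window is shrunk only when it exceeds k distinct characters). Like the answer, it treats a k-substring as having at most k distinct characters.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: longest substring of s with at most k distinct characters.
std::string longest_k_substring(const std::string& s, int k) {
    std::array<int, 256> count{};   // count[c] = occurrences of c in [start, end)
    int distinct = 0;               // number of c with count[c] > 0
    std::size_t start = 0, best_start = 0, best_len = 0;
    for (std::size_t end = 0; end < s.size(); ++end) {
        // Extend the window with s[end].
        if (count[static_cast<unsigned char>(s[end])]++ == 0) ++distinct;
        // Shrink from the left until at most k distinct characters remain.
        while (distinct > k) {
            if (--count[static_cast<unsigned char>(s[start])] == 0) --distinct;
            ++start;
        }
        // Now [start, end + 1) is a k-substring; keep the longest one seen.
        if (end + 1 - start > best_len) {
            best_len = end + 1 - start;
            best_start = start;
        }
    }
    return s.substr(best_start, best_len);
}
```

Both start and end only move forward, so each character is added and removed at most once, which is where the O(n) bound comes from.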

Answer 2

The best solution I can come up with has time complexity O(n * log(n)) and additional memory complexity O(n). The idea is as follows:

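First compute a prefix-sum array for each of the 26 characters. For a character C, this array a has the following property: a[0] = 0 and a[i] = <number of occurrences of C up to position i> (that is, among the first i characters of the string). It is very easy to compute: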

a[0] = 0;
for (int i = 1; i <= n; ++i) {
  a[i] = a[i - 1] + (s[i - 1] == C);
}

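Now suppose you have these arrays. Counting the occurrences of character C in the closed interval [i, j] is very easy: it is exactly a[j + 1] - a[i]. With this you can also check whether C occurs anywhere in the interval [i, j]: just check whether the number of occurrences is greater than 0.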
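A small sketch of the prefix arrays and the range query, assuming the string contains only lowercase letters 'a' to 'z'. The names `PrefixCounts`, `build_prefix_counts` and `occurrences` are invented for this illustration; all 26 arrays are stored together as one table.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// prefix[i][c] = number of occurrences of letter 'a' + c among s[0..i-1].
using PrefixCounts = std::vector<std::array<int, 26>>;

PrefixCounts build_prefix_counts(const std::string& s) {
    PrefixCounts prefix(s.size() + 1);   // value-initialized, so prefix[0] is all zeros
    for (std::size_t i = 1; i <= s.size(); ++i) {
        prefix[i] = prefix[i - 1];       // copy the counts of the previous prefix
        ++prefix[i][s[i - 1] - 'a'];     // account for the character s[i - 1]
    }
    return prefix;
}

// Occurrences of letter c in the closed interval [i, j]: a[j + 1] - a[i].
int occurrences(const PrefixCounts& prefix, char c, std::size_t i, std::size_t j) {
    return prefix[j + 1][c - 'a'] - prefix[i][c - 'a'];
}
```

For S = acaaba, `occurrences(prefix, 'a', 2, 3)` counts the two a's at positions 2 and 3.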

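The last part of my solution uses binary search. For each index i in the string, use binary search to find the maximum length of a substring starting at position i that has no more than K distinct characters. This works because the number of distinct characters never decreases as the substring grows. The complexity of this part of the algorithm is O(n * log(n)), with a constant factor of 26 for each check.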
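Putting the pieces together, the whole approach might look like the sketch below. It assumes lowercase letters only and "no more than K distinct characters", as described above; the function name `longest_by_binary_search` is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

std::string longest_by_binary_search(const std::string& s, int k) {
    const std::size_t n = s.size();
    // prefix[i][c] = occurrences of 'a' + c among the first i characters.
    std::vector<std::array<int, 26>> prefix(n + 1);
    for (std::size_t i = 1; i <= n; ++i) {
        prefix[i] = prefix[i - 1];
        ++prefix[i][s[i - 1] - 'a'];
    }
    // Number of distinct letters in the half-open range [i, j); costs O(26).
    auto distinct = [&](std::size_t i, std::size_t j) {
        int d = 0;
        for (int c = 0; c < 26; ++c)
            if (prefix[j][c] - prefix[i][c] > 0) ++d;
        return d;
    };
    std::size_t best_start = 0, best_len = 0;
    for (std::size_t i = 0; i < n; ++i) {
        // distinct(i, j) never decreases as j grows, so binary search
        // for the largest j with distinct(i, j) <= k.
        std::size_t lo = i, hi = n;   // invariant: distinct(i, lo) <= k
        while (lo < hi) {
            std::size_t mid = lo + (hi - lo + 1) / 2;
            if (distinct(i, mid) <= k) lo = mid; else hi = mid - 1;
        }
        if (lo - i > best_len) { best_len = lo - i; best_start = i; }
    }
    return s.substr(best_start, best_len);
}
```

The table takes 26 integers per position, which matches the O(n) extra memory mentioned above.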

Answer 3

Since your alphabet contains only 26 letters, a linear-time algorithm can go as follows:

Scan the string from left to right, maintaining two separate arrays at every step: startIndex[26] and endIndex[26].

startIndex[i] = index of first instance of ('a' + i)th letter in the current active substring.  
endIndex[i] = index of last instance of ('a' + i)th letter in the current active substring.

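You can initialize the array elements to some sentinel value (such as -1) so that their validity can be checked during the algorithm. Also keep track of the maximum substring length found so far and the number of currently active distinct characters.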

Algorithm:

1. i = 0. 
   - Mark the startIndex and endIndex of S[0]. 
   - Initialize maxLength = 1
   - Initialize activeChars = 1.
2. for i = 1 to S.size()-1
   - if (S[i] != any of the activeChars) // can be done in O(26)
          if (activeChars == K)
              update maxLength if maxLength < currLength.
              remove an active char with least startIndex.
              add this new char to startIndex and endIndex
              currLength = i - min (remaining active startIndex) + 1
          else
              activeChars++;
              add this S[i] to startIndex and endIndex
              currLength++.          
              update maxLength if maxLength < currLength.
     else
        update endIndex for S[i].
        currLength++.
        update maxLength if maxLength < currLength.
3. again update maxLength if maxLength < currLength.

Answer 4

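I will try to modify Abhishek Bansal's algorithm so that it keeps linear complexity and fixes a possible error caused by repeated characters in the active group.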

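Scan the string from left to right, maintaining at every step two separate arrays, startIndex[26] and endIndex[26], plus a map in which each char (key) is associated with all of its occurrences in the active substring (value).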

startIndex[i] = index of first instance of ('a' + i)th letter in the current active substring  
endIndex[i] = index of last instance of ('a' + i)th letter in the current active substring.
map.get(i) = list of occurrences in the considered substring.

Algorithm:

1. i = 0. 
   - Mark the startIndex and endIndex of S[0], add the occurrence of S[0] to the map.
   - Initialize maxLength = 1
   - Initialize activeChars = 1.
2. for i = 1 to S.size()-1
   - if (S[i] != any of the activeChars) // can be done in O(26)
          if (activeChars == K)
              update maxLength if maxLength < currLength.
              remove the active char with least endIndex.
              add this new char to startIndex and endIndex, and to the map with this occurrence
              remove from the map all occurrences, of every char, that come before the removed char's endIndex
              update every startIndex from the edited map
              currLength = i - min (remaining active startIndex) + 1
          else
              activeChars++;
              add this S[i] to startIndex and endIndex, and to the map
              currLength++.          
              update maxLength if maxLength < currLength.
     else
        update endIndex for S[i], add the occurrence to the map.
        currLength++.
        update maxLength if maxLength < currLength.
3. again update maxLength if maxLength < currLength.

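For clarity I kept the startIndex and endIndex arrays, but you can avoid the extra space and the extra work of updating them by using the first and last elements of the occurrence list stored in the map under key == char C.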
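As a rough illustration of the eviction step ("remove the active char with least endIndex"), here is a simplified sketch that is not the algorithm above verbatim. It keeps only the last occurrence of each active character and moves the window start just past the evicted character's last occurrence, so no occurrence lists are needed. It assumes lowercase letters, K >= 1, and "at most K distinct characters"; the name `longest_with_last_index` is made up.

```cpp
#include <array>
#include <cstddef>
#include <string>

std::string longest_with_last_index(const std::string& s, int k) {
    if (k <= 0) return "";
    std::array<int, 26> last;   // last[c] = last index of 'a' + c in the window, -1 if inactive
    last.fill(-1);
    int active = 0;             // number of active (distinct) characters
    int start = 0;              // the window is s[start..i]
    std::size_t best_start = 0, best_len = 0;
    for (int i = 0; i < static_cast<int>(s.size()); ++i) {
        const int c = s[i] - 'a';
        if (last[c] == -1) {    // s[i] is not one of the active chars
            if (active == k) {
                // Evict the active char whose last occurrence is leftmost: O(26).
                int victim = -1;
                for (int x = 0; x < 26; ++x)
                    if (last[x] != -1 && (victim == -1 || last[x] < last[victim]))
                        victim = x;
                start = last[victim] + 1;   // everything after it uses only the other chars
                last[victim] = -1;
                --active;
            }
            ++active;
        }
        last[c] = i;
        const std::size_t len = static_cast<std::size_t>(i - start + 1);
        if (len > best_len) { best_len = len; best_start = static_cast<std::size_t>(start); }
    }
    return s.substr(best_start, best_len);
}
```

Each step does at most O(26) work, so the scan stays linear in the length of the string.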

What is the best algorithm for finding the longest substring with a constraint?
