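Find the longest substring with N unique characters

Asked: 2014-01-14 16:54:48

Tags: algorithm

Input: str = "abcdeefuiuiwiwwaaaa", n = 3. Output: "iwiwwaaaa" (the longest substring with 3 distinct characters).

I have a solution, shown below. My questions:

  1. What is the time complexity? I know it must be better than O(n^2), but I'm not sure whether I can conclude that it is O(n).
  2. The solution below cannot cover the whole ASCII range. Can we improve it without using extra space?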

    public static String getSubstrOfMChars(String str, int m) 
    {
         if (str==null || str.length()==0)
             return "";     
    
         int len = str.length();        
         String max = "";
    
         for(int i=0; i<len;) 
         {  
             StringBuilder sb = new StringBuilder();
             int counter = 1;
             int checker = 0;
             char firstChar = str.charAt(i);
             int firstCharPos = i;    // first char position in the string
             sb.append(firstChar);
             checker |= 1 << (firstChar - 'a');
    
             for(int j=i+1; j<len; j++) 
             {  
                 char currChar = str.charAt(j);
                 if (currChar == firstChar) 
                     firstCharPos++;                
    
                 int tester = checker & (1<<(currChar - 'a'));
                 if ( tester > 0 ) // already have such character
                 {
                     sb.append(currChar);
                     continue;
                 }
    
                 // new character
                 if (++counter > m) 
                 {
                    i = firstCharPos + 1;
    
                    if (sb.length() > max.length()) 
                    {
                        max = sb.toString();
                    }
                    break;
                 }
                 sb.append(currChar);                   
                 checker |= 1 << (currChar - 'a');              
            }
    
            if (counter <= m) 
            {               
                if ((counter==m) && sb.length() > max.length()) 
                {
                    max = sb.toString();
                }               
                break;
            }
    
         }
    
         return max;        
    }
    

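6 answers:

Answer 0 (score: 10)

There is an O(n) solution. Let S be the string. Just go through the array with two pointers i and j, and keep track of the number K of different letters between S[i] and S[j]. Increment j whenever this number is smaller than or equal to n, and increment i whenever K is greater than n. Also remember the longest substring for which K was equal to n.

In the implementation you also need a hash table to keep track of the last occurrence of each letter.

Python implementation: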

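def longest(S, n):
    i = j = K = 0        # window is S[i:j]; K = number of distinct letters in it
    res = (0, 0)
    last = {}            # letter -> index of its last occurrence in the window

    while i < len(S):
        # if the current substring is better than the others, then save it
        if K == n and j - i > res[1] - res[0]:
            res = (i, j)

        # if we reach the end of the string, we're done.
        if j + 1 > len(S):
            break
        # if we can go further with the right end, then do it
        elif K <= n:
            if S[j] not in last:   # dict.has_key() no longer exists in Python 3
                K = K + 1
            last[S[j]] = j
            j = j + 1
        # if we must go further with the left end, then do it
        else:
            # drop the letter only when its last occurrence leaves the window;
            # deleting it unconditionally would undercount K (e.g. "abacc", n=2)
            if last.get(S[i]) == i:
                del last[S[i]]
                K = K - 1
            i = i + 1
    return S[res[0]:res[1]]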

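Answer 1 (score: 2)

Your current code has O(N^2) complexity, because you use nested for loops to check the substrings starting at each character.

IMO you can do this in O(N * k) time and O(k) extra space (where k = the number of unique characters allowed):

  1. Iterate over the string from the start, and add the first character to a map from character to last position found.
  2. Continue parsing the string, updating the last position of each character in the map.
  3. When you get a new character, increase the character count and set the last position for this character = current position.
  4. When the count in the map reaches k, iterate over the map and search for the entry with the minimum position index. Calculate present position - min(last position index) and update the maximum-length substring accordingly. Decrement the count. Pop this character from the map.
  5. Continue the above until you reach the end of the string.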
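
The steps above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name `longest_k_distinct_lastpos` is hypothetical, and it assumes the eviction happens once the map holds more than k characters, so that the window left behind has exactly k.

```python
def longest_k_distinct_lastpos(s, k):
    """Longest substring of s with exactly k distinct characters ('' if none).

    Hypothetical sketch of the last-position-map idea; O(N * k) time, O(k) space.
    """
    if k <= 0:
        return ""
    last_pos = {}                  # character -> index of its most recent occurrence
    start = 0                      # left edge of the current window
    best_start, best_len = 0, 0
    for pos, ch in enumerate(s):
        last_pos[ch] = pos
        if len(last_pos) > k:
            # O(k) scan for the character whose last occurrence is furthest left
            victim = min(last_pos, key=last_pos.get)
            start = last_pos[victim] + 1
            del last_pos[victim]
        # record the window only when it holds exactly k distinct characters
        if len(last_pos) == k and pos - start + 1 > best_len:
            best_start, best_len = start, pos - start + 1
    return s[best_start:best_start + best_len]


print(longest_k_distinct_lastpos("abcdeefuiuiwiwwaaaa", 3))
```

The `min` scan over the map is where the factor k in O(N * k) comes from; the map never holds more than k + 1 entries.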

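Answer 2 (score: 1)

All the answers are too complicated. I'll propose a simple solution.

The solution to this problem revolves around DISTINCT CHARACTERS.

- So our solution should prioritize the number of UNIQUE characters (unicount) at all times.

- There are two cases to consider: unicount < K, or unicount >= K.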

CASE 1: (unicount < K)
    1a: str[i] is a new character, not yet present in the current window.
        -- Increase both unicount and hash[str[i]].
    1b: str[i] is already present in the current window.
        -- No need to increase unicount. Just increment hash[str[i]].

CASE 2: (unicount >= K)
    2a: str[i] is already present in the current window.
        -- Nothing else to do, because unicount stays equal to K. Just increment hash[str[i]].
    2b: str[i] is a new character. Slide the window (variable start) until unicount decreases.
        -- Then proceed as in case 1.

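The code below returns the length of the longest such substring with exactly K distinct characters. It is easy to modify it to actually print the substring.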

#include <algorithm>  // std::max
#include <string>
using namespace std;

int printLengthKUniqueSubstring(string str,int k)
{
    int hash[256] = {0};

    int n = str.length();
    int unicount = 0,maxlength = 0,start = 0;
    for(int i=0;i<n;i++)
    {
        if(unicount<k)
        {
            if(hash[str[i]]==0)
            {
                hash[str[i]]++;
                unicount++;
            }
            else
                hash[str[i]]++;
        }
        else
        {
           // cout<<"hello "<<" "<<unicount<<" "<<i<<endl;
            if(hash[str[i]]>0)
                hash[str[i]]++;
            else
            {
                while(unicount>=k)
                {
                    hash[str[start]]--;
                    if(hash[str[start]]==0)
                        unicount--;
                    start++;
                }
                if(hash[str[i]]==0)
                {
                    hash[str[i]]++;
                    unicount++;
                }
                else
                    hash[str[i]]++;
            }

        }
        maxlength = max(maxlength,i-start+1);
    }
    if(unicount<k)
        return -1;
    return maxlength;
}
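
As a rough illustration of the modification mentioned above (returning the substring itself rather than its length), here is a minimal Python sketch of the same counting window. The name `longest_k_unique_substring` is hypothetical, and a dictionary stands in for the fixed `hash[256]` array, which is an assumption made so that characters outside the 8-bit range also work.

```python
from collections import defaultdict


def longest_k_unique_substring(s, k):
    """Longest substring of s with exactly k distinct characters ('' if none)."""
    if k <= 0:
        return ""
    counts = defaultdict(int)   # occurrences of each character in the current window
    unicount = 0                # number of distinct characters in the window
    start = 0                   # left edge of the window
    best = (0, 0)               # half-open slice bounds of the best window so far
    for i, ch in enumerate(s):
        if counts[ch] == 0:
            unicount += 1
        counts[ch] += 1
        # slide the left edge until at most k distinct characters remain
        while unicount > k:
            counts[s[start]] -= 1
            if counts[s[start]] == 0:
                unicount -= 1
            start += 1
        if unicount == k and i + 1 - start > best[1] - best[0]:
            best = (start, i + 1)
    return s[best[0]:best[1]]


print(longest_k_unique_substring("abcdeefuiuiwiwwaaaa", 3))
```

Each character enters and leaves the window at most once, so the loop is linear in the length of the string.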

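Have a nice day!

Answer 3 (score: 1)

The complexity is O(n*C), where C is the constant cost of scanning the dictionary for the key with the minimum value.

Here is a solution in C#.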

public static string GetLongestSubString(string s, int numberOfUniqueChar)
{
    char c;
    int start = 0;
    string result = string.Empty, temp = string.Empty;
    Dictionary<Char, int> dic = new Dictionary<char, int>();

    for (int i = 0; i < s.Length; i++)
    {
        if (!dic.ContainsKey(s[i]))
        {
            dic.Add(s[i], i);
            if (dic.Count > numberOfUniqueChar)
            {
                temp = s.Substring(start, (i - start));
                if (temp.Length > result.Length)
                {
                    result = temp;
                }
                c = dic.OrderBy(k => k.Value).FirstOrDefault().Key;
                start = dic[c]+1;
                dic.Remove(c);
            }
        }
        else
        {
            // update the last position of the current key
            dic[s[i]] = i;
        }
    }

    // Compare the final window with the result after the loop, so it is
    // checked whichever branch handled the last character (e.g. "ab", 2).
    temp = s.Substring(start);
    if (temp.Length > result.Length)
    {
        result = temp;
    }

    return result;
}

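Answer 4 (score: 0)

Here is how I solved this problem. First, it splits the string into groups of identical characters; then it loops to retrieve all valid substrings; finally it returns all of the longest possible substrings: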

import re
def longest(S,n):
    # 1. groupby unique characters
    grp_S =  [ s[0] for s in re.findall(r'(([a-z])\2*)', S)]
    # 2. retrieve all valid combinations in tuples (characters count, substring)
    options = []
    for i in range(len(grp_S)):   # xrange() no longer exists in Python 3
        g = 0
        while i  + n  + g <= len(grp_S):
            if  (len(set( [x[0] for x in grp_S [i: i + n + g]])) == n and i  + n  + g  + 1 > len(grp_S)) or \
                (len(set( [x[0] for x in grp_S [i: i + n + g]])) == n and len(set( [x[0] for x in grp_S [i: i + n + g + 1]])) > n):
                options.append( (len(''.join(grp_S [i: i + n + g])), ''.join(grp_S [i: i + n + g])) )
                break
            else: g = g + 1
    # 3. return the list of all longest substrings (empty list if none qualifies)
    if not options:
        return []
    best = max(options)[0]
    return [ v[1] for v in options if v[0] == best ]
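
The grouping step (step 1) can also be done without a regular expression. The following is a small sketch using the standard library's `itertools.groupby`; the helper name `run_groups` is hypothetical and not part of the answer. Unlike the `[a-z]` pattern, it groups any characters, not only lowercase letters.

```python
from itertools import groupby


def run_groups(s):
    """Split s into maximal runs of identical consecutive characters."""
    return ["".join(run) for _, run in groupby(s)]


print(run_groups("uiwiwwaaaa"))
```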

Answer 5 (score: -1)

[image] Simple, short Python syntax, with no error checking.