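Find the shortest possible substring that contains a String

Time: 2016-03-25 01:35:55

Tags: java string algorithm

This question came up in a recent programming interview.

Given a random string S and another string T with unique elements, find the shortest contiguous substring of S that contains all the elements of T. For example,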

S='adobecodebanc' 
T='abc' 
Answer='banc'

I came up with this solution:

public static String completeSubstring(String S, String T){

    // Brute force: try every substring of S and keep the shortest one
    // that contains every character of T.
    String minSub = S;
    for (int i = 0; i < S.length(); i++) {
        for (int j = i + 1; j <= S.length(); j++) {
            String sub = S.substring(i, j);
            if (stringContains(sub, T)) {
                if (sub.length() < minSub.length()) minSub = sub;
            }
        }
    }
    // Note: S itself is returned when no substring contains all of T.
    return minSub;
}

// Returns true if every character of t occurs somewhere in sub.
private static boolean stringContains(String sub, String t){
    // t has unique characters, so a shorter sub cannot contain them all
    if (sub.length() < t.length()) return false;

    // Assumes every character value is below 256 (e.g. ASCII input).
    boolean[] seen = new boolean[256];

    for (int i = 0; i < sub.length(); i++) {
        seen[sub.charAt(i)] = true;
    }
    for (int i = 0; i < t.length(); i++) {
        if (!seen[t.charAt(i)]) return false;
    }
    return true;
}

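This algorithm has O(n^3) complexity, which is obviously not great. Can anyone suggest a better algorithm?

3 answers:

Answer 0: (score: 7)

Here is an O(N) solution.

The important thing to note about the complexity is that each unit of work increments either start or end, neither of them ever decreases, and the algorithm stops before both reach the end of s.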

// needs: import java.util.HashMap; import java.util.Map;
public static String findSubString(String s, String t)
{
    //algorithm moves a sliding "current substring" through s
    //in this map, we keep track of the number of occurrences of
    //each target character there are in the current substring

    Map<Character,int[]> counts = new HashMap<>();
    for (char c : t.toCharArray())
    {
        counts.put(c,new int[1]);
    }

    //how many target characters are missing from the current substring
    //current substring is initially empty, so all of them
    int missing = counts.size();

    //don't waste my time
    if (missing<1)
    {
        return "";
    }

    //best substring found
    int bestStart = -1, bestEnd = -1;

    //current substring
    int start=0, end=0;
    while (end<s.length())
    {
        //expand the current substring at the end
        int[] cnt = counts.get(s.charAt(end++));
        if (cnt!=null)
        {
            if (cnt[0]==0)
            {
                --missing;
            }
            cnt[0]+=1;
        }
        //while the current substring is valid, remove characters
        //at the start to see if a shorter substring that ends at the
        //same place is also valid 
        while(start<end && missing<=0)
        {
            //current substring is valid
            if (end-start < bestEnd-bestStart || bestEnd<0)
            {
                bestStart = start;
                bestEnd = end;
            }
            cnt = counts.get(s.charAt(start++));
            if (cnt != null)
            {
                cnt[0]-=1;
                if (cnt[0]==0)
                {
                    ++missing;
                }
            }
        }
        //current substring is no longer valid.  we'll add characters
        //at the end until we get another valid one
        //note that we don't need to add back any start character that
        //we just removed, since we already tried the shortest valid string
        //that starts at start-1

    }
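    //bestEnd stays -1 when no valid substring exists; testing
    //bestStart<=bestEnd would then call s.substring(-1,-1) and throw
    return (bestEnd<0 ? null : s.substring(bestStart,bestEnd));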
}
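
The complexity argument above can be checked empirically. The following is a minimal sketch, not the answerer's code: it repeats the same sliding window but counts every increment of start and end, so the total can be compared against the bound 2 * s.length(). The class name WindowSteps, the steps counter, and the choice to return null when no window exists are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class: a sliding window instrumented to count pointer moves.
final class WindowSteps {
    static int steps; // total increments of start and end during the last call

    static String minWindow(String s, String t) {
        Map<Character, Integer> counts = new HashMap<>();
        for (char c : t.toCharArray()) {
            counts.put(c, 0);
        }
        int missing = counts.size(); // target chars absent from the window
        steps = 0;
        if (missing == 0) {
            return "";
        }
        int bestStart = -1, bestEnd = -1;
        int start = 0, end = 0;
        while (end < s.length()) {
            char in = s.charAt(end++);
            steps++; // end moved right by one
            Integer cnt = counts.get(in);
            if (cnt != null) {
                if (cnt == 0) {
                    missing--;
                }
                counts.put(in, cnt + 1);
            }
            // Shrink from the left while the window still covers t.
            while (start < end && missing == 0) {
                if (bestEnd < 0 || end - start < bestEnd - bestStart) {
                    bestStart = start;
                    bestEnd = end;
                }
                char out = s.charAt(start++);
                steps++; // start moved right by one
                cnt = counts.get(out);
                if (cnt != null) {
                    counts.put(out, cnt - 1);
                    if (cnt == 1) {
                        missing++;
                    }
                }
            }
        }
        return bestEnd < 0 ? null : s.substring(bestStart, bestEnd);
    }

    public static void main(String[] args) {
        String s = "adobecodebanc";
        String result = minWindow(s, "abc");
        System.out.println(result + ": " + steps + " pointer moves, bound "
                + 2 * s.length());
    }
}
```

Each loop iteration moves exactly one pointer to the right, and neither pointer can pass s.length(), so steps never exceeds twice the length of s regardless of the input.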

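Answer 1: (score: 1)

I know there are already enough O(N) answers, but I tried to solve this on my own without looking anything up, simply because it is a fun problem, and thought I would share. Here is the O(N) solution I came up with: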

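// needs: import java.util.ArrayList; import java.util.HashSet;
//        import java.util.regex.Matcher; import java.util.regex.Pattern;
public static String completeSubstring(String S, String T){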
    int min = S.length()+1, index1 = -1, index2 = -1;
    ArrayList<ArrayList<Integer>> index = new ArrayList<ArrayList<Integer>>(); 
    HashSet<Character> targetChars = new HashSet<Character>();
    for(char c : T.toCharArray()) targetChars.add(c);

    //reduce initial sequence to only target chars and keep track of index
    //Note that the resultant string does not allow the same char to be consecutive

    StringBuilder filterS = new StringBuilder();
    for(int i = 0, s = 0 ; i < S.length() ; i++) {
        char c = S.charAt(i);
        if(targetChars.contains(c)) {
            if(s > 0 && filterS.charAt(s-1) == c) {
                index.get(s-1).add(i);
            } else {
                filterS.append(c);
                index.add(new ArrayList<Integer>());
                index.get(s).add(i);
                s++;
            }
        }
    }

    //Not necessary to use regex, loops are fine, but for readability sake
    //NOTE: this pattern is hard-coded for T = "abc" (three distinct target chars)
    String regex = "([abc])((?!\\1)[abc])((?!\\1)(?!\\2)[abc])";
    Matcher m = Pattern.compile(regex).matcher(filterS.toString());

    for(int i = 0, start = -1, p1, p2, tempMin, charSize = targetChars.size() ; m.find(i) ; i = start+1) {
        start = m.start();
        ArrayList<Integer> first = index.get(start);
        p1 = first.get(first.size()-1);
        p2 = index.get(start+charSize-1).get(0);
        tempMin = p2-p1;

        if(tempMin < min) {
            min = tempMin;
            index1 = p1;
            index2 = p2;
        }
    }

    //index1 stays -1 when nothing matched; substring(-1, 0) would throw
    return index1 < 0 ? null : S.substring(index1, index2+1);
}

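I'm fairly sure the complexity is O(N); please correct me if I'm wrong.

Answer 2: (score: 1)

An alternative implementation of the O(N) algorithm proposed by @MattTimmermans. It uses a Map<Integer, Integer> to count occurrences and a Set<Integer> to store the characters from T that are present in the current substring: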

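// needs: import java.util.HashSet; import java.util.Map; import java.util.Set;
//        import java.util.stream.Collectors;
public static String completeSubstring(String s, String t) {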
    Map<Integer, Integer> occ 
        = t.chars().boxed().collect(Collectors.toMap(c -> c, c -> 0));

    Set<Integer> found = new HashSet<>();      // characters from T found in current match
    int start = 0;                             // current match
    int bestStart = Integer.MIN_VALUE, bestEnd = -1;

    for (int i = 0; i < s.length(); i++) {
        int ci = s.charAt(i);                  // current char
        if (!occ.containsKey(ci))              // not from T
            continue;
        occ.put(ci, occ.get(ci) + 1);          // add occurrence
        found.add(ci);
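        // j must be allowed to reach i: with j < i the loop can drop an
        // occurrence without advancing start (e.g. s = "aab", t = "ab")
        for (int j = start; j <= i; j++) {     // try to reduce current match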
            int cj = s.charAt(j);
            Integer c = occ.get(cj);
            if (c != null) { 
                if (c == 1) {                  // cannot reduce anymore
                    start = j;
                    break;
                } else 
                    occ.put(cj, c - 1);        // remove occurrence
            }
        }
        if (found.size() == occ.size()         // all chars found
            && (i - start < bestEnd - bestStart)) {
            bestStart = start;
            bestEnd = i;
        }
    }
    return bestStart < 0 ? null : s.substring(bestStart, bestEnd + 1); 
}
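
One way to sanity-check an O(N) implementation like the ones in this thread is to compare it with a brute-force reference on many short random inputs. The sketch below is only an illustration under stated assumptions: the class MinWindowCheck and its methods are hypothetical names, t is assumed to be non-empty, null is assumed to mean "no such substring", and slidingWindow is a compact array-based stand-in for whichever implementation is being tested. It compares lengths only, since two correct implementations could break ties differently.

```java
import java.util.Random;

// Hypothetical test harness, not taken from any of the answers.
final class MinWindowCheck {

    // Reference: try every substring. O(n^3), only suitable for short inputs.
    static String bruteForce(String s, String t) {
        String best = null;
        for (int i = 0; i < s.length(); i++) {
            for (int j = i + 1; j <= s.length(); j++) {
                String sub = s.substring(i, j);
                boolean ok = true;
                for (char c : t.toCharArray()) {
                    if (sub.indexOf(c) < 0) {
                        ok = false;
                        break;
                    }
                }
                if (ok && (best == null || sub.length() < best.length())) {
                    best = sub;
                }
            }
        }
        return best;
    }

    // Candidate: sliding window with arrays indexed by char value.
    static String slidingWindow(String s, String t) {
        boolean[] wanted = new boolean[Character.MAX_VALUE + 1];
        int[] count = new int[Character.MAX_VALUE + 1];
        int missing = 0;
        for (char c : t.toCharArray()) {
            if (!wanted[c]) {
                wanted[c] = true;
                missing++;
            }
        }
        int bestStart = -1, bestEnd = -1;
        int start = 0, end = 0;
        while (end < s.length()) {
            char in = s.charAt(end++);
            if (wanted[in] && count[in]++ == 0) {
                missing--;
            }
            while (start < end && missing == 0) {
                if (bestEnd < 0 || end - start < bestEnd - bestStart) {
                    bestStart = start;
                    bestEnd = end;
                }
                char out = s.charAt(start++);
                if (wanted[out] && --count[out] == 0) {
                    missing++;
                }
            }
        }
        return bestEnd < 0 ? null : s.substring(bestStart, bestEnd);
    }

    private static int len(String x) {
        return x == null ? -1 : x.length();
    }

    // Counts random inputs on which the two methods disagree on the length.
    static int mismatches(long seed, int trials) {
        Random rnd = new Random(seed);
        int bad = 0;
        for (int k = 0; k < trials; k++) {
            StringBuilder sb = new StringBuilder();
            int n = rnd.nextInt(12);
            for (int i = 0; i < n; i++) {
                sb.append((char) ('a' + rnd.nextInt(4))); // letters a..d
            }
            String s = sb.toString();
            if (len(bruteForce(s, "abc")) != len(slidingWindow(s, "abc"))) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches(42L, 1000));
    }
}
```

To test one of the answers instead, replace the call to slidingWindow inside mismatches with a call to that answer's method.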