Question

I have written the code below to detect the first duplicate character in a string.

public static int detectDuplicate(String source) {
    boolean found = false;
    int index = -1;
    final long start = System.currentTimeMillis();
    final int length = source.length();
    for(int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
        boolean shiftPointer = false;
        for(int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++ ) {
            if ( source.charAt(outerIndex) == source.charAt(innerIndex)) {
                found = true;
                index = outerIndex;
            } else {
                shiftPointer = true;
            }
        }
    }
    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

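I need help with two things:

What is the worst-case complexity of this algorithm? My understanding is O(n).
Is this the best way to do it? Can somebody suggest a better solution (if any)?

Thanks, NN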

Answer 1

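As others have said, your algorithm is O(n^2). Here is an O(N) algorithm; it is O(N) because HashSet#add runs in constant time (assuming the hash function disperses the elements properly among the buckets). Note that I size the hash set to the maximum size up front to avoid resizing/rehashing: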

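import java.util.HashSet;
import java.util.Set;

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    // Capacity chars.length with load factor 1, so the set never has to rehash
    Set<Character> uniqueChars = new HashSet<Character> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        // add() returns false when the character is already in the set
        if (!uniqueChars.add(chars[i])) return i;
    }
    return -1;
}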

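Note: this returns the index of the first duplicate (i.e. the index of the first character that repeats an earlier character). To return the index of that character's first occurrence instead, you need to store the indices in a Map<Character, Integer> (Map#put is also O(1) in this case):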

import java.util.HashMap;
import java.util.Map;

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Map<Character, Integer> uniqueChars = new HashMap<Character, Integer> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        // put() returns the previously stored index, or null if the character is new
        Integer previousIndex = uniqueChars.put(chars[i], i);
        if (previousIndex != null) {
            return previousIndex;
        }
    }
    return -1;
}
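
To make the difference between the two return values concrete, here is a small side-by-side sketch. The class and method names (DuplicateIndexDemo, secondOccurrence, firstOccurrence) are made up for this illustration and are not part of the answer's code; default-sized collections are used for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DuplicateIndexDemo {
    // Set-based variant: index of the character that repeats an earlier one.
    static int secondOccurrence(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            if (!seen.add(s.charAt(i))) return i;
        }
        return -1;
    }

    // Map-based variant: index where the repeated character was first seen.
    static int firstOccurrence(String s) {
        Map<Character, Integer> firstIndex = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            Integer previous = firstIndex.put(s.charAt(i), i);
            if (previous != null) return previous;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(secondOccurrence("abcb")); // 3 (the second 'b')
        System.out.println(firstOccurrence("abcb"));  // 1 (the first 'b')
        System.out.println(secondOccurrence("abc"));  // -1 (no repeat)
    }
}
```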

Answer 2

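This is O(n**2), not O(n). Consider the case abcdefghijklmnopqrstuvwxyzz. outerIndex will range from 0 to 25 before the program terminates, and each time it increments, innerIndex will have ranged from outerIndex to 26.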
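
The quadratic growth described here can be checked by counting comparisons. The sketch below is a hypothetical helper (the class and method names are made up for illustration), and it assumes the exhaustive form of the search, in which the inner loop scans every later character. The code as posted in the question leaves its inner loop at the first mismatch, so its count would differ.

```java
class ComparisonCount {
    // Counts character comparisons made by an exhaustive pairwise search
    // that stops as soon as any repeated character is found.
    static long countComparisons(String s) {
        long comparisons = 0;
        for (int i = 0; i < s.length(); i++) {
            for (int j = i + 1; j < s.length(); j++) {
                comparisons++;
                if (s.charAt(i) == s.charAt(j)) return comparisons;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // 26 + 25 + ... + 1 comparisons for the 27-character example
        System.out.println(countComparisons("abcdefghijklmnopqrstuvwxyzz")); // 351
    }
}
```

For a string of length n whose only repeat sits at the very end, this comes to n(n-1)/2 comparisons.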

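To get to O(n), you need to make a single pass over the list and do O(1) work at each position. Since the work to do at each position is to check whether the character has been seen before (and if so, where), that means you need an O(1) map implementation. A hashtable gives you that; so does an array, indexed by the character code.

assylias shows how to do it with hashing, so here is how to do it with an array (just for laughs, really):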

import java.util.Arrays;

public static int detectDuplicate(String source) {
    // One slot per possible char value (2^16 entries); -1 means "not seen yet"
    int[] firstOccurrence = new int[1 << Character.SIZE];
    Arrays.fill(firstOccurrence, -1);
    for (int i = 0; i < source.length(); i++) {
        char ch = source.charAt(i);
        if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
        else firstOccurrence[ch] = i;
    }
    return -1;
}

Answer 3

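The complexity is roughly O(M^2), where M is the minimum of the string length and the size K of the set of possible characters.

You can get it down to O(M) time with O(K) memory by simply remembering the position where you first encounter each unique character.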
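
A minimal sketch of that idea, under the assumption that the input is 7-bit ASCII so that K = 128; the class name, the constant, and the choice to throw on non-ASCII input are all illustrative rather than anything stated in the answer:

```java
import java.util.Arrays;

class FirstRepeatAscii {
    // Assumption for this sketch: only 7-bit ASCII characters occur, so K = 128.
    static final int K = 128;

    // Returns the index of the first occurrence of the first repeated character, or -1.
    static int firstRepeated(String s) {
        int[] firstSeen = new int[K];   // O(K) memory
        Arrays.fill(firstSeen, -1);     // -1 marks "not seen yet"
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= K) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
            if (firstSeen[c] != -1) return firstSeen[c];
            firstSeen[c] = i;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeated("abca")); // 0
        System.out.println(firstRepeated("abc"));  // -1
    }
}
```

Because only K distinct values exist, the loop must hit a repeat within the first K + 1 characters, so it never runs more than about min(n, K) steps.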

Answer 4

OK, I found that the following logic reduces O(N^2) to O(N).

public static int detectDuplicate(String source) {
    int index = -1;
    boolean found = false;
    final long start = System.currentTimeMillis();

    // Compare each character with its predecessor; i must stay below length()
    // or charAt(i) throws StringIndexOutOfBoundsException when nothing matches
    for(int i = 1; i < source.length() && !found; i++) {
        if(source.charAt(i) == source.charAt(i-1)) {
            index = (i - 1);
            found = true;
        }
    }

    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

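This also shows a performance improvement over my previous algorithm, which had 2 nested loops. It takes 130 ms to detect the first repeated character in a string of 63 million characters where the repeated character occurs at the very end.

I am not sure whether this is the optimal solution. If anyone finds a better one, please share.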
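
One caveat worth checking: the loop above compares only neighbouring characters, so it reports a repeat only when the two occurrences are adjacent. A small sketch of the same comparison (the class and method names are made up for illustration):

```java
class AdjacentOnlyDemo {
    // Same test as the loop above: each character against its immediate predecessor.
    static int firstAdjacentRepeat(String s) {
        for (int i = 1; i < s.length(); i++) {
            if (s.charAt(i) == s.charAt(i - 1)) return i - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstAdjacentRepeat("abba")); // 1  ("bb" is adjacent)
        System.out.println(firstAdjacentRepeat("abca")); // -1 ('a' repeats, but not adjacently)
    }
}
```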

Thanks,

NN

Answer 5

I can improve your algorithm substantially. It should be done like this:

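// Sentinel trick: temporarily force a match at the end so the loop needs no bounds check.
public static int detectDuplicate(StringBuffer source) {
    if (source.length() < 2) return -1; // no pair of characters to compare
    int xLastChar = source.length() - 1;
    char charLast = source.charAt(xLastChar);
    source.setCharAt(xLastChar, source.charAt(xLastChar - 1));
    int i = 1;
    while (true) {
        if (source.charAt(i) == source.charAt(i - 1)) break;
        i += 1;
    }
    source.setCharAt(xLastChar, charLast); // restore the original last character
    // Stopped on the sentinel and the real last character does not match: no duplicate
    if (i == xLastChar && source.charAt(xLastChar - 1) != charLast) return -1;
    return i; // index of the second character of the matching pair
}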

For large strings, this algorithm may be twice as fast as yours.

Answer 6

You can try:

public static char firstRecurringChar(String s) {
    char x = ' '; // returned unchanged when no character recurs
    System.out.println("STRING : " + s);
    for (int i = 0; i < s.length(); i++) {
        System.out.println("CHAR AT " + i + " = " + s.charAt(i));
        System.out.println("Last index of CHAR AT " + i + " = " + s.lastIndexOf(s.charAt(i)));
        // The character recurs if its last occurrence lies after position i
        if (s.lastIndexOf(s.charAt(i)) > i) {
            x = s.charAt(i);
            break;
        }
    }
    return x;
}

Answer 7

O(1) algorithm

Your solution is O(n^2) because of the two nested loops.

The fastest algorithm to do this is O(1) (constant time):

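public static int detectDuplicate(String source) {
    // One flag per possible char value (Character.MAX_VALUE + 1 = 65536 flags)
    boolean[] foundChars = new boolean[Character.MAX_VALUE+1];
    // Only 65536 distinct char values exist, so a repeat must appear within the
    // first 65537 characters: the loop body runs a bounded number of times
    for(int i = 0; i < source.length(); i++) {
        char currentChar = source.charAt(i);
        if(foundChars[currentChar]) return i;
        foundChars[currentChar] = true;
    }
    return -1;
}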

However, this is only fast in big-O terms.

Java - What is the best way to find the first duplicate character in a string
