Question

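I'm working on this well-known interview question: remove duplicate elements from an array without using auxiliary storage, while preserving order.

I have read a number of posts: Algorithm: efficient way to remove duplicate integers from an array, and Removing Duplicates from an Array using C.

They are either implemented in C (without explanation), or the Java code provided fails when there are consecutive duplicates, such as [1,1,1,3,3].

I'm not very confident with C; my background is Java. So I implemented the code myself. It goes like this:

Use two loops: the outer loop traverses the array and the inner loop checks for duplicates, replacing each one it finds with null.
Then I go over the array with the duplicates replaced by null and remove the null elements by moving the next non-null element into their place.

The total running time I now see is O(n^2) + O(n) ~ O(n^2). From reading the posts above, I understand that this is the best we can do if sorting and auxiliary storage are not allowed. My code is below. I'm looking for ways to optimize it further (if possible), or for better/simpler logic: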

import java.util.Arrays;

public class RemoveDup {
    public static void main(String[] args) {
        Integer[] arr2 = {3, 45, 1, 2, 3, 3, 3, 3, 2, 1, 45, 2, 10};
        Integer[] res = removeDup(arr2);
        System.out.println(Arrays.toString(res));
    }

    private static Integer[] removeDup(Integer[] data) {
        int size = data.length;
        int count = 0;
        // Pass 1: replace every later duplicate with null.
        for (int i = 0; i < size; i++) {
            Integer temp = data[i];
            for (int j = i + 1; j < size && temp != null; j++) {
                // equals() compares values; == on Integer compares references
                if (temp.equals(data[j])) {
                    data[j] = null;
                }
            }
        }
        // Pass 2: compact the non-null elements to the front.
        for (int i = 0; i < size; i++) {
            if (data[i] != null) {
                data[count++] = data[i];
            }
        }
        return Arrays.copyOf(data, count);
    }
}

Edit 1: The reformatted code from @keshlam throws an ArrayIndexOutOfBoundsException:

private static int removeDupes(int[] array) {
        System.out.println("method called");
        if(array.length < 2)
          return array.length;

        int outsize=1; // first is always kept

     for (int consider = 1; consider < array.length; ++consider) {

          for(int compare=0;compare<outsize;++compare) {
            if(array[consider]!=array[compare])
                array[outsize++]=array[consider]; // already present; advance to next compare
           else break;
          // if we get here, we know it's new so append it to output
          //array[outsize++]=array[consider]; // could test first, not worth it. 

        }

      }
        System.out.println(Arrays.toString(array));
         // length is last written position plus 1
        return outsize;
    }

Answer 1

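OK, here's my answer, which should be O(N*N) worst case. (With a smaller constant, since even in the worst case I'm testing N against, on average, 1/2 N; but this is computer science rather than software engineering, and a mere 2x speedup isn't significant. Thanks to {{3} for pointing that out.)

1) Split cursors (input and output advance separately).

2) Each new value only has to be compared with the values that have already been kept, and the comparison can stop as soon as a match is found. (The hint keyword was "incremental".)

3) The first element need not be tested.

4) I'm taking advantage of labelled continue, where I could instead have set a flag before the break and then tested the flag. It comes out to the same thing; this is a bit more elegant.

4.5) I could have tested whether outsize==consider and skipped the copy if that was true. But testing for it would take about as many cycles as the possibly unnecessary copy, and in the majority of cases they will not be the same, so it's easier to just let a possibly redundant copy take place.

5) I'm not recopying the data in the key function; I've factored the copy-for-printing operation out into a separate function, to make clear that removeDupes runs entirely in the target array plus a few automatic variables on the stack. And I'm not spending time zeroing out the leftover elements at the end of the array; that may be wasted work (as in this case). Though I don't think it would actually change the formal complexity.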

import java.util.Arrays;

public class RemoveDupes {

  private static int removeDupes(final int[] array) {
    if(array.length < 2)
      return array.length;

    int outsize=1; // first is always kept

    outerloop: for (int consider = 1; consider < array.length; ++consider) {

      for(int compare=0;compare<outsize;++compare)
        if(array[consider]==array[compare])
          continue outerloop; // already present; advance to next consider

      // if we get here, we know it's new so append it to output
      array[outsize++]=array[consider]; // could test first, not worth it. 
    }

    return outsize; // length is last written position plus 1
  }

  private static void printRemoveDupes(int[] array) {
    int newlength=removeDupes(array);
    System.out.println(Arrays.toString(Arrays.copyOfRange(array, 0, newlength)));
  }

  public static void main(final String[] args) {
    printRemoveDupes(new int[] { 3, 45, 1, 2, 3, 3, 3, 3, 2, 1, 45, 2, 10 });
    printRemoveDupes(new int[] { 2, 2, 3, 3 });
    printRemoveDupes(new int[] { 1, 1, 1, 1, 1, 1, 1, 1 });
  }
}
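
For comparison, point 4.5 mentions the alternative of testing outsize==consider before copying. A minimal sketch of that variant follows; the class name SkipSelfCopyDemo and the method name removeDupesSkippingSelfCopy are made up for this illustration and are not part of the answer's code.

```java
import java.util.Arrays;

class SkipSelfCopyDemo {

  // Same algorithm as removeDupes, except that the write is skipped
  // when the element is already sitting at the output position.
  static int removeDupesSkippingSelfCopy(int[] array) {
    if (array.length < 2)
      return array.length;

    int outsize = 1; // first is always kept

    outerloop: for (int consider = 1; consider < array.length; ++consider) {
      for (int compare = 0; compare < outsize; ++compare)
        if (array[consider] == array[compare])
          continue outerloop; // already present; move on to the next input element

      if (outsize != consider) // hypothetical extra test from point 4.5
        array[outsize] = array[consider];
      ++outsize;
    }
    return outsize;
  }

  public static void main(String[] args) {
    int[] data = { 3, 45, 1, 2, 3, 3, 3, 3, 2, 1, 45, 2, 10 };
    int n = removeDupesSkippingSelfCopy(data);
    System.out.println(Arrays.toString(Arrays.copyOf(data, n))); // [3, 45, 1, 2, 10]
  }
}
```

The result is the same as with the unconditional copy; the extra branch only avoids writes while no duplicate has been seen yet, which is why the answer considers it not worth the test.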

Late addition: since people expressed confusion about point 4 in my explanation, here's the loop rewritten without labelled continue:

for (int consider = 1; consider < array.length; ++consider) {
  boolean matchFound = false;

  for (int compare = 0; compare < outsize; ++compare) {
    if (array[consider] == array[compare]) {
      matchFound = true;
      break;
    }
  }

  if (!matchFound) // only add it to the output if not found
    array[outsize++] = array[consider];
}

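Hope that helps. Labelled continue is a rarely used feature of Java, so it isn't too surprising that some people haven't seen it before. It's useful, but it does make the code harder to read; I probably wouldn't use it in anything much more complicated than this simple algorithm.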

Answer 2

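Here's a version that uses no extra memory (except for the array it returns) and does no sorting.

I believe this is slightly worse than O(n*log n).

Edit: I was wrong. This is slightly better than O(n^3).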

import java.util.Arrays;

public class Dupes {

    private static int[] removeDupes(final int[] array) {
        int end = array.length - 1;
        for (int i = 0; i <= end; i++) {
            for (int j = i + 1; j <= end; j++) {
                if (array[i] == array[j]) {
                    for (int k = j; k < end; k++) {
                        array[k] = array[k + 1];
                    }
                    end--;
                    j--;
                }
            }
        }

        return Arrays.copyOf(array, end + 1);
    }

    public static void main(final String[] args) {
        System.out.println(Arrays.toString(removeDupes(new int[] { 3, 45, 1, 2, 3, 3, 3, 3, 2, 1, 45, 2, 10 })));
        System.out.println(Arrays.toString(removeDupes(new int[] { 2, 2, 3, 3 })));
        System.out.println(Arrays.toString(removeDupes(new int[] { 1, 1, 1, 1, 1, 1, 1, 1 })));
    }
}

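Here's a modified version that doesn't shift all of the elements after the dupe. Instead, it simply replaces the dupe with the last non-matching element. This obviously can't guarantee order.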

private static int[] removeDupes(final int[] array) {
    int end = array.length - 1;
    for (int i = 0; i <= end; i++) {
        for (int j = i + 1; j <= end; j++) {
            if (array[i] == array[j]) {
                while (end >= j && array[j] == array[end]) {
                    end--;
                }
                if (end > j) {
                    array[j] = array[end];
                    end--;
                }
            }
        }
    }

    return Arrays.copyOf(array, end + 1);
}
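
To make the loss of order concrete, here is a small self-contained sketch that wraps the same logic and runs it on an input where the effect is visible. The class name UnorderedDupesDemo and the method name removeDupesUnordered are assumptions made for this illustration only.

```java
import java.util.Arrays;

class UnorderedDupesDemo {

  // Same logic as the modified removeDupes: each dupe is overwritten
  // with the last element that does not match it, and the array shrinks.
  static int[] removeDupesUnordered(int[] array) {
    int end = array.length - 1;
    for (int i = 0; i <= end; i++) {
      for (int j = i + 1; j <= end; j++) {
        if (array[i] == array[j]) {
          // drop trailing copies of the same value first
          while (end >= j && array[j] == array[end]) {
            end--;
          }
          if (end > j) {
            array[j] = array[end];
            end--;
          }
        }
      }
    }
    return Arrays.copyOf(array, end + 1);
  }

  public static void main(String[] args) {
    // An order-preserving version would give [1, 2, 3].
    System.out.println(Arrays.toString(removeDupesUnordered(new int[] { 1, 1, 2, 3 }))); // [1, 3, 2]
  }
}
```

The second 1 is overwritten by the trailing 3, so 3 ends up ahead of 2 in the result.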

Answer 3

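Here is one with O(n^2) worst case, where the return value points to the first non-unique element, so everything before it is unique. In Java, indices can be used instead of C++ iterators.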

#include <algorithm>
#include <vector>

std::vector<int>::iterator unique(std::vector<int>& aVector){
    auto end = aVector.end();
    auto start = aVector.begin();
    while(start != end){
        auto num = *start; // the element to check against
        auto temp = ++start; // start gets incremented here
        while (temp != end){
            if (*temp == num){
                --end; // end is one past the last candidate, so step back first
                std::iter_swap(temp, end); // swap the values, not the iterators
            }
            else
                temp++; // the temp is in else so that if the swap occurs the algo should still check the swapped element.
        }
    }
    return end;
}

Equivalent Java code (the return value is an int, the index of the first non-unique element):

int unique(int[] anArray){
        int end = anArray.length; // exclusive end, like the C++ end iterator
        int start = 0;
        while(start != end){
            int num = anArray[start]; // the element to check against
            int temp = ++start; // start gets incremented here
            while (temp != end){
                if (anArray[temp] == num){
                    end--;
                    // swap the values at indices temp and end
                    int tmp = anArray[temp];
                    anArray[temp] = anArray[end];
                    anArray[end] = tmp;
                }
                else
                    temp++; // the temp is in else so that if the swap occurs the algo should still check the swapped element.
            }
        }
    return end;
    }

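The subtle difference between this algorithm and yours lies in your point 2. There, instead of replacing the current element with null, this algorithm swaps it with the last possibly-unique element, which is the last element of the array on the first swap, the second-to-last on the second swap, and so on.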

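You might also consider looking at the std::unique implementation in C++, whose complexity is linear in one less than the distance between first and last: it compares each pair of elements, and possibly performs assignments on some of them. But as @keshlam pointed out, it only removes consecutive duplicates, so it is only useful for sorted arrays. The return value is the same as in my algorithm. Here is the code straight from the standard library: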

template<class _FwdIt, class _Pr> inline
    _FwdIt _Unique(_FwdIt _First, _FwdIt _Last, _Pr _Pred)
    {   // remove each satisfying _Pred with previous
    if (_First != _Last)
        for (_FwdIt _Firstb; (_Firstb = _First), ++_First != _Last; )
            if (_Pred(*_Firstb, *_First))
                {   // copy down
                for (; ++_First != _Last; )
                    if (!_Pred(*_Firstb, *_First))
                        *++_Firstb = _Move(*_First);
                return (++_Firstb);
                }
    return (_Last);
    }
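
For readers coming from Java, the behavior of std::unique can be approximated with a single pass over an int array. This is only a sketch under that assumption, not a port of the library code above; the class name ConsecutiveUniqueDemo and the method name uniqueConsecutive are invented for the example.

```java
import java.util.Arrays;

class ConsecutiveUniqueDemo {

  // Keeps the first element of every run of equal adjacent values and
  // returns the new logical length (playing the role of the returned iterator).
  static int uniqueConsecutive(int[] a) {
    if (a.length == 0)
      return 0;
    int last = 0; // index of the last kept element
    for (int i = 1; i < a.length; i++) {
      if (a[i] != a[last])
        a[++last] = a[i];
    }
    return last + 1;
  }

  public static void main(String[] args) {
    int[] sorted = { 1, 1, 2, 3, 3 };
    int n = uniqueConsecutive(sorted);
    System.out.println(Arrays.toString(Arrays.copyOf(sorted, n))); // [1, 2, 3]

    int[] unsorted = { 1, 2, 1, 1, 2 };
    int m = uniqueConsecutive(unsorted);
    System.out.println(Arrays.toString(Arrays.copyOf(unsorted, m))); // [1, 2, 1, 2]
  }
}
```

On the unsorted input the non-adjacent duplicates survive, which is why this linear approach does not solve the original problem unless the array is sorted first.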

Answer 4

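To bring in a bit of perspective, here is a solution in Haskell. It uses lists instead of arrays and returns the result in reverse order, which can be fixed by applying reverse at the end.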

import Data.List (foldl')

removeDup :: (Eq a) => [a] -> [a]
removeDup = foldl' (\acc x-> if x `elem` acc then acc else x:acc) []
