
Question

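Given an array of n integers, how can you determine in O(n) time, without using any extra space, whether the array contains a duplicate?

By extra space I mean O(n) extra space.

Does the XOR operator help in any way?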

Answer 1

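Without additional information, this problem seems unsolvable: it is the Element Distinctness Problem, which cannot be solved under the restrictions you gave in the time you require.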

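You can allow either:

(1) More memory: use a hash table / hash set and meet the O(n) time requirement. [Iterate over the array and check whether each element is already in the hash table; if it is, you have a duplicate; otherwise insert the element into the table and continue.]

(2) More time: sort the array [O(n log n)] and meet the sublinear-space requirement. [After sorting, iterate over the array and check, for each a[i], whether a[i] and a[i+1] are equal. If you find no equal pair, there are no duplicates.]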
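Minimal C++ sketches of both options, assuming the input is a std::vector<int>; the function names are hypothetical. For option (2) the sketch assumes an in-place heapsort (std::make_heap followed by std::sort_heap), which needs O(1) extra space, rather than std::sort, whose usual introsort implementation recurses O(log n) deep:

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Option (1): expected O(n) time, O(n) extra space for the hash set.
bool HasDuplicateHashSet(const std::vector<int>& a) {
    std::unordered_set<int> seen;
    seen.reserve(a.size());
    for (int x : a) {
        // insert() reports false in .second when x was already present
        if (!seen.insert(x).second) return true;
    }
    return false;
}

// Option (2): O(n log n) time, O(1) extra space; sorts the input in place.
bool HasDuplicateBySorting(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());
    std::sort_heap(a.begin(), a.end());   // a is now in ascending order
    // equal values are now adjacent, so one linear scan finds any equal pair
    return std::adjacent_find(a.begin(), a.end()) != a.end();
}
```

Option (1) leaves the input untouched; option (2) reorders it.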

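Edit: The proof of this claim is somewhat lengthy and needs mathematical notation that is not supported here (side note: we really need TeX support), but the idea is this: if we model our problem as an algebraic computation tree (a fair assumption when hashing is not allowed and only constant extra space is available), then Ben Or proved in his paper Lower Bounds For Algebraic Computation Trees (1983), published by the prestigious ACM, that element distinctness is an Ω(n log n) problem under this model. Lubiw showed in 1991 that the same conclusion holds when restricting ourselves to integers: A Lower Bound for the Integer Element Distinctness Problem. Together, these papers conclude that under the algebraic computation tree model, the integer element distinctness problem is Ω(n log n).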

Answer 2

In-place radix sort followed by a linear scan

In place radix sort algorithm

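Depending on what you consider the time complexity of radix sort to be, this solution is O(N) time, although in my personal opinion it is not. I think the problem is unsolvable unless you assume that integers can be sorted in linear time.

Since the sort is in place, only O(1) additional storage is needed (the recursion depth is bounded by the number of bits in the element type).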
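A rough cost accounting for the radix sort below, assuming w = 8·sizeof(T) bits per key: recursive calls at the same depth work on disjoint subranges, so each of the at most w levels touches every element at most once, and the final step that moves negatives to the front is linear:

```latex
\begin{aligned}
T(n) &\le c\,w\,n = O(w\,n), && w = 8 \cdot \mathrm{sizeof}(T)\\
w \text{ fixed (e.g. } w = 32) &\;\Rightarrow\; T(n) = O(n)\\
w = \Theta(\log n) &\;\Rightarrow\; T(n) = O(n \log n)
\end{aligned}
```

Since n distinct keys need w ≥ log2 n bits, the last case is the relevant one if the key width may grow with n; that is the distinction behind the hedge above.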

All code is C++11.

Step 1: Radix sort

template<typename T, typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
void RecurseOnRadixSort(std::vector<T>& myArray, T mask, int zerosEnd, int onesBegin)
{
    if (zerosEnd+1 >= onesBegin-1 || mask == 0) 
        return;

    int zerosEnd2 = zerosEnd;
    int onesBegin2 = onesBegin;
    while(zerosEnd2+1 <= onesBegin2-1)
    {
        // swap ones to the right
        if ((myArray[zerosEnd2+1] & mask) != 0)
        {
            std::swap(myArray[zerosEnd2+1], myArray[onesBegin2-1]);
            --onesBegin2;
        }
        else
            ++zerosEnd2;
    }

    mask >>= 1;

    //recurse on lhs
    RecurseOnRadixSort(myArray, mask, zerosEnd, zerosEnd2+1);

    //recurse on rhs
    RecurseOnRadixSort(myArray, mask, onesBegin2-1, onesBegin);
}

template <typename T, typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
void InPlaceRadixSort(std::vector<T>& myArray)
{
    int zerosEnd = -1;
    int onesBegin = static_cast<int>(myArray.size());
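    // build the top-bit mask in the unsigned type, so a signed T is never shifted into its sign bit
    T mask = static_cast<T>(static_cast<typename std::make_unsigned<T>::type>(1) << (sizeof(T)*8-1));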
    while(zerosEnd+1 <= onesBegin-1)
    {
        if ( (myArray[zerosEnd+1] & mask) != 0)
        {
            std::swap(myArray[zerosEnd+1], myArray[onesBegin-1]);
            --onesBegin;
        }
        else
            ++zerosEnd;
    }

    mask = static_cast<T>(1) << sizeof(T)*8-2; // need to reassign in case of signed datatype
    //recurse on lhs
    RecurseOnRadixSort(myArray, mask, -1, zerosEnd+1);
    //recurse on rhs
    RecurseOnRadixSort(myArray, mask, onesBegin-1, static_cast<int>(myArray.size()));

    // swap negatives to the front: the non-negatives now come first and the
    // negatives (sign bit set) last, each group already in ascending order
    auto firstNegative = std::partition_point(myArray.begin(), myArray.end(),
                                              [](T value) { return !(value < 0); });
    std::rotate(myArray.begin(), firstNegative, myArray.end());
}

Step 2: Linear scan for duplicate elements

for (size_t i=0, j=1; j<myArray.size(); ++i,++j)
{
    if (myArray[i] == myArray[j])
    {
        std::cout << "Found duplicate element " << myArray[i] << '\n';
    }
}

Full code

#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <type_traits>
#include <vector>
using namespace std;
#define N 10

template <typename T>
void PrintArray(const std::vector<T>& myArray)
{
    for (auto&& element : myArray)
    {
        std::cout << element << std::endl;
    }
}

template<typename T, typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
void RecurseOnRadixSort(std::vector<T>& myArray, T mask, int zerosEnd, int onesBegin)
{
    if (zerosEnd+1 >= onesBegin-1 || mask == 0) 
        return;

    int zerosEnd2 = zerosEnd;
    int onesBegin2 = onesBegin;
    while(zerosEnd2+1 <= onesBegin2-1)
    {
        // swap ones to the right
        if ((myArray[zerosEnd2+1] & mask) != 0)
        {
            std::swap(myArray[zerosEnd2+1], myArray[onesBegin2-1]);
            --onesBegin2;
        }
        else
            ++zerosEnd2;
    }

    mask >>= 1;

    //recurse on lhs
    RecurseOnRadixSort(myArray, mask, zerosEnd, zerosEnd2+1);

    //recurse on rhs
    RecurseOnRadixSort(myArray, mask, onesBegin2-1, onesBegin);
}

template <typename T, typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
void InPlaceRadixSort(std::vector<T>& myArray)
{
    int zerosEnd = -1;
    int onesBegin = static_cast<int>(myArray.size());
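    // build the top-bit mask in the unsigned type, so a signed T is never shifted into its sign bit
    T mask = static_cast<T>(static_cast<typename std::make_unsigned<T>::type>(1) << (sizeof(T)*8-1));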
    while(zerosEnd+1 <= onesBegin-1)
    {
        if ( (myArray[zerosEnd+1] & mask) != 0)
        {
            std::swap(myArray[zerosEnd+1], myArray[onesBegin-1]);
            --onesBegin;
        }
        else
            ++zerosEnd;
    }

    mask = static_cast<T>(1) << sizeof(T)*8-2; // need to reassign in case of signed datatype
    //recurse on lhs
    RecurseOnRadixSort(myArray, mask, -1, zerosEnd+1);
    //recurse on rhs
    RecurseOnRadixSort(myArray, mask, onesBegin-1, static_cast<int>(myArray.size()));

    // swap negatives to the front: the non-negatives now come first and the
    // negatives (sign bit set) last, each group already in ascending order
    auto firstNegative = std::partition_point(myArray.begin(), myArray.end(),
                                              [](T value) { return !(value < 0); });
    std::rotate(myArray.begin(), firstNegative, myArray.end());
}

int main() {
    srand(time(NULL));
    std::vector<int> myArray(N);
    for (size_t i=0;i<myArray.size();++i)
    {
        myArray[i] = rand() % 100 * (rand() % 2 == 1?-1:1);
    }

    std::cout << "Vector before radix sort: " << std::endl;
    PrintArray(myArray);
    InPlaceRadixSort(myArray);
    std::cout << "Vector after radix sort: " << std::endl;
    PrintArray(myArray);

    for (size_t i=0, j=1; j<myArray.size(); ++i,++j)
    {
        if (myArray[i] == myArray[j])
        {
            std::cout << "Found duplicate element " << myArray[i] << '\n';
        }
    }
    return 0;
}

Live Demo

Answer 3

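Here is an interesting solution to this problem under a single constraint: the elements must be in the range 0 to n-2 (inclusive), where n is the number of elements.

It runs in O(n) time with O(1) space complexity.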
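The linked solution is not reproduced here. As a swapped-in illustration of a technique that fits the same constraint, here is a C++ sketch of Floyd's cycle detection (the function name is hypothetical): treat the array as the function i -> a[i]. Because no value equals n-1, the walk that starts at index n-1 enters a cycle whose entry point is reached from two different indices, so the entry point is a duplicated value. It runs in O(n) time, uses O(1) extra space and does not modify the array:

```cpp
#include <cassert>
#include <vector>

// Assumes a.size() >= 2 and every value lies in [0, n-2], where n = a.size().
// Returns one duplicated value (one must exist by the pigeonhole principle).
int FindDuplicateFloyd(const std::vector<int>& a) {
    assert(a.size() >= 2);
    const int start = static_cast<int>(a.size()) - 1;  // no value points to index n-1
    int tortoise = start;
    int hare = start;
    // phase 1: move at speeds 1 and 2 until they meet inside the cycle
    do {
        tortoise = a[tortoise];
        hare = a[a[hare]];
    } while (tortoise != hare);
    // phase 2: restart one pointer; they meet at the cycle entry, the duplicate
    tortoise = start;
    while (tortoise != hare) {
        tortoise = a[tortoise];
        hare = a[hare];
    }
    return hare;
}
```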

Answer 4

Here is a solution with O(n) time and O(1) space usage!

Traverse the array. Do the following for every index i of A[]:
{
    check the sign of A[abs(A[i])];
    if positive, make it negative:  A[abs(A[i])] = -A[abs(A[i])];
    else  // i.e., A[abs(A[i])] is negative
        this element (the i-th element of the list) is a repetition
}

Credit: Method 5, GeeksforGeeks
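A C++ sketch of the marking idea, assuming every value lies in [0, n-1] so each value is a valid index. As a deliberate change from the pseudocode, it marks a slot with bitwise complement (~x) instead of negation, because -0 == 0 means a slot holding 0 could never be marked by negating it; the array is restored before returning. The function names are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// A marked slot holds ~x, which is negative for any x >= 0;
// DecodeSlot recovers the stored value whether or not the slot is marked.
inline int DecodeSlot(int x) { return x < 0 ? ~x : x; }

// Assumes every value lies in [0, n-1], where n = a.size().
// O(n) time, O(1) extra space; the array is modified while scanning and restored at the end.
bool HasDuplicateByMarking(std::vector<int>& a) {
    bool found = false;
    for (std::size_t i = 0; i < a.size() && !found; ++i) {
        const int v = DecodeSlot(a[i]);       // original value at position i
        if (a[v] >= 0) {
            a[v] = ~a[v];                     // first sighting of v: mark slot v
        } else {
            found = true;                     // slot v already marked: v repeats
        }
    }
    for (int& x : a) x = DecodeSlot(x);       // undo all marks
    return found;
}
```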

Answer 5

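For the general case, this problem seems to have no solution, because of the strong complexity constraints and the unrestricted input.

Clearly you need at least N steps just to look at all of the input, so it cannot be faster than O(n).

Now, to make sure you detect every possible duplicate, you have several options:

Compare each number with every other number. This needs hardly any extra space, but takes O(n^2) time.
Do the comparisons in a smarter way by swapping integers within the available space. This lets you "store information" in the sequence itself. In fact, comparing all numbers with one another is what sorting algorithms usually do. The fastest known sorting algorithms that need no extra space take O(n log n) time; Wikipedia has a rather lengthy writeup with lots of sources. So you can never meet your time requirement this way. (some comparison chart of known sorting algorithms)
You can use a hash map for some bookkeeping, which might let you get away with linear time O(n), but the bookkeeping has to be stored somewhere; otherwise you simply "forget" which numbers you have already seen. Unfortunately, the bookkeeping needs more space as your input grows, because you have to remember more and more different numbers. So it is impossible to compare arbitrarily long input sequences with the same fixed amount of memory, and you have to violate the constant-space O(1) requirement.

As @Atishay pointed out in his answer, there can be a solution if your input is very restricted. There, you are required to have an array of size n whose possible values lie only in the range [0, n-2]. This requirement guarantees that there must be a duplicate somewhere, because there are fewer possible distinct values than elements in the array. With this knowledge and these very specific values, you can do it. But it relies on very narrow assumptions and does not solve the general problem stated in the question.

Edit

As clarified in the comments, it has been proven that Ω(n log n) is a lower bound on the time complexity of comparison-based sorting algorithms. For reference, see here: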

Answer 6

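This solution is based on @dsimcha's solution for removing duplicates from an array, which can be found here.

It performs an in-place swapping algorithm that uses a hash of each value to decide where to swap it. Note that this destroys the original array contents to some degree, but the OP's question did not forbid that.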

public static class DupFinder
{
    public static bool HasDups(int[] array, ref int nEvals)
    {
        nEvals = 0;
        return DupFinder.FindInPlace(array, 0, ref nEvals);
    }

    private static bool FindInPlace(int[] array, int start, ref int nEvals)
    {
        if (array.Length - start < 2)
            return false;

        var sentinel = array[start];
        var offset = start + 1;
        var len = array.Length - offset;
        for (var ndx = 0; ndx < len; nEvals++)
        {
            var cur = array[offset + ndx];
            if (cur == sentinel)
            {
                ndx++;
                continue;
            }

            var hash = ((cur % len) + len) % len;  // non-negative even when cur < 0
            if (ndx == hash)
            {
                ndx++;
                continue;
            }

            var at_hash = array[offset + hash];
            if (cur == at_hash)
            {
                array[offset + ndx] = sentinel;
                ndx++;
                continue;
            }

            if (at_hash == sentinel)
            {
                Swap(array, offset, ndx, hash);
                ndx++;
                continue;
            }

            var hash_hash = ((at_hash % len) + len) % len;  // non-negative even when at_hash < 0
            if (hash_hash != hash)
            {
                Swap(array, offset, ndx, hash);
                if (hash < ndx)
                    ndx++;
            }
            else
            {
                ndx++;
            }
        }

        var swapPos = 0;
        for (var i = 0; i < len; i++, nEvals++)
        {
            var cur = array[offset + i];
            if (cur != sentinel && i == ((cur % len) + len) % len)
                Swap(array, offset, i, swapPos++);
        }

        for (var i = swapPos; i < len; nEvals++)
        {
            var cur = array[offset + i];
            if (cur == sentinel)
                return true; // got dups.
            else
                i++;
        }

        // Let's assume C# supports tail recursion ;-)
        // Then => look ma, O(1) extra storage space.
        return FindInPlace(array, offset + swapPos, ref nEvals);
    }

    private static void Swap(int[] array, int offset, int first, int second)
    {
        var tmp = array[offset + first];
        array[offset + first] = array[offset + second];
        array[offset + second] = tmp;
    }
}

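So if we assume C# supports tail recursion and we do not count the stack frames used as extra space, it has an O(1) space requirement.

The original author states that it has O(N) time complexity. The (limited) tests I ran (as opposed to a computational complexity analysis) suggest it is closer to O(N log N).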

Array Size   Dup Position    #Evals
12           7               26
12           -               35
100,000      80,000          279,997
100,000      -               453,441

Answer 7

An implementation that uses a single int as a temporary variable, i.e. as a bit vector:

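public static boolean isUniqueChars(String str) {
    // assumes str holds only the lowercase letters 'a'..'z'; bit val of checker is set once letter val has been seen
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) != 0) return false;
        checker |= (1 << val);
    }
    return true;
}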

Or, an O(n^2) version that does it without using any temporary variable:

public static boolean isDuplicate(char[] str) {
    if (str == null) return false;
    int len = str.length;
    if (len < 2) return false;

    // compare each character only with the characters before it
    for (int i = 1; i < len; ++i) {
        for (int j = 0; j < i; ++j) {
            if (str[i] == str[j]) return true;
        }
    }
    return false;
}

Answer 8

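A Bloom filter is a space-efficient hash set with a tunable false-positive rate. The possibility of false positives means that when you get a hit from the BF, you have to go back and check for a real duplicate, which introduces an N^2 term, but with a coefficient of ~exp(-(extra space used for the filter)). This yields an interesting space-versus-time trade-off.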
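A C++ sketch of this trade-off. The bit count m and the hash count k are the knobs you would tune; the hash mixing (the SplitMix64 finalizer combined with double hashing) and all names are assumptions, not part of the answer. A filter hit only means "possibly seen", so it is confirmed by scanning the earlier elements, which is where the N^2 term comes from:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: SplitMix64 finalizer used as a 64-bit hash.
inline std::uint64_t Mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

// Bloom filter with m bits and k hash functions (double hashing: h1 + j*h2).
bool HasDuplicateBloom(const std::vector<int>& a, std::size_t m, int k) {
    assert(m > 0 && k > 0);
    std::vector<bool> bits(m, false);
    for (std::size_t i = 0; i < a.size(); ++i) {
        const std::uint64_t h1 = Mix(static_cast<std::uint64_t>(static_cast<std::int64_t>(a[i])));
        const std::uint64_t h2 = Mix(h1) | 1;           // odd step
        bool allSet = true;
        for (int j = 0; j < k; ++j) {
            const std::size_t pos = static_cast<std::size_t>((h1 + j * h2) % m);
            if (!bits[pos]) { allSet = false; bits[pos] = true; }
        }
        if (allSet) {
            // possible duplicate (or false positive): verify with an O(i) scan
            for (std::size_t p = 0; p < i; ++p)
                if (a[p] == a[i]) return true;
        }
    }
    return false;
}
```

The filter itself takes m bits, not O(1) space; a larger m makes false positives, and therefore the quadratic re-checks, rarer.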

I have no proof that the problem as posed is unsolvable, but in general, "here is an interesting trade-off space" is a good answer to an unsolvable problem.

Answer 9

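A clean example that determines in O(n) time and O(1) space whether there is a duplicate (it assumes every value is in the range 1..n, and it flips signs in the input array):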

public class DuplicateDetermineAlgorithm {
    public static boolean isContainsDuplicate(int[] array) {
        if (array == null) {
            throw new IllegalArgumentException("Input array can not be null");
        }
        if (array.length < 2) {
            return false;
        }

        for (int i = 0; i < array.length; i++) {
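            // values are assumed to lie in 1..n, so |value| - 1 is a valid index;
            // a negative slot means the value pointer + 1 was already seen
            int pointer = convertToPositive(array[i]) - 1;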
            if (array[pointer] > 0) {
                array[pointer] = changeSign(array[pointer]);
            } else {
                return true;
            }
        }
        return false;
    }

    private static int convertToPositive(int value) {
        return value < 0 ? changeSign(value) : value;
    }

    private static int changeSign(int value) {
        return -1 * value;
    }
}

Answer 10

public static void getDuplicatesElements (Integer arr[]){

    //Status array to track the elements if they are already considered
    boolean status[] = new boolean [arr.length];

    //Flag to mark the element found its duplicate
    boolean dupFlag = false;

    //Output string
    String  output = "";

    //Count of duplicate elements found
    int count = 0;

    //Initialize status array with all false i.e. no duplicates
    for (int i = 0; i < arr.length; i++)
    {
        status[i] = false;
    }

    //first loop to check every element
    for (int i = 0; i < arr.length - 1; i++)
    {
        //Initialize every element to no duplicate
        dupFlag = false;

        //Check if this element is not already found duplicate, if not, check now.
        if (!status[i]){
            for (int j = i+1; j <  arr.length; j++){
                if (arr[i].equals(arr[j])){   // equals(), not ==: == compares Integer references
                    dupFlag = true;
                    status[j] = true;
                }
            }
        }

        if (dupFlag){
            output = output + " " + arr[i];
            count++;
        }
    }

    System.out.println("Duplicate elements: " + output );
    System.out.println("Count: " + count );

}

Answer 11

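Disclaimer

I don't have an answer, but my thoughts are too extensive for a comment. Also, I wanted to write them down, so that the three hours I spent thinking about a solution are not completely wasted. I hope to give you a different perspective, but if you don't want to waste your time, don't read on. Or just upvote this answer, it's worth it :)

To kick-start our visual thinking, let's take an example array: 50 100 150 -2 -1 0 1 2 3 4. You can tell for sure that it has no duplicates, so our algorithm should output FALSE. Also, its length is 10.

Step A: Counting in O(N) time

Let's ignore the extra-memory constraint for a moment (in fact, let's outright violate it by assuming we can have O(\inf) extra memory) and store the count of every integer in an imaginary infinite array (doubly infinite, actually, since it also allows negative indices). For our input, this array would look like this: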

...000001111111000...00100...00100...001000000...
        ^              ^               ^
   [index  -2]     [index  50]     [index 150]

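If any element of this array is greater than 1, we have a duplicate and the algorithm should return TRUE.

Step B: Mapping -inf..inf to 0..N in O(N) time

Suppose we have a mapping f(x): -inf..inf -> 0..N that compresses our infinite array into an array of size N, in O(N) time. Ideally, this is what hashing does. Note that we don't care about preserving the order of the array, since we only care whether it has an element greater than 1. So we can combine these two steps and eliminate the need for infinite memory, yay! We still use O(N) extra memory (in fact, exactly N counts) to hold the count values. The next step is to get rid of that.

Step C: Using the first element as a switch

Before I explain this step, note that we don't really need to store any count greater than 1. The first time we want to increment a counter and notice that it already has the value 1, we know we have found a duplicate! So 1 bit of memory per counter is enough. This reduces the required memory by a factor of O(lg(N)), but we don't really care, because it is still not good enough. The important part is that 1 bit of memory per counter is enough.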
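To make Steps A and B concrete with 1 bit per counter, here is a C++ sketch that assumes a hypothetical perfect mapping f(x) = x - min. That mapping is perfect only because it is sized to the whole value range, so it needs (max - min + 1) bits, which is neither O(1) nor O(N) in general; the function name is hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Steps A+B with 1 bit per counter, using f(x) = x - min as a perfect hash.
// Extra memory: (max - min + 1) bits, so only practical for a small value range.
bool HasDuplicateWithBitCounters(const std::vector<int>& a) {
    if (a.size() < 2) return false;
    const auto mm = std::minmax_element(a.begin(), a.end());
    const long long lo = *mm.first;
    const long long range = static_cast<long long>(*mm.second) - lo + 1;
    std::vector<bool> seen(static_cast<std::size_t>(range), false);  // 1 bit per counter
    for (int x : a) {
        const auto slot = static_cast<std::size_t>(x - lo);   // f(x)
        if (seen[slot]) return true;   // counter would go from 1 to 2: duplicate
        seen[slot] = true;             // counter goes from 0 to 1
    }
    return false;
}
```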

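We will now exploit the fact that we are allowed to modify the input array. We traverse the array and xor every element with the value of the first element. If the result is smaller than the value before the operation, we replace the element with that result. We also store the first element separately as sw, at an extra O(1) memory cost.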

Now we can use the stored first element sw and the transformed array to encode the counts from the counting step (Steps A + B) as follows: considering the element A[k] at index k, if A[f(A[k])] < A[f(A[k])] xor sw, the count is zero, which means the element we are considering, A[k], has not been seen before, so we change A[f(A[k])] to A[f(A[k])] xor sw. If A[f(A[k])] > A[f(A[k])] xor sw, the count is one, which means the element we are considering, A[k], has already been seen, so it is a duplicate.

Assuming the map:

f(-2 xr 50) -> 0
f(-1 xr 50) -> 1
f(0) -> 2
f(1) -> 3
f(2) -> 4
f(3) -> 5
f(4) -> 6
f(86) -> 7
f(150) -> 8
f(1337) -> 9

and after performing the steps in the order "step c; step a+b", the input array looks like this:

50(0) 100(86) 150(164) -2(-2 xr 50) -1(-1 xr 50) 0(50) 1(51) 2(48) 3(49) 4(54) [intermediate state, not stored in memory]
0 86 150 -2 xr 50 -1 xr 50 0 1 2 3 4 [state after step c]
0 86 *164* -2 xr 50 -1 xr 50 0 1 2 3 4 [counted element 0]
0 86 164 -2 xr 50 -1 xr 50 0 1 *48* 3 4 [counted element 1]
0 86 164 -2 xr 50 -1 xr 50 0 1 48 *49* 4 [counted element 2]
*50* 86 164 -2 xr 50 -1 xr 50 0 1 48 49 4 [counted element 3]
50 *100* 164 -2 xr 50 -1 xr 50 0 1 48 49 4 [counted element 4]
50 100 !164! -2 xr 50 -1 xr 50 0 1 48 49 4 [counted element 5]

Trying to count the element at index 5, which is 0, we find that there already is a 0 in the array! (Because A[f(A[5])] is 164, which is greater than 164 xr 50.) So we output TRUE and the algorithm ends.
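The XOR "switch" used in this walkthrough can be sketched in C++ as follows, assuming sw != 0 (with sw == 0, x ^ sw == x and no flag can be stored); the helper names are hypothetical. For sw != 0, x and x ^ sw are two distinct values, so the smaller one of the pair can encode count 0, the larger one count 1, and XOR with sw switches between them:

```cpp
#include <algorithm>
#include <vector>

// Step C: bring every element to the "count 0" member of its pair {x, x ^ sw}.
void NormalizeWithSwitch(std::vector<int>& a, int sw) {
    for (int& x : a) x = std::min(x, x ^ sw);
}

// True if x currently encodes count 1 (it is the larger member of its pair).
bool CountIsOne(int x, int sw) { return x > (x ^ sw); }

// Move x from count 0 to count 1 by switching to the other member of its pair.
void SetCountToOne(int& x, int sw) { x ^= sw; }
```

With the example array and sw = 50, normalizing reproduces the "state after step c" row above, and the first counting step turns 150 into 164. After normalization, 50 and 0 are both stored as 0, which is exactly the information loss that makes element 5 look like a duplicate.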

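Moral of the story

If we are not allowed enough memory x time, we are bound to forget things and make mistakes.

Sorry

Unfortunately, we don't have a perfect hash function, and we can't create memory out of thin air, so the traditional approaches can't work under the required constraints. Given a perfect hash function, the linked answer could perhaps be modified so that its algorithm, which interprets numbers as array indices, also handles numbers that would fall outside the array's bounds. But even then, one would have to invent a way to use it to detect duplicates rather than to find one that is guaranteed to exist...

Anyway, interesting question.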

Answer 12

I would like to propose the following solution. I hope it helps.

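#include <cassert>
#include <vector>

// XOR of all elements: if every value occurs an even number of times except one,
// the result is that odd-occurring value (the Codility "odd occurrences" setting)
int XorOfAll(const std::vector<int>& A) {
    assert(A.size() > 1);

    int m = A[0];
    for (unsigned int i = 1; i < A.size(); ++i) {    // O(n)
        m ^= A[i];
    }
    return m;
}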

It is based on: https://codesays.com/2015/solution-to-odd-occurrences-in-array-by-codility/

Cheers

Answer 13

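You can use a swap sort to perform all the necessary swaps and eventually find the duplicate element. This solution can also be extended to find multiple duplicate elements in the same array. One caveat of this solution is that the numbers in the array must be in the range 1 to n.

Below is sample code for this. Input -> [1,3,4,2,2]; here the duplicate element is 2. You can run this code on the input above.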

public int findDuplicate(int[] nums) {
    int i = 0;
    // place each value v at index v - 1 by swapping
    while (i < nums.length) {
        if (nums[i] != nums[nums[i] - 1]) {
            swap(i, nums[i] - 1, nums);
        } else {
            i++;
        }
    }
    // the first index j that does not hold j + 1 holds a duplicated value
    for (int j = 0; j < nums.length; j++) {
        if (nums[j] != j + 1) {
            return nums[j];
        }
    }
    return -1;
}

public static void swap(int i, int j, int[] nums) {
    int temp = nums[i];
    nums[i] = nums[j];
    nums[j] = temp;
}

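The main thing to remember is that index i should contain the number i + 1. For example, index 0 should hold element 1, i.e. arr[0] = 1, index 1 should hold element 2, i.e. arr[1] = 2, and so on.

This solution needs no extra space, and its time complexity is O(n).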

Answer 14

import java.util.HashSet;
import java.util.Set;


public class FindDups {
public static void main(String[] args) {
    int a[]={1,2,3,3,4};

    Set<Integer> s=new HashSet<Integer>();
    for (int i = 0; i < a.length; i++)
    {
        // Set.add() returns false if a[i] is already in the set
        if (!s.add(a[i]))
            System.out.println("at index " + i + ": " + a[i] + " is duplicate");
    }
    for(int i:s)
    {
        System.out.println(i);
    }
}
}

Find duplicates in an array

14 answers:
