Question

Although I have read the interesting post "Algorithm: efficient way to remove duplicate integers from an array", I could not find a satisfying answer there:

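I have a one-dimensional array of doubles that is usually quite small (at most three elements), although for the sake of generality this should not be treated as a requirement.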

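In addition, I do not want to find only true duplicates, but also duplicates in the sense that the elements' difference is below a certain threshold. That requirement is fairly easy to handle, so my actual question is: how do I implement general duplicate removal in ANSI C with as little overhead as possible?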

Remark: there are three main reasons why I could not derive a solution from the post mentioned above:

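Many of the given solutions use languages other than plain C, so they are not particularly helpful.
Some solutions do not work if all elements of the array are identical, which may well happen in my case.
Some of the described algorithms seem to be designed for integer values only.

As a C newbie, I would therefore greatly appreciate any advice.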

Addendum: in some kind of pseudo-code, what I want to achieve is the following:

1) Sort the array in ascending manner
2) Loop through array until element before the last one
   - Check if difference of element i to element i+1 is smaller than threshold
     -> If yes, store mean value as first element of new array
     -> If no, store original value of element i as first element of new array
3) Start the same again in order to check if now the differences between the new array elements lie below the threshold
   -> Abort if no difference is smaller than the threshold anymore

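So my main question is: how do I implement step 3 so that an arbitrary number of iterations is possible and the function keeps running as long as there are array elements that are "too close" (with respect to my threshold)?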

Answer 1

This problem is a variant of the element distinctness problem.

Because you are looking not only for exact duplicates but also for "close duplicates", the solution cannot rely on hashing.

The solution is basically to sort the array and then iterate over it, 'skipping' the dupe elements.

This solution is O(n log n), which is optimal, because O(n log n) is already optimal for the general element distinctness problem.

C-like pseudo-code:

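#include <math.h>   /* fabs, HUGE_VAL */
#include <stdlib.h> /* qsort */

#define epsilon 1e-9 /* placeholder: some small tolerance value, choose your own */

/* qsort comparator: ascending order of doubles */
static int cmpDouble(const void *a, const void *b) {
   double x = *(const double *)a, y = *(const double *)b;
   return (x > y) - (x < y);
}

int trimDupes(double arr[], int n) {
   int i = 0;
   int currPos = 0;
   double last = -HUGE_VAL; /* negative infinity on IEEE 754 systems */
   qsort(arr, (size_t)n, sizeof arr[0], cmpDouble);
   for (i = 0; i < n; i++) {
      if (fabs(last - arr[i]) > epsilon) {
          arr[currPos++] = arr[i];
          last = arr[i]; /* moving this out of the condition gives slightly different behavior; think about what you need */
       }
    }
    return currPos; /* new length of the array; everything after it is garbage */
}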

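This solution uses very little extra space [basically whatever the sorting algorithm needs plus some constants], and O(n log n) time for sorting plus one additional pass over the array.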
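To make the remark about the `last = arr[i]` line concrete, here is a minimal sketch of the two behaviors side by side. The helper names `trimAgainstKept` and `trimAgainstPrevious` are made up for illustration, and both assume the array has already been sorted in ascending order.

```c
#include <math.h>   /* fabs */
#include <stddef.h> /* size_t */

/* Hypothetical helper (not from the answer): arr must be sorted ascending.
   An element is kept only if it is more than eps away from the last KEPT
   element. Returns the new length. */
size_t trimAgainstKept(double arr[], size_t n, double eps)
{
    size_t i, pos = 0;
    for (i = 0; i < n; i++) {
        if (pos == 0 || fabs(arr[i] - arr[pos - 1]) > eps)
            arr[pos++] = arr[i];
    }
    return pos;
}

/* Hypothetical helper: same idea, but each element is compared with the
   element seen just before it (kept or not), so a whole chain of close
   values collapses to its first element. Returns the new length. */
size_t trimAgainstPrevious(double arr[], size_t n, double eps)
{
    size_t i, pos = 0;
    double prev = 0.0; /* unused until i > 0 */
    for (i = 0; i < n; i++) {
        if (i == 0 || fabs(arr[i] - prev) > eps)
            arr[pos++] = arr[i];
        prev = arr[i]; /* updated unconditionally */
    }
    return pos;
}
```

For the sorted input {1.0, 1.4, 1.8, 2.2} with a tolerance of 0.5, the first variant keeps 1.0 and 1.8, while the second keeps only 1.0, because every element is within 0.5 of its immediate predecessor.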

Answer 2

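Sort the array. Then walk through the array, copying to another array. If the next item is within the threshold of the current item, use an inner loop that compares the current item against all remaining items and skips every item within the threshold. When you reach an item that is outside the threshold, that item becomes the next current item.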

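Because the sort order fixes which element each comparison starts from, you sidestep the problems raised in the comments on the question. Note, however, that the result will differ if you change the order (ascending versus descending sort).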
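A rough sketch of this approach might look as follows. The function `copyDistinct` and the two comparators are hypothetical names introduced here, and the caller is assumed to provide an output array with room for n elements.

```c
#include <math.h>   /* fabs */
#include <stdlib.h> /* qsort, size_t */

/* qsort comparator: ascending order of doubles */
int cmpAscending(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* qsort comparator: descending order of doubles */
int cmpDescending(const void *a, const void *b)
{
    return cmpAscending(b, a);
}

/* Hypothetical helper: sorts `in` in place with `cmp`, then copies one
   representative per group of close values into `out` (capacity >= n).
   Returns the number of elements written to `out`. */
size_t copyDistinct(double in[], size_t n, double thresh, double out[],
                    int (*cmp)(const void *, const void *))
{
    size_t i = 0, count = 0;
    qsort(in, n, sizeof in[0], cmp);
    while (i < n) {
        double current = in[i];
        out[count++] = current;
        i++;
        /* inner loop: skip every following item within the threshold */
        while (i < n && fabs(in[i] - current) <= thresh)
            i++;
    }
    return count;
}
```

With the input {1.0, 4.0, 17.0, 4.0, 17.0, 17.0} and a threshold of 5.0, sorting ascending yields {1.0, 17.0}, whereas sorting descending yields {17.0, 4.0}, which illustrates the order dependence mentioned above.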

Answer 3

I have now found a solution that works for me, although it needs several function calls and its complexity is probably not optimal:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator: ascending order of doubles */
int compareDouble (const void * a, const void * b)
{
  double x = *(const double*)a;
  double y = *(const double*)b;
  if ( x < y ) return -1;
  if ( x > y ) return 1;
  return 0;
}

int main(void)
{
  double x[6] = {1.0,4.0,17.0,4.0,17.0,17.0};
  size_t n = sizeof(x)/sizeof(x[0]);
  const double thresh = 5.0;

  qsort(x, n, sizeof(double), compareDouble);

  size_t i = 0;
  size_t j = 0;

  while(i<n)
  {
    if(i==n-1)
    {
      x[j++] = x[i];
      break;
    }
    else if(fabs(x[i]-x[i+1])>thresh)
    {
      x[j++] = x[i++];
    }
    else
    {
      x[j++] = (x[i]+x[i+1])/2;
      i+=2;
    }
  } 

  for(i=0; i<j; i++)
  {
    printf("result[%lu] = %.2f\n",(unsigned long)i,x[i]);
  }
  return 0;
}
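The loop above makes a single pass. One possible way to get the repeated behavior asked for in step 3 of the question is to wrap the pass in a function and call it until the length stops shrinking. The names `mergePass` and `mergeUntilStable` are hypothetical, and the sketch assumes the array is already sorted in ascending order (averaging two neighbours keeps it sorted, so no re-sorting is needed between passes).

```c
#include <math.h>   /* fabs */
#include <stddef.h> /* size_t */

/* Hypothetical helper: one pass over a sorted array. Neighbouring pairs
   whose difference is at most thresh are replaced by their mean.
   Returns the new length. */
size_t mergePass(double x[], size_t n, double thresh)
{
    size_t i = 0, j = 0;
    while (i < n) {
        if (i + 1 < n && fabs(x[i] - x[i + 1]) <= thresh) {
            x[j++] = (x[i] + x[i + 1]) / 2; /* merge the pair */
            i += 2;
        } else {
            x[j++] = x[i++];                /* keep the element as is */
        }
    }
    return j;
}

/* Hypothetical helper: repeat mergePass until a pass merges nothing.
   Each merging pass shortens the array, so the loop terminates. */
size_t mergeUntilStable(double x[], size_t n, double thresh)
{
    size_t before;
    do {
        before = n;
        n = mergePass(x, n, thresh);
    } while (n < before);
    return n;
}
```

For the sorted input {1.0, 4.0, 4.0, 17.0, 17.0, 17.0} and a threshold of 5.0, the first pass gives {2.5, 4.0, 17.0, 17.0}, the second gives {3.25, 17.0}, and the third changes nothing, so the loop stops with two elements.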

Any further comments or remarks are appreciated!

How to efficiently remove duplicate elements from a double array (in C)?
