Question

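I have to implement a function in C with the following signature: int *unici(const int *vec, size_t size, size_t *newsize), where vec is a const array of int, size is the size of the array, and *newsize is the size of the new array without duplicates.

This function must create a new array on the heap and put the values from vec into it without repetition.

Example: if vec is [2, 4, 5, 4, 5, 5, 7, 9] and size is 8, then the new array should be [2, 4, 5, 7, 9] and *newsize should equal 5.

I tried to implement it, but I don't know how to remove the duplicates and put the values into the new array.

Edit: this is my solution so far. By the way, I figured it out, thanks everyone!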

int cmpfunc(const void * a, const void * b) 
{
    return (*(int*)a - *(int*)b);
}

int *unici(const int *vec, size_t size, size_t *newsize)
{
    if (size == 0)
        return NULL;

    qsort(vec, size, sizeof(int), cmpfunc);

    size_t count = 0;

    for (size_t i = 0; i < size; i++)
    {
        //finding for duplicates 
        if (vec[i] == vec[i + 1])
            count++;
    }

    *newsize = size - count;
    int *tmp = malloc(*newsize * sizeof(int));

    //now I've to put in tmp values from vec without duplicates
}

Answer 1

First, you don't want the const int * parameter in unici, because passing it to qsort produces this warning:

passing argument 1 of ‘qsort’ discards ‘const’ qualifier from pointer target type

With that out of the way, we allocate memory for each non-duplicate integer as we find it.

int *unici(int *v, size_t size, size_t *newsize)
{
    if (size == 0)
        return NULL;

    qsort(v, size, sizeof(int), cmpfunc);

    int *temp = malloc(sizeof *temp);
    if (temp == NULL){
        perror("Error ");
        exit(EXIT_FAILURE);
    }
    (*newsize)= 0;
    temp[(*newsize)++] = v[0];
    for (size_t i = 1; i < size; i++)
    {
        if (v[i] != v[i-1]){
            int *tt = realloc(temp,(*newsize+1)*sizeof *tt);
            if (tt == NULL){
                perror("Error ");
                exit(EXIT_FAILURE);
            }
            temp = tt;
            temp[(*newsize)++] = v[i];
        }
    }

    return temp;
}

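Two points about this code:

The array here is supposed to be const. So the function has to copy the array, sort the copy, eliminate the duplicates, and then resize the copy to the number of unique elements. The benefit of doing it this way is that you don't have to drop the const qualifier.
In my code I called realloc for every element, which is overkill. So what should you do instead? As said above, allocate for the maximum size and then shrink to the size of the unique list. Whether to shrink at all is debatable: if only one or two slots are left unused, resizing isn't really needed, but it can be done. It is to some extent the implementer's choice.

The idea of a single shrink at the end is implemented here: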

int *unici(const int *vv, size_t size, size_t *newsize)
{
    if (size == 0)
        return NULL;

    int *v = malloc(sizeof *v * size);
    if (v == NULL){
        perror("Error in malloc");
        exit(EXIT_FAILURE);
    }
    memcpy(v, vv, size*sizeof*v);
    qsort(v, size, sizeof(int), cmpfunc);

    (*newsize)= 0;
    int last = v[0];
    for (size_t i = 1; i < size; i++)
    {
        if (v[i] != last){
            v[(*newsize)++] = last;
            last = v[i];
        }
    }
    v[(*newsize)++] = v[size-1];

    int *temp = realloc(v, (*newsize)*sizeof *v);
    if (temp == NULL){
        perror("Error in realloc");
        exit(EXIT_FAILURE);
    }
    v = temp;
    return v;
}
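
A variant of the same copy, sort, shrink idea, as a minimal sketch rather than a definitive implementation. The names cmp_int and unici_sorted are hypothetical. Two things differ from the code above, both by assumption: the comparator avoids the subtraction in cmpfunc (which can overflow when the two values are far apart), and allocation failure is reported by returning NULL instead of calling exit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Overflow-safe comparator: yields -1, 0 or 1 without subtracting. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Hypothetical variant of unici: returns a sorted heap array holding the
   distinct values of vec, or NULL if size is 0 or allocation fails. */
int *unici_sorted(const int *vec, size_t size, size_t *newsize)
{
    assert(newsize != NULL);
    *newsize = 0;
    if (size == 0)
        return NULL;

    /* Work on a copy so the caller's const array is never modified. */
    int *v = malloc(size * sizeof *v);
    if (v == NULL)
        return NULL;
    memcpy(v, vec, size * sizeof *v);
    qsort(v, size, sizeof *v, cmp_int);

    /* v[0] is always kept; keep each later value that differs from
       the last value kept. */
    size_t n = 1;
    for (size_t i = 1; i < size; i++)
        if (v[i] != v[n - 1])
            v[n++] = v[i];

    /* Shrinking is optional; if realloc fails, v is still valid. */
    int *shrunk = realloc(v, n * sizeof *v);
    if (shrunk != NULL)
        v = shrunk;

    *newsize = n;
    return v;
}
```

The caller owns the returned array and releases it with free.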

Answer 2

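There are two basic approaches.

The first approach: copy the original array. Sort the elements in the new array, and use a loop to keep only the first element of each run (a run being several consecutive identical values):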

int   *result; /* This is the duplicate array; sorted */
size_t i = 0;  /* Loop index */
size_t n = 0;  /* Unique elements in the duplicate array */

while (i < size) {
    const int  c = result[i++];

    /* Skip if there are more than one consecutive c */
    while (i < size && c == result[i])
        i++;

    /* Copy the unique elements back to the beginning
       of the array. */
    result[n++] = c;
}

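If you want, you can reallocate result to n * sizeof result[0] bytes.

Store the number of unique elements, n, in *newsize, and return result.

The second approach: allocate the result array, but don't copy the values. Instead of sorting (which makes duplicate values consecutive), use a double loop to check whether each value is new (not already in the result array), and copy only the new values to the result array: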

int   *result; /* Allocated for 'size' elements */
size_t i, j;   /* Loop indexes */
size_t n = 0;  /* Unique elements in the duplicate array */

for (i = 0; i < size; i++) {

    /* Find first duplicate in result. */
    for (j = 0; j < n; j++)
        if (result[j] == vec[i])
            break;

    /* If no duplicates found, add to result. */
    if (j >= n)
        result[n++] = vec[i];
}

If you want, you can reallocate result to n * sizeof result[0] bytes.

Store the number of unique elements, n, in *newsize, and return result.
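
Put together, the second approach might look like the following minimal sketch. The name unici_ordered is hypothetical, and returning NULL on allocation failure is an assumption of this sketch, not something the question specifies.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical order-preserving variant: keeps the first occurrence of
   each value, in the order the values appear in vec. */
int *unici_ordered(const int *vec, size_t size, size_t *newsize)
{
    assert(newsize != NULL);
    *newsize = 0;
    if (size == 0)
        return NULL;

    /* Worst case: every element is unique. */
    int *result = malloc(size * sizeof *result);
    if (result == NULL)
        return NULL;

    size_t n = 0;
    for (size_t i = 0; i < size; i++) {
        size_t j;

        /* Look for vec[i] among the values already kept. */
        for (j = 0; j < n; j++)
            if (result[j] == vec[i])
                break;

        /* Not found: this is the first occurrence, so keep it. */
        if (j == n)
            result[n++] = vec[i];
    }

    /* Optional shrink; on failure the larger block is still valid. */
    int *shrunk = realloc(result, n * sizeof *result);
    if (shrunk != NULL)
        result = shrunk;

    *newsize = n;
    return result;
}
```

For the input [5, 2, 5, 9, 2, 4] this keeps [5, 2, 9, 4], whereas the sorting approach would produce [2, 4, 5, 9].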

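Which approach is better depends on how the result set is used, and on whether it is useful to have it in sorted order. If sorted order is useful, or if speed matters and the order does not, the sorting approach is likely the better one.

(The efficiency of the sorting approach depends on the efficiency of the sort function. Many sort functions are known to have O(size × log size) time complexity; for really large amounts of data, a radix sort with O(size) time complexity can be used (since the number of values is known beforehand). Note that radix sort only beats other sorts for really large sizes, typically in the millions.)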
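
For reference, a radix sort over int values could be sketched as below. This is an illustration under assumptions, not a tuned implementation: radix_sort_int and key_of are hypothetical names, the sort processes one byte per pass (least significant first), and it assumes unsigned has the same width as int with no padding bits, which holds on common platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Maps an int to an unsigned key whose unsigned order matches the signed
   order of the original values (the sign bit is flipped). */
static unsigned key_of(int x)
{
    return (unsigned)x ^ (unsigned)INT_MIN;
}

/* Hypothetical helper: LSD radix sort, one byte per pass.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int radix_sort_int(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n < 2)
        return 0;

    int *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1;

    int *src = a, *dst = buf;
    for (size_t pass = 0; pass < sizeof(unsigned); pass++) {
        size_t shift = pass * CHAR_BIT;
        size_t count[UCHAR_MAX + 2] = {0};

        /* Histogram of the current byte, stored one slot to the right. */
        for (size_t i = 0; i < n; i++)
            count[((key_of(src[i]) >> shift) & UCHAR_MAX) + 1]++;

        /* Prefix sums: count[b] becomes the first output index for byte b. */
        for (size_t b = 0; b <= UCHAR_MAX; b++)
            count[b + 1] += count[b];

        /* Stable scatter from src into dst. */
        for (size_t i = 0; i < n; i++)
            dst[count[(key_of(src[i]) >> shift) & UCHAR_MAX]++] = src[i];

        int *t = src;
        src = dst;
        dst = t;
    }

    /* After the last swap src holds the sorted data; copy back if needed. */
    if (src != a)
        memcpy(a, src, n * sizeof *a);
    free(buf);
    return 0;
}
```

Each pass is linear in n, and the number of passes is fixed by the width of int, which is where the O(size) bound comes from. The scratch buffer and the constant number of passes are also why it tends to pay off only for large inputs.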

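In some cases it may be important that the result set is in the same order as vec, just with the duplicates removed. Then the second approach is the obvious choice. Its time complexity is O(size × n), which means it gets slower as both the array and the set of unique elements grow.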

Answer 3

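Since you have sorted the vector, it should be easy. Iterate through the vector and copy each value that does not match the previous one. Something like this (I have not verified that it compiles):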

size_t src;
size_t dst;

for (src = 0, dst = 0; src < size; src++)
{
    // skip check for first element; compare with previous and if they are the same just move on
    if (src > 0 && vec[src] == vec[src - 1])
        continue;

    tmp[dst] = vec[src];
    dst++;
}
