Question

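I wrote a C program that merge sorts (recursively) integers using a dynamically allocated array. It works fine with up to 100k integers, but when I feed it 1 million integers, it throws a Segmentation fault (core dumped) error.

Why does it do this? Is my 16GB of RAM not good enough? Would I be able to sort a larger number of integers if I did not use a dynamically allocated array?

How exactly does dynamic allocation work? From my understanding, when a dynamic variable or an element of a dynamic array is declared, a portion of memory (RAM?) is set aside and reserved strictly for storing the declared variable until the memory is freed.

When my program tries to set aside memory to hold a million integers, does it fail because there isn't enough memory available?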

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BIL 1E9

//struct Sort allows dynamic allocation of the array used in insertion sort.
typedef struct {
    int *arr; //pointer to the dynamic array
    size_t used; //stores number of 'used' elements in the array
    size_t size; //stores number of total elements
} Sort;

//function prototypes to interact with the dynamic array
void freeSort(Sort *);
void initSort(Sort *, size_t);
void inSort(Sort *, int);

//prototypes for the Merge Sort
void mergeSort(Sort *, int, int, int []);
void merge(Sort *, int, int, int, int []);
void copyArray(int [], int, int, Sort *);



int main(){

    //declare Sort variable 'magic' to perform the magical insertion sort on the dynamic array.
    Sort magic;
    initSort(&magic, 10); //initialize magic with 10 elements

    //variables to allow the program to function
    int intin;
    char filename[15];

    //tosort is the file to sort.
    //sorted is the output file after sort.
    FILE *tosort, *sorted;

    //necessary variables to measure time
    struct timespec start, finish;

    //prompt user for file name.
    printf("Enter the name of file with a list of integers to sort: ");
    scanf("%s", filename);
    tosort = fopen(filename, "r"); //read 'tosort' file

    //write the 'sorted' file to 'filename.sorted'
    sorted = fopen(strcat(filename, ".sorted"), "w");

    //while loop stores every integer in the dynamically allocated magic array from tosort file.
    while (!feof(tosort)) {
        fscanf(tosort, "%d", &intin);
        inSort(&magic, intin);
    }

    //n stores number of integers to sort
    int n = magic.used;
    //temporary array for use with the merge sort
    int sortedArray [n];

    //measure time
    clock_gettime(CLOCK_REALTIME, &start);  //start

    //Merge Sort
    mergeSort(&magic, 0, n, sortedArray);

    clock_gettime(CLOCK_REALTIME, &finish); //finish

    //calculate the elapsed time in nanoseconds.
    double elapsed = (finish.tv_sec-start.tv_sec)+(finish.tv_nsec-start.tv_nsec)/BIL;

    printf("Merge Sort took %lf seconds\n", elapsed);

    //write the sorted array to 'sorted' ('filename'.sorted)
    for (int i = 0; i < n; i++) {
        fprintf(sorted, "%d\n", magic.arr[i]);
    }

    //free up the allocated memory for the Sort array and close the files.
    freeSort(&magic);
    fclose(tosort);
    fclose(sorted);

    return 0;
}

//initialize the dynamic array
void initSort(Sort *dynA, size_t initSize) {
    dynA->arr = (int *)malloc(initSize * sizeof(int));
    dynA->used = 0;
    dynA->size = initSize;
}

//add values to the elements of the dynamic array
void inSort(Sort *dynA, int val) {
    //if the array size is not big enough to fit new values, allocate 100 more elements.
    if (dynA->used == dynA->size) {
        dynA->size += 100;
        dynA->arr = (int *)realloc(dynA->arr, dynA->size * sizeof(int));
    }
    //'used' holds the number of used elements with values in the array.
    dynA->arr[dynA->used++] = val;
}

//free allocated memory for the dynamic array
void freeSort(Sort *dynA) {
  free(dynA->arr);
  dynA->arr = NULL;
  dynA->used = dynA->size = 0;
}


//split the array until size is 1
void mergeSort(Sort *dynA, int begin, int end, int tempA [])
{
    //if size is 1, done splitting.
    if(end-begin < 2)
        return;

    // recursively split the array
    int mid = (end+begin)/2; // mid = middle point
    mergeSort(dynA, begin, mid, tempA); // mergeSort left half
    mergeSort(dynA, mid, end, tempA); // mergeSort right half
    merge(dynA, begin, mid, end, tempA); // merge the two halves
    copyArray(tempA, begin, end, dynA); // copy the merged array to dynA
}

//merge the two arrays
void merge (Sort *dynA, int begin, int mid, int end, int tempA [])
{
    int i = begin; int j = mid;

    //from begin to end, compare the values of the two arrays
    for (int k = begin; k < end; k++)

        // store the smaller value into tempA[k]
        if (j >= end || (i < mid && dynA->arr[i] <= dynA->arr[j]))
            tempA[k] = dynA->arr[i++];
        else tempA[k] = dynA->arr[j++];
}

//copy the contents of the temporary array to the dynamic array
void copyArray(int tempA[], int begin, int end, Sort *dynA){
    for(int k = begin; k < end; k++)
        dynA->arr[k] = tempA[k];
}

I get the same error in both Cygwin64 and Command Prompt when I feed it a million integers to sort.

Answer 1

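Your error is that you are using a large stack-based VLA, sortedArray. With 1,000,000 values (and a 4-byte int), your array is 4MB, and you get a segfault from a stack overflow because the array exceeds the preset stack size limit.

For example, under Linux, the stack limit is about 8MB [on my system, I had to increase the array count to 3,000,000 to reproduce the segfault].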
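To make the arithmetic concrete, here is a minimal sketch that compares the size of such an array with the current stack limit. It assumes a POSIX system where getrlimit and RLIMIT_STACK are available; the function names int_array_bytes, exceeds_stack_limit and report_stack_fit are made up for this illustration and are not part of the original program. With a 4-byte int, 1,000,000 elements need 4,000,000 bytes and 3,000,000 elements need 12,000,000 bytes, which is more than an 8MB limit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/resource.h>

/* Bytes occupied by an array of n ints, such as the VLA `int sortedArray[n]`. */
size_t int_array_bytes(size_t n) {
    assert(n <= SIZE_MAX / sizeof(int)); /* guard against overflow in the multiplication */
    return n * sizeof(int);
}

/* Returns 1 if the array alone is larger than the soft stack limit,
   0 if it is not (or the limit is unlimited),
   -1 if the limit cannot be queried. */
int exceeds_stack_limit(size_t n) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (rlim_t)int_array_bytes(n) > rl.rlim_cur ? 1 : 0;
}

/* Hypothetical helper: print the comparison for one array length. */
void report_stack_fit(size_t n) {
    printf("%zu ints = %zu bytes, exceeds stack limit: %d\n",
           n, int_array_bytes(n), exceeds_stack_limit(n));
}
```

Calling report_stack_fit(1000000) and report_stack_fit(3000000) from main shows whether each size fits on a given machine. Note that the check only looks at the array itself; the rest of the stack (other locals, call frames) also counts against the limit, so an array slightly below the limit can still overflow.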

Change:

    //temporary array for use with the merge sort
    int sortedArray [n];

Into:

    //temporary array for use with the merge sort
    int *sortedArray = malloc(sizeof(int) * n);

Optionally, you can add free(sortedArray) at the bottom of main to be tidy.
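As a small, hedged illustration of the same idea in isolation, the sketch below allocates a scratch buffer on the heap, checks the result of malloc, uses the buffer, and frees it. The function name heap_scratch_demo is invented for this example; in the program above, the equivalent would be checking sortedArray for NULL right after the malloc call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocates a scratch buffer of n ints on the heap, writes to every
   element to show the memory is usable, then frees it.
   Returns 0 on success, -1 if malloc fails. */
int heap_scratch_demo(size_t n) {
    assert(n > 0);
    int *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL) {
        /* Unlike a VLA overflow, a failed malloc can be detected and handled. */
        fprintf(stderr, "could not allocate %zu ints\n", n);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        scratch[i] = (int)i;
    free(scratch);
    return 0;
}
```

A call such as heap_scratch_demo(3000000) asks for about 12MB, the size that overflowed the stack as a VLA in the test described above, but here it comes from the heap, which is not subject to the stack size limit.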

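Also, you can use the limit shell command (csh/tcsh; in bash the equivalent is ulimit -s) to see what the stacksize parameter is set to. So another way to fix the problem would be to use that command to raise the limit. But I don't recommend it, because it is a less general and more fragile solution than simply using malloc.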

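A note on debugging ...

To find your problem, I did the following:

Compile the program with debug information: gcc -o blah -g blah.c

Invoke the debugger with gdb ./blah

Run the program [inside the debugger] with run

I got the segfault information below: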

Starting program: ...
Enter the name of file with a list of integers to sort: input2

Program received signal SIGSEGV, Segmentation fault.
0x00000000004009d1 in main () at blah.c:64
64      clock_gettime(CLOCK_REALTIME, &start);  //start

It didn't make much sense for clock_gettime to fault, so I looked at the line above it:

    int sortedArray [n];

Answer 2

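You are looking in the wrong place. Your heap memory is fine; the problem is your stack. Every time you call a function in C (unless the call gets compiled out or inlined), the compiler "saves" all the local variables of the current function (including the parameters) and then calls the next function, so that when that function returns, the previous function is ready to continue. This is the stack.

By recursively calling mergesort for a million values, you are not only allocating about 4MB on the heap, but also sizeof(ALL_THE_STUFF_MERGE_SORT_NEEDS) * 1000000 bytes on the stack. This will almost certainly produce a stack overflow.

Try unrolling the recursion in mergesort to use a loop instead. That way you don't have to "save" the state of the function on every recursive call, and you can reuse the same variables. (You can google unrolled mergesort to see how it works.)

EDIT: I forgot that, if implemented correctly, mergesort on a million integers only recurses about 20 levels deep (log2 of 1,000,000), but this may still be the problem, or at least related to it, so I will leave my answer here.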
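For reference, a loop-based ("bottom-up") merge sort could look roughly like the sketch below. This is not the asker's code: the names merge_runs and merge_sort_iterative are invented for this example, it works on a plain int array instead of the Sort struct, and it takes its scratch buffer from the heap. It merges runs of width 1, 2, 4, ... until a single sorted run remains, so there is no recursion at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi) into tmp[lo..hi). */
static void merge_runs(const int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid;
    for (size_t k = lo; k < hi; k++) {
        if (j >= hi || (i < mid && a[i] <= a[j]))
            tmp[k] = a[i++];
        else
            tmp[k] = a[j++];
    }
}

/* Bottom-up merge sort without recursion.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int merge_sort_iterative(int *a, size_t n) {
    if (n < 2)
        return 0; /* nothing to sort */
    assert(a != NULL);

    int *tmp = malloc(n * sizeof *tmp); /* heap scratch buffer, not a VLA */
    if (tmp == NULL)
        return -1;

    /* Each pass merges neighbouring runs of length `width`. */
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(a, tmp, lo, mid, hi);
        }
        memcpy(a, tmp, n * sizeof *a); /* copy the merged pass back */
    }

    free(tmp);
    return 0;
}
```

For a million integers the outer loop runs about 20 times, which matches the recursion depth mentioned in the edit above. The stack usage stays constant regardless of n.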

Answer 3

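Your array allocator seems OK. The problem is somewhere else:

Which implementation of mergesort do you use?
Can you post the code?
Does it allocate memory for the temporary working array on the stack or from the heap?
If this temporary array is allocated on the stack (automatic storage), that is most likely the cause of your problem, as stack space is typically limited to a few megabytes by default on current systems.

EDIT: The temporary array is indeed allocated in automatic storage (on the stack) in the main() function: int sortedArray[n];. Allocating it with malloc() should fix this problem, but you have other issues, such as while (!feof(tosort)), which is always wrong, as explained here: Why is “while ( !feof (file) )” always wrong?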
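To illustrate the feof point: the end-of-file flag is only set after a read has already failed, so when the file ends with a newline, the loop in the question runs one extra time, fscanf fails, and the previous value of intin is stored a second time. A common alternative is to test the return value of fscanf itself. The sketch below assumes a fixed-capacity output buffer for brevity, and the function name read_ints is invented for this example; in the original program the loop body would call inSort instead.

```c
#include <assert.h>
#include <stdio.h>

/* Reads whitespace-separated integers from `in` into `out`, storing at
   most `cap` values. Stops at end of file, at the first token that is
   not an integer, or when `out` is full. Returns the number stored. */
size_t read_ints(FILE *in, int *out, size_t cap) {
    assert(in != NULL && out != NULL);
    size_t count = 0;
    int value;
    /* fscanf returns the number of items converted: 1 means one int was read. */
    while (count < cap && fscanf(in, "%d", &value) == 1)
        out[count++] = value;
    return count;
}
```

Applied to the question's code, the loop condition would become while (fscanf(tosort, "%d", &intin) == 1), with inSort(&magic, intin) as the body.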

Recursive merge sort of a large number of integers using a dynamically allocated array throws an error
