Question

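Basically, I wrote a program that, given a set of outputs, uses genetic programming to find a formula that produces those outputs. In the program I have a function that, given a set of exemplars (fitness data and target data), randomly splits the data (inputs and outputs) into training data and test data. The function works by dividing the data into four separate arrays: training_cases, test_cases, training_targets and test_targets. training_cases and test_cases are 2D arrays containing the inputs, while training_targets and test_targets are 1D arrays containing the outputs. Here is the function: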

struct csv_data *get_test_and_train_data(char *file_name, double split) {
    double ***exemplars = parse_exemplars(file_name);
    double **fitness = exemplars[0];
    double *targs = *exemplars[1];

    // Get lengths of the arrays.
    int fitness_len = get_2d_arr_length(fitness);
    int targs_len =  get_double_arr_length(targs);
    int col_size = get_double_arr_length(fitness[0]);

    // randomize the index order
    int fits_split_i = (int)(floor(fitness_len * split));
    int *fits_rand_idxs = random_indexes(fitness_len);

    // Split the cases and targets up according to the index at which to split.
    // Leave space for NULL/NAN at the end.
    double **training_cases = malloc((sizeof(double *) * fits_split_i) + 1);
    double **test_cases = malloc((sizeof(double *) * (fitness_len - fits_split_i)) + 1);
    double *training_targets = malloc((sizeof(double) * fits_split_i) + 1);
    double *test_targets = malloc(sizeof(double) * (targs_len - fits_split_i) + 1);

    // Allocate the inner arrays.
    for (int i = 0; i < fits_split_i; i++) {
        training_cases[i] = malloc(sizeof(double) * col_size);

        if (i >= fitness_len) {
            test_cases[i - fits_split_i] = malloc(sizeof(double) * col_size);
        }
    }

    int rand_i;

    // Split the fitness and target data into training and test cases.
    for (int i = 0; i < fitness_len; i++) {
        rand_i = fits_rand_idxs[i];

        if (i >= fits_split_i) {
            test_cases[i - fits_split_i] = fitness[rand_i];
            test_targets[i - fits_split_i] = targs[rand_i]; // line 636
        } else {
            training_cases[i] = fitness[rand_i];
            training_targets[i] = targs[rand_i]; // line 639

        }
    }

    // Set last index to NULL/NAN to allow for easier looping of arrays
    training_cases[fits_split_i] = NULL; // line 645
    test_cases[fitness_len - fits_split_i] = NULL; // line 646
    training_targets[fits_split_i] = NAN; // line 647
    test_targets[targs_len - fits_split_i] = NAN; // line 648

The problem is that I am getting multiple errors (invalid writes and uses of uninitialised values). Here is the valgrind output:

==5049== Use of uninitialised value of size 8
==5049==    at 0x4053A4: get_test_and_train_data (util.c:639)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Uninitialised value was created by a stack allocation
==5049==    at 0x405161: get_test_and_train_data (util.c:599)
==5049== 
==5049== Use of uninitialised value of size 8
==5049==    at 0x405343: get_test_and_train_data (util.c:636)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Uninitialised value was created by a stack allocation
==5049==    at 0x405161: get_test_and_train_data (util.c:599)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x4053D0: get_test_and_train_data (util.c:645)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5593eb0 is 672 bytes inside a block of size 673 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x4051F1: get_test_and_train_data (util.c:614)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x4053EE: get_test_and_train_data (util.c:646)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5594028 is 296 bytes inside a block of size 297 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x40520D: get_test_and_train_data (util.c:615)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x405411: get_test_and_train_data (util.c:647)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5594310 is 672 bytes inside a block of size 673 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x405226: get_test_and_train_data (util.c:616)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049== 
==5049== Invalid write of size 8
==5049==    at 0x405434: get_test_and_train_data (util.c:648)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==  Address 0x5594488 is 296 bytes inside a block of size 297 alloc'd
==5049==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049==    by 0x405242: get_test_and_train_data (util.c:617)
==5049==    by 0x4027BE: setup (pony_gp.c:740)
==5049==    by 0x40286C: main (pony_gp.c:774)
==5049==

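My guess is that most of the errors come from the incorrect allocations at the start of the function. I have tested all the other functions to make sure they return the correct values.

Any help would be appreciated.

Edit 1

Christoph Freundl's answer got rid of all the invalid-write errors, so now only the uninitialised-value errors remain. I suspect parse_exemplars() is causing them, so here it is: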

/**
* Parse a CSV file. Parse the fitness case and split the data into
* test and train data. in the fitness case file each row is an exemplar
* and each dimension is in a column. The last column is the target value
* of the exemplar. The function returns a third degree pointer with the
* fitness data as the first element and the targets as the second element.
* The fitness data is structured as a 2D array and the target data is
* represented as a one dimensional array.
*    file_name: Name of CSV file with a header.
*/
double ***parse_exemplars(char *file_name) {
    csv_reader *reader = init_csv(file_name, ',');

    double **fitness_cases, *targets;
    int num_columns = get_num_column(reader);
    int num_lines = get_num_lines(reader);

    // leave space for NULL
    fitness_cases = malloc(sizeof(double *) * num_lines);

    for (int i = 0; i < num_lines; i++) {
        fitness_cases[i] = malloc(sizeof(double) * num_columns);
    }

    // leave space for NAN
    targets = malloc(sizeof(double) * (num_lines));

    csv_line *row;
    int f_i = 0;
    int t_i = 0;

    // Ignore the header
    next_line(reader);

    // Loop through to get target and fitness values.
    while ((row = readline(reader))) {
        int i;
        for (i = 0; i < num_columns; i++) {
            if (i == num_columns - 1) { // Last element of array is the target/desired output.
                targets[t_i++] = atof(row->content[i]);
            }
            else {
                // The arguments/inputs.
                fitness_cases[f_i][i] = atof(row->content[i]);
            }
        }

        // take the [i-1]th index because fitness cases has [num_columns-1] elements.
        fitness_cases[f_i][i-1] = (double)NAN;
        f_i++;
    }

    // Set last index to NULL/NAN for easier looping.
    fitness_cases[f_i] = NULL;
    targets[t_i] = (double)NAN;

    // Wrap the fitness cases and targets in a 3rd degree pointer
    double ***results = malloc(sizeof(double **) * 2);
    double *tmp[] = { targets };
    results[0] = fitness_cases;
    results[1] = tmp;

    free(row);
    free(reader);

    return results;
}

Answer 1

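Invalid writes

For the last element of each of the arrays training_cases, test_cases, training_targets and test_targets, only one byte is allocated. However, these elements are accessed as double (8 bytes) or double * (also 8 bytes on a 64-bit architecture): in the assignments on lines 645-648 the NULL and NAN values are implicitly converted to the element type. These assignments therefore cause the "Invalid write" errors.

Change, for example, the allocation of training_cases to

double **training_cases = malloc(sizeof(double *) * (fits_split_i + 1));

and the other allocations similarly, and you should be fine.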
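
The valgrind report shows why the extra byte is not enough: the blocks are 673 and 297 bytes long, i.e. 84 * 8 + 1 and 37 * 8 + 1, so an 8-byte sentinel written at offset 672 or 296 runs past the end of the block. A minimal sketch of the corrected sizing follows; the helper names copy_targets_with_sentinel and sentinel_length are made up for illustration and do not appear in the question.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: copy n targets into a new array that ends
 * with a NAN sentinel. Returns NULL if the allocation fails. */
double *copy_targets_with_sentinel(const double *src, size_t n) {
    assert(src != NULL || n == 0);

    /* Room for (n + 1) elements, not (n elements + 1 byte). */
    double *dst = malloc(sizeof *dst * (n + 1));
    if (dst == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
    dst[n] = NAN; /* the sentinel now lies inside the block */
    return dst;
}

/* Count the elements up to, but not including, the NAN sentinel. */
size_t sentinel_length(const double *arr) {
    size_t len = 0;
    while (!isnan(arr[len])) {
        len++;
    }
    return len;
}
```

The same pattern applies to the double ** arrays, with sizeof(double *) as the element size and NULL as the sentinel.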

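Uninitialised values

There is an error in parse_exemplars(): results[1] receives the locally declared array tmp, which becomes invalid when the function returns. The caller then reads targs through that dangling pointer (double *targs = *exemplars[1];), so targs itself holds an indeterminate value by the time it is used on lines 636 and 639.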
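
If you want to keep the double *** return type for now, one possible minimal fix is to put the one-element wrapper on the heap instead of in a local array. This is only a sketch: wrap_exemplars and targets_slot are hypothetical names, and the function stands in for the last few lines of parse_exemplars().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: wrap already-allocated fitness cases and targets
 * in the two-element double *** layout that parse_exemplars() returns.
 * Returns NULL if an allocation fails. */
double ***wrap_exemplars(double **fitness_cases, double *targets) {
    assert(fitness_cases != NULL && targets != NULL);

    double ***results = malloc(sizeof *results * 2);
    /* Heap storage for the one-element wrapper: unlike a local array,
     * it stays valid after this function returns. */
    double **targets_slot = malloc(sizeof *targets_slot);
    if (results == NULL || targets_slot == NULL) {
        free(results);
        free(targets_slot);
        return NULL;
    }

    targets_slot[0] = targets;
    results[0] = fitness_cases;
    results[1] = targets_slot;
    return results;
}
```

The caller can keep using double *targs = *exemplars[1]; unchanged, but now also has to free exemplars[1] and exemplars once it is done with them.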

My suggestion: define a

struct exemplars { double** fitness_cases; double* targets; }

and use a variable of this type instead of the double ***.
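
A rough sketch of how such a struct could be filled and released is below. Everything except the struct itself is an assumption made for illustration: make_exemplars and free_exemplars are hypothetical names, and a flat row-major table stands in for the rows coming out of the CSV reader. As in the question, the last column of each row is the target, rows end with a NAN sentinel and the row list ends with NULL.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

/* The struct proposed above. */
struct exemplars {
    double **fitness_cases; /* NULL-terminated array of rows */
    double *targets;        /* NAN-terminated array */
};

/* Release everything owned by ex and reset its members to NULL. */
void free_exemplars(struct exemplars *ex) {
    if (ex->fitness_cases != NULL) {
        for (size_t r = 0; ex->fitness_cases[r] != NULL; r++) {
            free(ex->fitness_cases[r]);
        }
    }
    free(ex->fitness_cases);
    free(ex->targets);
    ex->fitness_cases = NULL;
    ex->targets = NULL;
}

/* Hypothetical builder: split each row of a rows x cols table into
 * (cols - 1) inputs and one target (the last column).
 * On allocation failure both members of the result are NULL. */
struct exemplars make_exemplars(const double *table, size_t rows, size_t cols) {
    assert(table != NULL && cols >= 2);

    struct exemplars ex = { NULL, NULL };
    ex.fitness_cases = malloc(sizeof *ex.fitness_cases * (rows + 1));
    ex.targets = malloc(sizeof *ex.targets * (rows + 1));
    if (ex.fitness_cases == NULL || ex.targets == NULL) {
        free_exemplars(&ex);
        return ex;
    }
    /* Start with every row pointer NULL so cleanup is always safe. */
    for (size_t r = 0; r <= rows; r++) {
        ex.fitness_cases[r] = NULL;
    }

    for (size_t r = 0; r < rows; r++) {
        /* (cols - 1) inputs plus one NAN sentinel per row. */
        double *row = malloc(sizeof *row * cols);
        if (row == NULL) {
            free_exemplars(&ex);
            return ex;
        }
        for (size_t c = 0; c + 1 < cols; c++) {
            row[c] = table[r * cols + c];
        }
        row[cols - 1] = NAN;
        ex.fitness_cases[r] = row;
        ex.targets[r] = table[r * cols + (cols - 1)];
    }
    ex.targets[rows] = NAN;
    return ex;
}
```

Returning the struct by value means no pointer ever refers to a local array, and the caller reads ex.fitness_cases and ex.targets directly instead of indexing into a double ***.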
