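Basically, I wrote a program that, given a set of outputs, uses genetic programming to work out a formula that produces those outputs. In the program I have a function that, given a set of exemplars (fitness data and target data), randomly splits the data (inputs and outputs) into training data and test data. The function works by dividing the data into four separate arrays: training_cases, test_cases, training_targets and test_targets. training_cases and test_cases are 2D arrays containing the inputs, while training_targets and test_targets are 1D arrays containing the outputs. Here is the function: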
struct csv_data *get_test_and_train_data(char *file_name, double split) {
    double ***exemplars = parse_exemplars(file_name);
    double **fitness = exemplars[0];
    double *targs = *exemplars[1];
    // Get lengths of the arrays.
    int fitness_len = get_2d_arr_length(fitness);
    int targs_len = get_double_arr_length(targs);
    int col_size = get_double_arr_length(fitness[0]);
    // randomize the index order
    int fits_split_i = (int)(floor(fitness_len * split));
    int *fits_rand_idxs = random_indexes(fitness_len);
    // Split the cases and targets up according to the index at which to split.
    // Leave space for NULL/NAN at the end.
    double **training_cases = malloc((sizeof(double *) * fits_split_i) + 1);
    double **test_cases = malloc((sizeof(double *) * (fitness_len - fits_split_i)) + 1);
    double *training_targets = malloc((sizeof(double) * fits_split_i) + 1);
    double *test_targets = malloc(sizeof(double) * (targs_len - fits_split_i) + 1);
    // Allocate the inner arrays.
    for (int i = 0; i < fits_split_i; i++) {
        training_cases[i] = malloc(sizeof(double) * col_size);
        if (i >= fitness_len) {
            test_cases[i - fits_split_i] = malloc(sizeof(double) * col_size);
        }
    }
    int rand_i;
    // Split the fitness and target data into training and test cases.
    for (int i = 0; i < fitness_len; i++) {
        rand_i = fits_rand_idxs[i];
        if (i >= fits_split_i) {
            test_cases[i - fits_split_i] = fitness[rand_i];
            test_targets[i - fits_split_i] = targs[rand_i]; // line 636
        } else {
            training_cases[i] = fitness[rand_i];
            training_targets[i] = targs[rand_i]; // line 639
        }
    }
    // Set last index to NULL/NAN to allow for easier looping of arrays
    training_cases[fits_split_i] = NULL; // line 645
    test_cases[fitness_len - fits_split_i] = NULL; // line 646
    training_targets[fits_split_i] = NAN; // line 647
    test_targets[targs_len - fits_split_i] = NAN; // line 648
The problem is that I am getting multiple errors (invalid writes and uninitialised-value errors). Here is the valgrind output:
==5049== Use of uninitialised value of size 8
==5049== at 0x4053A4: get_test_and_train_data (util.c:639)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049== Uninitialised value was created by a stack allocation
==5049== at 0x405161: get_test_and_train_data (util.c:599)
==5049==
==5049== Use of uninitialised value of size 8
==5049== at 0x405343: get_test_and_train_data (util.c:636)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049== Uninitialised value was created by a stack allocation
==5049== at 0x405161: get_test_and_train_data (util.c:599)
==5049==
==5049== Invalid write of size 8
==5049== at 0x4053D0: get_test_and_train_data (util.c:645)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049== Address 0x5593eb0 is 672 bytes inside a block of size 673 alloc'd
==5049== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049== by 0x4051F1: get_test_and_train_data (util.c:614)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049==
==5049== Invalid write of size 8
==5049== at 0x4053EE: get_test_and_train_data (util.c:646)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049== Address 0x5594028 is 296 bytes inside a block of size 297 alloc'd
==5049== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049== by 0x40520D: get_test_and_train_data (util.c:615)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049==
==5049== Invalid write of size 8
==5049== at 0x405411: get_test_and_train_data (util.c:647)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049== Address 0x5594310 is 672 bytes inside a block of size 673 alloc'd
==5049== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049== by 0x405226: get_test_and_train_data (util.c:616)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049==
==5049== Invalid write of size 8
==5049== at 0x405434: get_test_and_train_data (util.c:648)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049== Address 0x5594488 is 296 bytes inside a block of size 297 alloc'd
==5049== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==5049== by 0x405242: get_test_and_train_data (util.c:617)
==5049== by 0x4027BE: setup (pony_gp.c:740)
==5049== by 0x40286C: main (pony_gp.c:774)
==5049==
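My guess is that most of the errors are caused by incorrect allocations at the beginning of the function. I have tested all the other functions to make sure they return the correct values.
Any help would be greatly appreciated.
Edit 1
Christoph Freundl's answer eliminated all of the invalid write errors, so now I only have the uninitialised-value errors left to fix. I suspect parse_exemplars() is causing them, so here it is: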
/**
 * Parse a CSV file. Parse the fitness case and split the data into
 * test and train data. in the fitness case file each row is an exemplar
 * and each dimension is in a column. The last column is the target value
 * of the exemplar. The function returns a third degree pointer with the
 * fitness data as the first element and the targets as the second element.
 * The fitness data is structured as a 2D array and the target data is
 * represented as a one dimensional array.
 * file_name: Name of CSV file with a header.
 */
double ***parse_exemplars(char *file_name) {
    csv_reader *reader = init_csv(file_name, ',');
    double **fitness_cases, *targets;
    int num_columns = get_num_column(reader);
    int num_lines = get_num_lines(reader);
    // leave space for NULL
    fitness_cases = malloc(sizeof(double *) * num_lines);
    for (int i = 0; i < num_lines; i++) {
        fitness_cases[i] = malloc(sizeof(double) * num_columns);
    }
    // leave space for NAN
    targets = malloc(sizeof(double) * (num_lines));
    csv_line *row;
    int f_i = 0;
    int t_i = 0;
    // Ignore the header
    next_line(reader);
    // Loop through to get target and fitness values.
    while ((row = readline(reader))) {
        int i;
        for (i = 0; i < num_columns; i++) {
            if (i == num_columns - 1) { // Last element of array is the target/desired output.
                targets[t_i++] = atof(row->content[i]);
            }
            else {
                // The arguments/inputs.
                fitness_cases[f_i][i] = atof(row->content[i]);
            }
        }
        // take the [i-1]th index because fitness cases has [num_columns-1] elements.
        fitness_cases[f_i][i-1] = (double)NAN;
        f_i++;
    }
    // Set last index to NULL/NAN for easier looping.
    fitness_cases[f_i] = NULL;
    targets[t_i] = (double)NAN;
    // Wrap the fitness cases and targets in a 3rd degree pointer
    double ***results = malloc(sizeof(double **) * 2);
    double *tmp[] = { targets };
    results[0] = fitness_cases;
    results[1] = tmp;
    free(row);
    free(reader);
    return results;
}
Answer 0 (score: 1)
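For the last element of each of the arrays training_cases, test_cases, training_targets and test_targets, only a single byte is allocated. However, those elements are accessed as a double (8 bytes) or a double * (also 8 bytes on a 64-bit architecture): in the assignments on lines 645-648 the NULL and NAN values are implicitly converted to those types. These assignments therefore cause the "Invalid write" errors.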
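Change, for example,
double **training_cases = malloc((sizeof(double *) * fits_split_i) + 1);
to
double **training_cases = malloc(sizeof(double *) * (fits_split_i + 1));
and adjust the other allocations in the same way, and you should be fine.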
There is also an error in parse_exemplars(): results[1] is assigned the locally declared array tmp, which becomes invalid when the function returns.
My suggestion: define a
struct exemplars {
    double **fitness_cases;
    double *targets;
};
and use a variable of this type instead of the double *** result.