Here is the part of the code that triggers the error (it is part of a larger file of about 1000 lines that implements the HyperCuts algorithm):
void get_combination_cuts_characteristics(uint32_t* cuts,
                                          uint32_t nb_dim_cut,
                                          struct hypercuts_dimension** dimensions,
                                          uint32_t* return_cuts,
                                          uint32_t sum_cuts,
                                          struct classifier_rule** rules,
                                          uint32_t nb_rules,
                                          uint32_t* children_rules_sum,
                                          uint32_t* max_rules)
{
    // Array of children
    uint32_t nb_children = (uint32_t) 0x1 << sum_cuts;
    uint32_t* min_index = chkmalloc(sizeof(*min_index) * nb_dim_cut);
    uint32_t* max_index = chkmalloc(sizeof(*max_index) * nb_dim_cut);
    uint32_t* current_index = chkmalloc(sizeof(*current_index) * nb_dim_cut);
    uint32_t children_array[nb_children];
    for (uint32_t i = 0; i < nb_children; ++i)
        children_array[i] = 0;
    // For each rules we compute the number of rule each child get.
    uint32_t min_value;
    uint32_t max_value;
    uint32_t nb_cuts;
    uint32_t subregion_size;
    uint32_t index;
    for (uint32_t i = 0; i < nb_rules; ++i)
    {
        for (uint32_t j = 0; j < nb_dim_cut; ++j)
        {
            min_value = rules[i]->statements[dimensions[j]->id]->value;
            max_value = rules[i]->statements[dimensions[j]->id]->value | rules[i]->statements[dimensions[j]->id]->mask;
            nb_cuts = (uint32_t)0x1 << cuts[j];
            subregion_size = (dimensions[j]->max_dim - dimensions[j]->min_dim) + 1;
            subregion_size = subregion_size / nb_cuts;
            if(subregion_size == 0)
                continue;
            // Fit the interval in the region of the dimension
            if(min_value < dimensions[j]->min_dim)
                min_value = dimensions[j]->min_dim;
            if(max_value > dimensions[j]->max_dim)
                max_value = dimensions[j]->max_dim;
            // Compute the minimal and maximal index of the rule in this dimension
            min_index[j] = (min_value - dimensions[j]->min_dim) / subregion_size;
            max_index[j] = (max_value - dimensions[j]->min_dim) / subregion_size;
            current_index[j] = min_index[j];
        }
        // Locate the first child
        index = get_multi_dimension_index(min_index, nb_dim_cut, cuts);
        children_array[index] ++;
        // Locate all the other children that the rule span
        while(get_next_dimension_index(current_index, min_index, max_index, nb_dim_cut))
        {
            index = get_multi_dimension_index(current_index, nb_dim_cut, cuts);
            children_array[index]++;
        }
    }
    // Set the return variables
    uint32_t rules_sum = 0;
    uint32_t max_rule_child = 0;
    for (uint32_t i = 0; i < nb_children; ++i)
    {
        rules_sum += children_array[i];
        if(max_rule_child < children_array[i])
            max_rule_child = children_array[i];
    }
    if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum)))
    {
        *max_rules = max_rule_child;
        *children_rules_sum = rules_sum;
        for (uint32_t i = 0; i < nb_dim_cut; ++i)
            return_cuts[i] = cuts[i];
    }
    free(min_index);
    free(max_index);
    free(current_index);
}
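gdb tells me the segfault happens on the line rules_sum += children_array[i]; so it looks like I am running past the end of the array, and I reviewed my code accordingly. The problem is that when I print the cell gdb is trying to access, it is fine (it gives me the value I expect). I then tried to find out whether the pointers could be the cause, but they all print fine in gdb. When I run the program under valgrind, it reports the segfault on the line if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum))) instead. I also checked the variables and pointers in that statement, and they print fine as well. So I wondered whether I might have a stack overflow: I gave valgrind a 2GB stack and moved this function's arrays to the heap, but the result was the same.
Another tricky thing is that if I put an fprintf before the for loop, one after it, and one inside it, the program runs fine...
This is what valgrind gives me: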
Invalid read of size 4
==8397== at 0x4017EE: get_combination_cuts_characteristics (hypercuts.c:775)
==8397== by 0x401947: get_optimal_cut_combination (hypercuts.c:663)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x401B78: set_nb_cuts (hypercuts.c:485)
==8397== by 0x4025B2: build_node (hypercuts.c:219)
==8397== by 0x40273D: build_node (hypercuts.c:285)
==8397== by 0x4029B2: new_hypercuts_classifier (hypercuts.c:143)
==8397== by 0x403B02: main (hypercuts_test.c:277)
==8397== Address 0x11fefff77a is not stack'd, malloc'd or (recently) free'd
==8397==
==8397==
==8397== Process terminating with default action of signal 11 (SIGSEGV)
==8397== Access not within mapped region at address 0x11FEFFF77A
==8397== at 0x4017EE: get_combination_cuts_characteristics (hypercuts.c:775)
==8397== by 0x401947: get_optimal_cut_combination (hypercuts.c:663)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x40198F: get_optimal_cut_combination (hypercuts.c:678)
==8397== by 0x401B78: set_nb_cuts (hypercuts.c:485)
==8397== by 0x4025B2: build_node (hypercuts.c:219)
==8397== by 0x40273D: build_node (hypercuts.c:285)
==8397== by 0x4029B2: new_hypercuts_classifier (hypercuts.c:143)
==8397== by 0x403B02: main (hypercuts_test.c:277)
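I am out of ideas, so I am here looking for help, hints, or fresh ideas. This function is called from a recursive function (build_node; at the point of the segfault there are 4 recursive calls, so not many), and it ran 3 times before failing. That makes me think something is corrupting the stack (a pointer or an array), but I have not found a tool to analyze the stack, and I have checked this part of the code many times.
Some details about this part of the code: it performs a linear optimization over the number of cuts to make in a multidimensional space. This particular function computes the characteristics of the cuts made and runs at the end of each optimization step.
Thanks in advance!!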
Answer 0 (score: 1)
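This should be easy to debug.
I would start with the valgrind crash, since valgrind tends to be more precise. The line if(max_rule_child < *max_rules || ((max_rule_child == *max_rules) && (rules_sum < *children_rules_sum))) dereferences two pointers, and at least one of them must be garbage. Double-check that max_rules and children_rules_sum point to valid addresses. Add debug statements and see whether the pointer values change during the call.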
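One way to do that check is sketched below. It is only an illustration: snapshot_out_ptrs, out_ptrs_unchanged, and struct out_ptr_snapshot are hypothetical helpers, not part of the original file. The idea is to record both pointer values on entry to get_combination_cuts_characteristics and again just before the if, then compare the two snapshots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper type: the values of the two output pointers
 * at one point in the function. */
struct out_ptr_snapshot {
    const uint32_t* max_rules;
    const uint32_t* children_rules_sum;
};

/* Print both pointer values to stderr and return them as a snapshot.
 * "where" is a free-form label such as "entry" or "before compare". */
struct out_ptr_snapshot snapshot_out_ptrs(const char* where,
                                          const uint32_t* max_rules,
                                          const uint32_t* children_rules_sum)
{
    assert(where != NULL);
    fprintf(stderr, "[%s] max_rules=%p children_rules_sum=%p\n",
            where, (const void*)max_rules, (const void*)children_rules_sum);
    struct out_ptr_snapshot s = { max_rules, children_rules_sum };
    return s;
}

/* True when neither pointer changed between the two snapshots. */
bool out_ptrs_unchanged(struct out_ptr_snapshot before,
                        struct out_ptr_snapshot after)
{
    return before.max_rules == after.max_rules
        && before.children_rules_sum == after.children_rules_sum;
}
```

If the two snapshots differ, something inside the function overwrote the parameters themselves. Keep in mind that the snapshots live on the same stack, so a large enough overwrite could damage them too, and extra statements can shift the stack layout enough to hide the symptom.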
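The other line, rules_sum += children_array[i];, is also a plausible culprit. Nothing appears to check that index is less than nb_children. Use the same strategy and add debug statements for both values. A write past the end of the array will corrupt the stack, and that corruption could overwrite children_rules_sum or max_rules, which would explain the crash valgrind reports.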
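A minimal sketch of such a bounds check follows. count_child_checked is a hypothetical helper, not something from the original file; the assumption is that it would replace the two children_array[index]++ statements, so that an out-of-range index is reported instead of silently written.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: increment children_array[index] only when index
 * is inside the array. Returns false and logs to stderr otherwise. */
bool count_child_checked(uint32_t* children_array,
                         uint32_t nb_children,
                         uint32_t index)
{
    assert(children_array != NULL);
    if (index >= nb_children) {
        fprintf(stderr,
                "index %" PRIu32 " out of range (nb_children = %" PRIu32 ")\n",
                index, nb_children);
        return false;
    }
    children_array[index]++;
    return true;
}
```

In the question's function the call would look like count_child_checked(children_array, nb_children, index). Any message on stderr then points at the rule and cut combination that produced the bad index.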