I have a function find_nodes() that contains a loop:
for (htmlNodePtr current_node=root_node
; current_node!=NULL
; current_node=current_node->next)
{
if (xmlHasProp(current_node,(xmlChar *)"href")) {
if (xmlHasProp(current_node,(xmlChar *)attribute)) {
if (strcmp(value,
(char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
found_nodes[numb_found]=current_node;
numb_found++;
}
}
}
find_nodes(found_nodes,numb_found,
current_node->children,mode,attribute,value);
}
I get a segmentation fault on this assignment:
found_nodes[numb_found]=current_node;
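I checked the value of numb_found: it is fine for a few iterations, and then, instead of being the previous value + 1, it equals -1207604106.
What could cause this?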
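Answer 0 (score: 3)
Somehow you are running past the bounds of an array and reading random data.
OK, looking at this, we don't have enough information, but this appears to be a recursive search through a DOM tree. You pass numb_found as a parameter, so when it is assigned in the recursive call, the caller's value is not updated. Eventually you will run into trouble.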
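To make the by-value concern concrete, here is a minimal sketch that does not use libxml2. The `Node` struct and the `count_by_value`, `count_by_reference`, and `demo_counters` names are hypothetical stand-ins, not part of the posted code; the sketch only shows how a counter behaves in a recursive sibling/children walk depending on how it is passed.

```cpp
#include <cassert>

// Hypothetical stand-in for a DOM node: first child plus next sibling.
struct Node {
    const Node *children;
    const Node *next;
};

// Counter passed by value: each call increments its own copy, so whatever
// the recursive calls counted is discarded when they return.
int count_by_value(int count, const Node *root) {
    for (const Node *n = root; n != nullptr; n = n->next) {
        count++;
        count_by_value(count, n->children);  // result is thrown away
    }
    return count;
}

// Counter passed by reference: every call updates the same variable.
void count_by_reference(int &count, const Node *root) {
    for (const Node *n = root; n != nullptr; n = n->next) {
        count++;
        count_by_reference(count, n->children);
    }
}

// Four nodes in total: root -> child1 -> grandchild, with child2 as
// child1's sibling.
void demo_counters() {
    const Node grandchild = {nullptr, nullptr};
    const Node child2 = {nullptr, nullptr};
    const Node child1 = {&grandchild, &child2};
    const Node root = {&child1, nullptr};

    // Only the top-level sibling list contributes to the caller's copy.
    assert(count_by_value(0, &root) == 1);

    int total = 0;
    count_by_reference(total, &root);
    assert(total == 4);  // every node is counted
}
```

If the counter also serves as an array index, the by-value version makes the caller resume at a stale index and overwrite entries stored by the recursive calls. A signature that takes `int &numb_found` does not have this problem.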
Answer 1 (score: 2)
You declare, but never initialize,
char **url_list;
and then you use it
if (tree_is_true(l_list)) {
url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
numb_found++;
}
-1207604106 is 0xB8056C76, which looks very much like a pointer ;-)
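One way to avoid writing through an uninitialized `char **` is to let a container own the storage. The sketch below is an assumption about how the result list could be built, not the poster's code: `collect_urls` is a hypothetical helper, and its input array stands in for the strings that `xmlGetProp` would return.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copies every non-null C string from `hrefs` into a
// vector. The vector allocates and grows its own storage, so there is no
// raw pointer that could be used before it points anywhere.
std::vector<std::string> collect_urls(const char *const *hrefs, std::size_t n) {
    assert(hrefs != nullptr || n == 0);
    std::vector<std::string> urls;
    urls.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        // An attribute lookup such as xmlGetProp returns NULL when the
        // attribute is missing, so null entries are skipped.
        if (hrefs[i] != nullptr) {
            urls.push_back(hrefs[i]);
        }
    }
    return urls;
}
```

With this shape the caller gets the count from `urls.size()` instead of a separate `numb_found` output parameter. If the `char **` interface has to stay, the array needs to be allocated with a known capacity before the first `url_list[numb_found] = ...` write.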
Answer 2 (score: 1)
You are stomping on memory. Compile your code with -g and run it under valgrind; valgrind will tell you exactly where the error occurs.
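Answer 3 (score: 0)
Guessing from the code given, either something is stomping on the stack, or numb_found is growing very large and overflowing. Post some of the real code (for example, all the type information for the above) and we will be able to tell you more.
I suspect found_nodes is a fixed-size array on the local stack and that you are running off the end of it too.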
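If the fixed-size array is the suspect, a guarded append turns a silent overrun into a visible failure. This is a minimal sketch under the assumption that the capacity can be passed alongside the array; `try_append` is a hypothetical helper, not something from the posted code or from libxml2.

```cpp
#include <cassert>

// Hypothetical helper: stores `item` at buffer[count] and advances count,
// but only when count is a valid index for an array of `capacity` elements.
template <typename T>
bool try_append(T *buffer, int capacity, int &count, const T &item) {
    assert(buffer != nullptr && capacity >= 0);
    if (count < 0 || count >= capacity) {
        return false;  // refuse instead of writing outside the array
    }
    buffer[count] = item;
    ++count;
    return true;
}
```

A garbage counter such as -1207604106 is rejected by the same check, so the helper also catches a corrupted or uninitialized index, not just a full array.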
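Answer 4 (score: 0)
Maybe I am misreading your code, but since you pass numb_found rather than &numb_found, each time you return from the recursion you will just overwrite nodes that were already found, which looks like a bug to me.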
Answer 5 (score: 0)
Here is the whole function:
void find_nodes(htmlNodePtr *found_nodes, int &numb_found, htmlNodePtr root_node, SearchMode mode, const char *attribute, const char *value) {
htmlNodePtr tmp_ptr;
switch (mode) {
case S_HREF:
for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
if (xmlHasProp(current_node,(xmlChar *)"href")) {
if (xmlHasProp(current_node,(xmlChar *)attribute)) {
if (strcmp(value,(char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
found_nodes[numb_found]=current_node;
numb_found++;
}
}
}
find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
}
break;
case S_KEYWORD:
for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
if (xmlHasProp(current_node,(xmlChar *)"href")) {
if (strcmp(value,(char *)xmlNodeGetContent(current_node))==0) {
found_nodes[numb_found]=current_node;
numb_found++;
}
}
find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
}
break;
case S_TAG:
for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
if (xmlHasProp(current_node,(xmlChar *)attribute)) {
if (strcmp(value,(char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
tmp_ptr=inner_href_seek(current_node);
if (tmp_ptr==NULL) {
find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
continue;
}
else {
found_nodes[numb_found]=tmp_ptr;
numb_found++;
}
}
}
find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
}
break;
}
}
The array has a fixed size, but it is larger than needed. Am I passing numb_found the right way?
=== EDIT ===
char** get_urls(string url, ParseTreeNode *tree_root, int &numb_found) {
numb_found=0;
char **url_list;
htmlDocPtr doc;
htmlNodePtr root_node;
string site_content;
if (get_page(url,site_content)<0) {
url_list=NULL;
return url_list;
}
// get a DOM
doc=htmlReadMemory(site_content.data(),site_content.size(),url.data(),NULL,0);
// and the root
root_node=xmlDocGetRootElement(doc);
if (tree_root==NULL) {
url_list=NULL;
return url_list;
}
LeafList *l_list;
l_list= new LeafList();
l_list->numb_leafs=0;
get_leaf_list(l_list,tree_root);
htmlNodePtr matching_nodes[256];
int numb_matching_nodes;
htmlNodePtr tmp_nodes[64];
int numb_tmp;
SearchMode tmp_rule;
for (int i=0;i<l_list->numb_leafs;i++) {
if (l_list->leaf_buff[i]->data->rule!=TAG) continue;
else {
numb_matching_nodes=0;
find_nodes(matching_nodes,numb_matching_nodes,root_node,S_TAG,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());
if (numb_matching_nodes==0) continue;
else l_list->leaf_buff[i]->state=true;
for (int j=0;j<numb_matching_nodes;j++) {
for (int k=0;k<l_list->numb_leafs;k++) {
if (k==i) continue;
else {
switch(l_list->leaf_buff[k]->data->rule) {
case HREF:
tmp_rule=S_HREF;
break;
case TAG:
tmp_rule=S_TAG;
break;
case KEYWORD:
tmp_rule=S_KEYWORD;
break;
}
find_nodes(tmp_nodes,numb_tmp,matching_nodes[j],tmp_rule,l_list->leaf_buff[k]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());
if (numb_tmp>0) l_list->leaf_buff[k]->state=true;
else l_list->leaf_buff[k]->state=false;
}
}
if (tree_is_true(l_list)) {
url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
numb_found++;
}
}
}
}
for (int i=0;i<l_list->numb_leafs;i++) {
if (l_list->leaf_buff[i]->data->rule!=HREF) continue;
else {
numb_matching_nodes=0;
find_nodes(matching_nodes,numb_matching_nodes,root_node,S_HREF,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());
if (numb_matching_nodes==0) continue;
else l_list->leaf_buff[i]->state=true;
for (int j=0;j<numb_matching_nodes;j++) {
for (int k=0;k<l_list->numb_leafs;k++) {
if ((k==i)||(l_list->leaf_buff[k]->data->rule==TAG)) continue;
else {
switch(l_list->leaf_buff[k]->data->rule) {
case HREF:
tmp_rule=S_HREF;
break;
case KEYWORD:
tmp_rule=S_KEYWORD;
break;
}
find_nodes(tmp_nodes,numb_tmp,matching_nodes[j],tmp_rule,l_list->leaf_buff[k]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());
if (numb_tmp>0) l_list->leaf_buff[k]->state=true;
else l_list->leaf_buff[k]->state=false;
}
}
if (tree_is_true(l_list)) {
url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
numb_found++;
}
}
}
}
for (int i=0;i<l_list->numb_leafs;i++) {
if (l_list->leaf_buff[i]->data->rule!=KEYWORD) continue;
else {
numb_matching_nodes=0;
find_nodes(matching_nodes,numb_matching_nodes,root_node,S_KEYWORD,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());
if (numb_matching_nodes==0) continue;
else {
for (int i=0;i<numb_matching_nodes;i++) {
url_list[numb_found]=(char *)xmlGetProp(matching_nodes[i],(xmlChar *)"href");
numb_found++;
}
}
}
}
return url_list;
}
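Answer 6 (score: 0)
I notice that the invalid value of numb_found (-1207604106) is 0xB8056C76, which looks rather like a pointer value. That could be explained by overrunning an array, even though you say it is not being overrun...
I suggest you verify that the array really is "larger than needed". Add tracing (using cerr) on the lines that add a node to the array; at a minimum, have the trace print the value of numb_found each time. What is the largest value you get before the crash? How does that actually compare with the array size?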
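The tracing suggested here could look roughly like the following. `format_trace` and `trace_index` are hypothetical helpers, and the capacity argument is an assumption: the posted `find_nodes` does not receive the array size, so the caller would have to supply it.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: builds one trace line for a pending array write and
// flags an index that does not fit an array of `capacity` elements.
std::string format_trace(const char *label, int index, int capacity) {
    assert(label != nullptr);
    std::ostringstream line;
    line << label << ": numb_found=" << index << " capacity=" << capacity;
    if (index < 0 || index >= capacity) {
        line << " OUT OF RANGE";
    }
    line << '\n';
    return line.str();
}

// Writes the trace line to std::cerr, which is unbuffered by default, so
// the last line is already out if the next statement crashes.
void trace_index(const char *label, int index, int capacity) {
    std::cerr << format_trace(label, index, capacity);
}
```

Calling `trace_index("S_HREF", numb_found, capacity)` immediately before each `found_nodes[numb_found] = ...` line, with a different label per case, shows both the largest index reached and which branch performed the last write.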
Answer 7 (score: 0)
Here is what valgrind returns:
==7464==
==7464== Use of uninitialised value of size 4
==7464== at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464== by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464== by 0x804907B: main (tester.cpp:39)
==7464==
==7464== Invalid write of size 4
==7464== at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464== by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464== by 0x804907B: main (tester.cpp:39)
==7464== Address 0xcef11ec0 is not stack'd, malloc'd or (recently) free'd
==7464==
==7464== Process terminating with default action of signal 11 (SIGSEGV)
==7464== Access not within mapped region at address 0xCEF11EC0
==7464== at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464== by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464== by 0x804907B: main (tester.cpp:39)
==7464==
==7464== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 18 from 2)
==7464== malloc/free: in use at exit: 73,605 bytes in 3,081 blocks.
==7464== malloc/free: 3,666 allocs, 585 frees, 172,154 bytes allocated.
==7464== For counts of detected errors, rerun with: -v
==7464== searching for pointers to 3,081 not-freed blocks.
==7464== checked 329,952 bytes.
==7464==
==7464== LEAK SUMMARY:
==7464== definitely lost: 186 bytes in 30 blocks.
==7464== possibly lost: 8,324 bytes in 6 blocks.
==7464== still reachable: 65,095 bytes in 3,045 blocks.
==7464== suppressed: 0 bytes in 0 blocks.
==7464== Rerun with --leak-check=full to see details of leaked memory.
The largest value I get is 24, and the array has 1024 elements.
Answer 8 (score: 0)
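Seriously, you shouldn't be asking us this. You would get there much faster by:
watching numb_found; or
adding printf / cout output of numb_found at each site (with a unique ID: printf("A:%d\n",numb_found);).
That will be the quickest way to see exactly what is causing the problem.
In any case, your latest answer/comment shows that you are passing an uninitialised value to find_nodes(), and given that most of its arguments are pointers, that would also cause you to write to invalid memory.
I can't tell from the valgrind output which argument is uninitialised, so put a printf / cout at the top of the function to print out all the pointers (not what they point to). That should let you see which argument is bad or corrupted.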
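A sketch of such an argument dump follows. `describe_args` and `describe_args_demo` are hypothetical names, and the parameters are taken as `const void *` so the example does not depend on libxml2. One detail matters for the `const char *` arguments: streaming them directly prints the string they point to, so they are cast to `const void *` to print the address instead.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: formats the raw argument values of a
// find_nodes-style call (addresses and the counter, not pointed-to data).
std::string describe_args(const void *found_nodes, int numb_found,
                          const void *root_node, const char *attribute,
                          const char *value) {
    std::ostringstream line;
    line << "found_nodes=" << found_nodes
         << " numb_found=" << numb_found
         << " root_node=" << root_node
         // Cast so the address is printed rather than the C string.
         << " attribute=" << static_cast<const void *>(attribute)
         << " value=" << static_cast<const void *>(value) << '\n';
    return line.str();
}

// Usage sketch: the counter appears in the dump, the attribute text does not.
void describe_args_demo() {
    int slot = 0;
    const std::string line = describe_args(&slot, 7, &slot, "class", "menu");
    assert(line.find("numb_found=7") != std::string::npos);
    assert(line.find("class") == std::string::npos);
    std::cerr << line;
}
```

One candidate worth checking in the posted get_urls() is numb_tmp: it is declared as `int numb_tmp;` without an initializer and, unlike numb_matching_nodes, is not set to 0 before being passed by reference to find_nodes(). That would be consistent with valgrind's "Use of uninitialised value" report, although the log alone does not prove it.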