Strange integer value causes a segmentation fault

Time: 2009-05-01 22:57:35

Tags: c++ integer segmentation-fault

I have a function find_nodes() that contains this loop:

for (htmlNodePtr current_node=root_node
    ; current_node!=NULL
    ; current_node=current_node->next) 
{
    if (xmlHasProp(current_node,(xmlChar *)"href")) {
        if (xmlHasProp(current_node,(xmlChar *)attribute)) {
            if (strcmp(value,
                (char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
                    found_nodes[numb_found]=current_node;
                    numb_found++;
            }
        }
    }

    find_nodes(found_nodes,numb_found,
               current_node->children,mode,attribute,value);

}

I get a segmentation fault on this assignment:

found_nodes[numb_found]=current_node;

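I checked the value of numb_found: it is fine for a few iterations, and then, instead of the previous value + 1, it equals -1207604106.

What could cause this?

9 answers:

Answer 0: (score: 3)

You are somehow running past the bounds of an array and reading random data.

OK, looking at this, we don't have enough information, but I see that this appears to be a recursive search through a DOM tree. You are passing numb_found as an argument, so when you assign to it in the recursive call, the value is not updated in the caller. Eventually you will run into trouble.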
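
To make the by-value concern concrete, here is a minimal sketch. The Node struct and both collect_* functions are hypothetical stand-ins, not the asker's code or libxml2 types. It assumes the counter really were passed by value; the full function posted further down declares int &numb_found, in which case this particular problem would not apply.

```cpp
#include <cassert>

// Hypothetical minimal tree node, standing in for htmlNodePtr.
struct Node {
    Node *next;      // next sibling
    Node *children;  // first child
};

// Counter passed BY VALUE: increments made inside a recursive call are
// lost when that call returns, so later siblings overwrite earlier slots.
void collect_by_value(Node *root, Node **out, int count) {
    assert(out != nullptr);
    for (Node *n = root; n != nullptr; n = n->next) {
        out[count] = n;
        count++;
        collect_by_value(n->children, out, count);
    }
}

// Counter passed BY REFERENCE: every level of the recursion updates the
// same variable, so each node gets its own slot.
void collect_by_ref(Node *root, Node **out, int &count) {
    assert(out != nullptr);
    for (Node *n = root; n != nullptr; n = n->next) {
        out[count] = n;
        count++;
        collect_by_ref(n->children, out, count);
    }
}
```

With a root that has one child and one sibling, the by-value version stores the child in slot 1 and then overwrites it with the sibling, while the by-reference version fills slots 0, 1 and 2 and leaves the caller's counter at 3.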

Answer 1: (score: 2)

In your get_urls function, you declare but do not initialize

char **url_list;

and then you use it:

if (tree_is_true(l_list)) {
    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
    numb_found++;
}

-1207604106 is 0xB8056C76, which looks very much like a pointer ;-)
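
The conversion is easy to check. The sketch below uses a hypothetical helper, as_hex32, that reinterprets a signed 32-bit value as its unsigned bit pattern and formats it as hex; it assumes a 32-bit int, as on the asker's platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: show the 32-bit pattern of a signed value as hex.
std::string as_hex32(std::int32_t value) {
    char buf[16];
    // Converting to uint32_t is well defined (modulo 2^32).
    const unsigned int bits =
        static_cast<unsigned int>(static_cast<std::uint32_t>(value));
    const int written = std::snprintf(buf, sizeof buf, "0x%08X", bits);
    assert(written == 10);  // "0x" plus eight hex digits
    return std::string(buf);
}
```

Calling as_hex32(-1207604106) yields "0xB8056C76". A counter that suddenly holds a value like this has probably been overwritten by a neighbouring pointer write rather than incremented.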

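Answer 2: (score: 1)

You are stomping on memory. Compile your code with -g and run it under valgrind; valgrind will tell you exactly where the error occurs.

Answer 3: (score: 0)

Guessing from the code given, either something is stomping on the stack, or numb_found is getting very large and overflowing. Post some real code (such as the type information for everything above) and we will be able to tell you more.

I suspect found_nodes is a fixed-size array on the local stack and that you are running off the end of it.

Answer 4: (score: 0)

Maybe I am misreading your code, but because you pass numb_found rather than &numb_found, every time you return from the recursion you will just overwrite the nodes already found, which looks like a bug to me.

Answer 5: (score: 0)

Here is the whole function: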

void find_nodes(htmlNodePtr *found_nodes, int &numb_found, htmlNodePtr root_node, SearchMode mode, const char *attribute, const char *value) {

    htmlNodePtr tmp_ptr;

    switch (mode) {

        case S_HREF:
            for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
                if (xmlHasProp(current_node,(xmlChar *)"href")) {
                    if (xmlHasProp(current_node,(xmlChar *)attribute)) {
                        if (strcmp(value,(char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
                            found_nodes[numb_found]=current_node;
                            numb_found++;
                        }
                    }
                }

                find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);

            }
            break;
        case S_KEYWORD:
            for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
                if (xmlHasProp(current_node,(xmlChar *)"href")) {
                    if (strcmp(value,(char *)xmlNodeGetContent(current_node))==0) {
                        found_nodes[numb_found]=current_node;
                        numb_found++;
                    }
                }

                find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
            }
            break;
        case S_TAG:
            for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
                if (xmlHasProp(current_node,(xmlChar *)attribute)) {
                    if (strcmp(value,(char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
                        tmp_ptr=inner_href_seek(current_node);
                        if (tmp_ptr==NULL) {
                            find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
                            continue;
                        }
                        else {
                            found_nodes[numb_found]=tmp_ptr;
                            numb_found++;
                        }
                    }
                }

                find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
            }
            break;
    }
}

The array has a fixed size, but it is larger than needed. Am I passing numb_found the right way?

=== EDIT ===

char** get_urls(string url, ParseTreeNode *tree_root, int &numb_found) {

    numb_found=0;
    char **url_list;
    htmlDocPtr doc;
    htmlNodePtr root_node;
    string site_content;

    if (get_page(url,site_content)<0) {
        url_list=NULL;
        return url_list;
    }

    // get a DOM
    doc=htmlReadMemory(site_content.data(),site_content.size(),url.data(),NULL,0);

    // and the root
    root_node=xmlDocGetRootElement(doc);

    if (tree_root==NULL) {
        url_list=NULL;
        return url_list;
    }

    LeafList *l_list;
    l_list= new LeafList();

    l_list->numb_leafs=0;

    get_leaf_list(l_list,tree_root);

    htmlNodePtr matching_nodes[256];
    int numb_matching_nodes;

    htmlNodePtr tmp_nodes[64];
    int numb_tmp;

    SearchMode tmp_rule;

    for (int i=0;i<l_list->numb_leafs;i++) {
        if (l_list->leaf_buff[i]->data->rule!=TAG) continue;
        else {
            numb_matching_nodes=0;
            find_nodes(matching_nodes,numb_matching_nodes,root_node,S_TAG,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

            if (numb_matching_nodes==0) continue;
            else l_list->leaf_buff[i]->state=true;

            for (int j=0;j<numb_matching_nodes;j++) {
                for (int k=0;k<l_list->numb_leafs;k++) {
                    if (k==i) continue;
                    else {
                        switch(l_list->leaf_buff[k]->data->rule) {
                            case HREF:
                                tmp_rule=S_HREF;
                                break;
                            case TAG:
                                tmp_rule=S_TAG;
                                break;
                            case KEYWORD:
                                tmp_rule=S_KEYWORD;
                                break;
                        }

                        find_nodes(tmp_nodes,numb_tmp,matching_nodes[j],tmp_rule,l_list->leaf_buff[k]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

                        if (numb_tmp>0) l_list->leaf_buff[k]->state=true;
                        else l_list->leaf_buff[k]->state=false;
                    }
                }

                if (tree_is_true(l_list)) {
                    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
                    numb_found++;
                }
            }
        }
    }

    for (int i=0;i<l_list->numb_leafs;i++) {
        if (l_list->leaf_buff[i]->data->rule!=HREF) continue;
        else {
            numb_matching_nodes=0;
            find_nodes(matching_nodes,numb_matching_nodes,root_node,S_HREF,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

            if (numb_matching_nodes==0) continue;
            else l_list->leaf_buff[i]->state=true;

            for (int j=0;j<numb_matching_nodes;j++) {
                for (int k=0;k<l_list->numb_leafs;k++) {
                    if ((k==i)||(l_list->leaf_buff[k]->data->rule==TAG)) continue;
                    else {
                        switch(l_list->leaf_buff[k]->data->rule) {
                            case HREF:
                                tmp_rule=S_HREF;
                                break;
                            case KEYWORD:
                                tmp_rule=S_KEYWORD;
                                break;
                        }

                        find_nodes(tmp_nodes,numb_tmp,matching_nodes[j],tmp_rule,l_list->leaf_buff[k]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

                        if (numb_tmp>0) l_list->leaf_buff[k]->state=true;
                        else l_list->leaf_buff[k]->state=false;
                    }
                }

                if (tree_is_true(l_list)) {
                    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
                    numb_found++;
                }
            }
        }
    }

    for (int i=0;i<l_list->numb_leafs;i++) {
        if (l_list->leaf_buff[i]->data->rule!=KEYWORD) continue;
        else {
            numb_matching_nodes=0;
            find_nodes(matching_nodes,numb_matching_nodes,root_node,S_KEYWORD,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

            if (numb_matching_nodes==0) continue;
            else {
                for (int i=0;i<numb_matching_nodes;i++) {
                    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[i],(xmlChar *)"href");
                    numb_found++;
                }
            }
        }
    }

    return url_list;
}

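Answer 6: (score: 0)

I notice that the invalid value of numb_found (-1207604106) is 0xB8056C76, which looks a bit like a pointer value. That could be explained by an array overrun, even though you say the array is not being overrun...

I suggest you verify that the array really is "larger than needed". Add a trace (using cerr) on the lines that add a node to the array; at a minimum, have the trace print the value of numb_found each time. What is the largest value you get before the crash? How does that actually compare with the array size?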
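
One way to get such a trace is to route every write through a small helper. This is a minimal sketch, not the asker's code: checked_append and its capacity parameter are hypothetical, and the idea is that each found_nodes[numb_found]=...; numb_found++; pair would be replaced by a call to it.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// Hypothetical helper: append `item` to a fixed-size array, tracing the
// index on cerr and refusing to write outside [0, capacity).
template <typename T>
bool checked_append(T *array, std::size_t capacity, int &count, T item) {
    assert(array != nullptr);
    std::cerr << "append at index " << count
              << " (capacity " << capacity << ")\n";
    if (count < 0 || static_cast<std::size_t>(count) >= capacity) {
        std::cerr << "index out of range, item dropped\n";
        return false;
    }
    array[count] = item;
    ++count;
    return true;
}
```

If the trace shows a garbage index on the very first append of a call, the counter was already bad on entry, which points at the caller rather than at an overrun inside the loop.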

Answer 7: (score: 0)

Here is what valgrind returns:

==7464==
==7464== Use of uninitialised value of size 4
==7464==    at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464==    by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464==    by 0x804907B: main (tester.cpp:39)
==7464==
==7464== Invalid write of size 4
==7464==    at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464==    by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464==    by 0x804907B: main (tester.cpp:39)
==7464==  Address 0xcef11ec0 is not stack'd, malloc'd or (recently) free'd
==7464==
==7464== Process terminating with default action of signal 11 (SIGSEGV)
==7464==  Access not within mapped region at address 0xCEF11EC0
==7464==    at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464==    by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464==    by 0x804907B: main (tester.cpp:39)
==7464==
==7464== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 18 from 2)
==7464== malloc/free: in use at exit: 73,605 bytes in 3,081 blocks.
==7464== malloc/free: 3,666 allocs, 585 frees, 172,154 bytes allocated.
==7464== For counts of detected errors, rerun with: -v
==7464== searching for pointers to 3,081 not-freed blocks.
==7464== checked 329,952 bytes.
==7464==
==7464== LEAK SUMMARY:
==7464==    definitely lost: 186 bytes in 30 blocks.
==7464==      possibly lost: 8,324 bytes in 6 blocks.
==7464==    still reachable: 65,095 bytes in 3,045 blocks.
==7464==         suppressed: 0 bytes in 0 blocks.
==7464== Rerun with --leak-check=full to see details of leaked memory.

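The largest value I get is 24, and the array has 1024 elements.

Answer 8: (score: 0)

Seriously, you shouldn't need to ask us this. It would be much faster for you to:

  • run it under a debugger, single-step through the code and watch numb_found after every statement; or
  • if you don't have a debugger, add a printf / cout of numb_found after each statement (with a unique ID: printf("A:%d\n", numb_found);).

That will be the quickest way to see exactly what is causing the problem.

In any case, your latest answer/comment shows that you are passing an uninitialized value to find_nodes(), and given that most of its arguments are pointers, that would also cause you to write to invalid memory.

I can't tell from the valgrind output which argument is uninitialized, so put a printf / cout at the top of the function to print out all the pointers (the contents of the pointers). That should let you see which argument is bad or corrupted.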
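
A sketch of such an entry check follows. The names are hypothetical (Node stands in for the real node type, and args_look_sane is not part of the asker's code or of libxml2); it prints the array pointer and the counter, and reports whether they are plausible.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical stand-in for the real node type.
struct Node {
    Node *next;
    Node *children;
};

// Hypothetical entry check: print the arguments on cerr and report whether
// the array pointer is non-null and the counter lies in [0, capacity].
bool args_look_sane(const char *where, Node *const *found,
                    int count, int capacity) {
    assert(capacity > 0);
    std::cerr << where << ": found=" << static_cast<const void *>(found)
              << " count=" << count << " capacity=" << capacity << '\n';
    return found != nullptr && count >= 0 && count <= capacity;
}
```

In the get_urls listing above, numb_tmp is declared without an initial value and does not appear to be assigned before it is passed to find_nodes(tmp_nodes,numb_tmp,...), whereas numb_matching_nodes is set to 0 before each of its calls. Setting numb_tmp=0 before each of those calls, and giving url_list real storage, would be the first things to try; a check like the one above should show quickly whether the counter is garbage on entry.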