Memory leak when using regex.h?

Time: 2014-09-03 15:13:41

Tags: c++ regex memory-leaks

Here is a minimal code example:

#include <cstdlib>
#include <iostream>
#include <vector>
#include <regex.h>

using namespace std;

class regex_result {
public:
    /** Contains indices of starting positions of matches.*/
    std::vector<int> positions;
    /** Contains lengths of matches.*/
    std::vector<int> lengths;
};

regex_result match_regex(string regex_string, const char* string) {
    regex_result result;
    regex_t* regex = new regex_t;
    regcomp(regex, regex_string.c_str(), REG_EXTENDED);
    /* "pointer" points into the string, to the end of the
       previous match. */
    const char* pointer = string;
    /* "n_matches" is the maximum number of matches allowed. */
    const int n_matches = 10;
    regmatch_t matches[n_matches];
    int nomatch = 0;
    while (!nomatch) {
        nomatch = regexec(regex, pointer, n_matches, matches, 0);
        if (nomatch)
            break;
        for (int i = 0; i < n_matches; i++) {
            int start,
                finish;
            if (matches[i].rm_so == -1) {
                break;
            }
            start = matches[i].rm_so + (pointer - string);
            finish = matches[i].rm_eo + (pointer - string);
            result.positions.push_back(start);
            result.lengths.push_back(finish - start);
        }
        pointer += matches[0].rm_eo;
    }
    delete regex;
    return result;
}

int main(int argc, char** argv) {
    string str = "this is a test";
    string pat = "this";
    regex_result res = match_regex(pat, str.c_str());
    cout << res.positions.size() << endl;
    return 0;
}

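So I wrote a function that collects the regex matches in a given string. The results are stored in a class that is basically two vectors: one for the match positions and one for the corresponding match lengths.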

This works, but when I run valgrind on it, it reports some substantial memory leaks.

Running valgrind --leak-check=full on the code above, I get:

==24843== Memcheck, a memory error detector
==24843== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==24843== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
==24843== Command: ./test
==24843== 
1
==24843== 
==24843== HEAP SUMMARY:
==24843==     in use at exit: 11,688 bytes in 37 blocks
==24843==   total heap usage: 54 allocs, 17 frees, 12,868 bytes allocated
==24843== 
==24843== 256 bytes in 1 blocks are definitely lost in loss record 14 of 18
==24843==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24843==    by 0x543549A: regcomp (regcomp.c:487)
==24843==    by 0x400ED0: match_regex(std::string, char const*) (in <path>)
==24843==    by 0x4010CA: main (in <path>)
==24843== 
==24843== 11,432 (224 direct, 11,208 indirect) bytes in 1 blocks are definitely lost in loss record 18 of 18
==24843==    at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24843==    by 0x4C2CF1F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==24843==    by 0x5434BAF: re_compile_internal (regcomp.c:760)
==24843==    by 0x54354FF: regcomp (regcomp.c:506)
==24843==    by 0x400ED0: match_regex(std::string, char const*) (in <path>)
==24843==    by 0x4010CA: main (in <path>)
==24843== 
==24843== LEAK SUMMARY:
==24843==    definitely lost: 480 bytes in 2 blocks
==24843==    indirectly lost: 11,208 bytes in 35 blocks
==24843==      possibly lost: 0 bytes in 0 blocks
==24843==    still reachable: 0 bytes in 0 blocks
==24843==         suppressed: 0 bytes in 0 blocks
==24843== 
==24843== For counts of detected and suppressed errors, rerun with: -v
==24843== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

Is my code wrong, or is there really a bug in those files?

2 Answers:

Answer 0 (score: 5)

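Your regex_t doesn't need to be dynamically allocated. That isn't directly related to your problem, but it is a bit odd. The real problem is that you never regfree() the compiled expression when compilation succeeds (which you should be checking). You should set up your regex like this: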

regex_t regex;
int res = regcomp(&regex, regex_string.c_str(), REG_EXTENDED);
if (res == 0)
{
    // use your expression via &regex
    // ... regexec(&regex, ...) calls go here ...

    // and eventually free it when done.
    regfree(&regex);
}
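Put together, the question's match_regex might look like this when reworked along the lines above. This is a sketch, not the answerer's code: the second parameter is renamed `text` so it no longer shadows `std::string`, and the REG_NOTBOL flag and the zero-length-match guard are assumptions added here, so that `^` cannot match mid-string and a pattern such as `x*` cannot loop forever. The parts that fix the leak are the stack-allocated `regex_t`, the checked regcomp() return value, and the single regfree() at the end.

```cpp
#include <regex.h>
#include <string>
#include <vector>

class regex_result {
public:
    /** Contains indices of starting positions of matches.*/
    std::vector<int> positions;
    /** Contains lengths of matches.*/
    std::vector<int> lengths;
};

regex_result match_regex(const std::string& regex_string, const char* text) {
    regex_result result;
    regex_t regex;  // automatic storage: no new/delete needed
    if (regcomp(&regex, regex_string.c_str(), REG_EXTENDED) != 0) {
        // After a failed regcomp() the regex_t contents are undefined: do not regfree() it.
        return result;
    }
    const char* pointer = text;  // points just past the previous match
    const int n_matches = 10;    // maximum number of (sub)matches recorded per hit
    regmatch_t matches[n_matches];
    // Assumption (not in the question): REG_NOTBOL after the first hit keeps '^' from matching mid-string.
    while (regexec(&regex, pointer, n_matches, matches,
                   pointer == text ? 0 : REG_NOTBOL) == 0) {
        const int offset = static_cast<int>(pointer - text);
        for (int i = 0; i < n_matches; i++) {
            if (matches[i].rm_so == -1)
                break;  // same stopping rule as the question's loop
            result.positions.push_back(static_cast<int>(matches[i].rm_so) + offset);
            result.lengths.push_back(static_cast<int>(matches[i].rm_eo - matches[i].rm_so));
        }
        if (matches[0].rm_eo == 0) {
            // Added guard: a zero-length match here would never advance the pointer.
            if (*pointer == '\0')
                break;
            ++pointer;
        } else {
            pointer += matches[0].rm_eo;
        }
    }
    regfree(&regex);  // every successful regcomp() needs exactly one regfree()
    return result;
}
```

Called as in the question, `match_regex(pat, str.c_str())` should still report one match, and valgrind should no longer attribute lost blocks to regcomp().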

If your implementation supports it, I strongly recommend the <regex> library that C++11 provides, because it gives you a proper RAII solution.
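A minimal sketch of the same function on top of C++11 `<regex>`, assuming the standard library ships a working implementation (libstdc++ before GCC 4.9 did not). `std::regex` releases its own state in its destructor, so there is no counterpart to regfree() to forget, and `std::sregex_iterator` steps past zero-length matches by itself. `std::regex::extended` selects the POSIX ERE grammar to mirror `REG_EXTENDED`; taking `text` as a `std::string` instead of `const char*` is a change from the question's signature.

```cpp
#include <regex>
#include <string>
#include <vector>

class regex_result {
public:
    std::vector<int> positions;  // starting indices of matches
    std::vector<int> lengths;    // lengths of matches
};

// No manual cleanup: std::regex and the iterators free everything they own (RAII).
regex_result match_regex(const std::string& pattern, const std::string& text) {
    regex_result result;
    const std::regex re(pattern, std::regex::extended);  // throws std::regex_error on a bad pattern
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        for (std::size_t i = 0; i < m.size(); ++i) {
            if (!m[i].matched)
                break;  // mirrors the question's stop at the first unused slot
            // position(i) is measured from the start of `text`, not from the previous match.
            result.positions.push_back(static_cast<int>(m.position(i)));
            result.lengths.push_back(static_cast<int>(m.length(i)));
        }
    }
    return result;
}
```

Unlike regcomp(), an invalid pattern here is reported by throwing `std::regex_error`, so callers that previously checked a return code need a try/catch instead.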

Answer 1 (score: 3)

You must call regfree() to release the memory allocated by regcomp().
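If you need to stay with the POSIX API, one way to make sure that call is never forgotten is to wrap `regex_t` in a small RAII class. The `posix_regex` class and its `matches()` member below are hypothetical helpers, not part of regex.h. The sketch calls regfree() only after a successful regcomp() and turns compile errors into exceptions using the message from regerror().

```cpp
#include <regex.h>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: owns one compiled regex_t and frees it in the destructor,
// so each successful regcomp() is paired with exactly one regfree().
class posix_regex {
public:
    explicit posix_regex(const std::string& pattern, int cflags = REG_EXTENDED) {
        const int rc = regcomp(&re_, pattern.c_str(), cflags);
        if (rc != 0) {
            char msg[256];
            regerror(rc, &re_, msg, sizeof msg);  // human-readable reason (truncated if long)
            // No regfree() here: the regex_t is undefined after a failed regcomp().
            throw std::runtime_error(msg);
        }
    }
    ~posix_regex() { regfree(&re_); }

    // Copying would lead to a double regfree(), so forbid it.
    posix_regex(const posix_regex&) = delete;
    posix_regex& operator=(const posix_regex&) = delete;

    // Raw access for regexec() calls that need match offsets.
    const regex_t* get() const { return &re_; }

    // Convenience check: does the pattern match anywhere in `text`?
    bool matches(const char* text) const {
        return regexec(&re_, text, 0, nullptr, 0) == 0;
    }

private:
    regex_t re_;
};
```

Because the destructor runs on every exit path, including early returns and exceptions thrown between compiling and using the expression, there is no code path that can skip regfree().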