Question

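I just started using Boost::regex today, and I am also new to regular expressions. I have been using "The Regulator" and Expresso to test my regex and am happy with what I see there, but when I transfer that regex to Boost it does not seem to do what I want. Any pointers toward a solution would be most welcome. As a side question, are there any tools that would help me test my regex against boost.regex?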

using namespace boost;
using namespace std;

vector<string> tokenizer::to_vector_int(const string s)
{
    regex re("\\d*");
    vector<string> vs;
    cmatch matches;
    if( regex_match(s.c_str(), matches, re) ) {
        MessageBox(NULL, L"Hmmm", L"", MB_OK); // it never gets here
        for( unsigned int i = 1 ; i < matches.size() ; ++i ) {
            string match(matches[i].first, matches[i].second);
            vs.push_back(match);
        }
    }
    return vs;
}

void _uttokenizer::test_to_vector_int() 
{
    vector<string> __vi = tokenizer::to_vector_int("0<br/>1");
    for( int i = 0 ; i < __vi.size() ; ++i ) INFO(__vi[i]);
    CPPUNIT_ASSERT_EQUAL(2, (int)__vi.size());//always fails
}

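Update (thanks to Dav for helping me clarify my question): I was hoping to get a vector with 2 strings in it => "0" and "1". Instead, I never get a successful regex_match() (regex_match() always returns false), so the vector is always empty.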

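Thanks to '1800 INFORMATION' for the suggestions. The to_vector_int() method now looks like this, but it goes into a never-ending loop (I took the code you gave and modified it to make it compilable) and finds "0", "", "", "" and so on. It never finds the "1".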

vector<string> tokenizer::to_vector_int(const string s)
{
    regex re("(\\d*)");
    vector<string> vs;

    cmatch matches;

    char * loc = const_cast<char *>(s.c_str());
    while( regex_search(loc, matches, re) ) {
        vs.push_back(string(matches[0].first, matches[0].second));
        loc = const_cast<char *>(matches.suffix().str().c_str());
    }

    return vs;
}

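In all honesty, I don't think I have understood the basics of searching for a pattern and getting the matches yet. Are there any tutorials that explain this?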

Answer 1

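The basic problem is that you are using regex_match when you should be using regex_search:

"The algorithms regex_search and regex_match make use of match_results to report what matched; the difference between these algorithms is that regex_match will only find matches that consume all of the input text, whereas regex_search will search for a match anywhere within the text being matched."

From the boost documentation. Change it to use regex_search and it will work.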
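To make the distinction concrete, here is a minimal sketch. It assumes std::regex from C++11 rather than Boost, because its regex_match and regex_search follow the same interface and it compiles without extra libraries. The helper names matches_whole and matches_anywhere are made up for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the pattern consumes the ENTIRE input.
bool matches_whole(const std::string& s, const std::string& pattern) {
    const std::regex re(pattern);
    return std::regex_match(s, re);
}

// Hypothetical helper: true if the pattern matches anywhere in the input.
bool matches_anywhere(const std::string& s, const std::string& pattern) {
    const std::regex re(pattern);
    return std::regex_search(s, re);
}
```

With the input "0<br/>1" and the pattern \d+, matches_whole returns false because "<br/>" is not made of digits, while matches_anywhere returns true.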

Also, it looks like you are not capturing the matches. Try changing the regex to this:

regex re("(\\d*)");

Alternatively, you may need to call regex_search repeatedly:

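const char *where = s.c_str();
// Note: the pattern must not match the empty string (use \d+ rather than \d*),
// otherwise "where" never advances and this loop does not terminate.
while (regex_search(where, matches, re))
{
  // use matches[1] here, then resume searching after this match
  where = matches.suffix().first;
}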

This is because you only have one capture in your regex.
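The repeated-search idea can be sketched as follows. This is an illustration under stated assumptions, not the answerer's exact code: it uses std::regex in place of Boost, iterators instead of raw pointers, and \d+ so that a match is never empty. The function name digit_runs is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits by calling regex_search
// repeatedly, restarting each search where the previous match ended.
std::vector<std::string> digit_runs(const std::string& s) {
    static const std::regex re("(\\d+)");  // \d+ so a match is never empty
    std::vector<std::string> out;
    std::smatch m;
    std::string::const_iterator where = s.begin();
    while (std::regex_search(where, s.end(), m, re)) {
        out.push_back(m[1].str());  // m[1] is the single capture group
        where = m.suffix().first;   // continue just past this match
    }
    return out;
}
```

Because the iterators point into the original string s, nothing dangles between iterations, which is not the case when the search position is taken from the c_str() of a temporary string.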

Alternatively, change your regex if you know the basic structure of the data:

regex re("(\\d+).*?(\\d+)");

This will match two numbers in the search string.
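As a rough sketch of how the two capture groups might be read back, again assuming std::regex as a stand-in for Boost and a hypothetical helper name first_two_numbers:

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: extracts the first two numbers from the input.
// Returns a pair of empty strings if the pattern does not match.
std::pair<std::string, std::string> first_two_numbers(const std::string& s) {
    static const std::regex re("(\\d+).*?(\\d+)");
    std::smatch m;
    if (std::regex_search(s, m, re)) {
        // m[0] is the whole match; m[1] and m[2] are the two captures.
        return {m[1].str(), m[2].str()};
    }
    return {};
}
```

This only suits input that is known to contain exactly two numbers; for an arbitrary count, a repeated search is the more general approach.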

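Note that the regular expression \d* will match zero or more digits. This includes the empty string "", since that is exactly zero digits. I would change the expression to \d+, which will match one or more.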
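The effect of the empty match can be observed with a small sketch, assuming std::regex in place of Boost. The helper first_match and its found parameter are invented for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern` in `s`.
// An empty result is ambiguous (no match vs. an empty match), so `found`
// reports whether regex_search itself succeeded.
std::string first_match(const std::string& s, const std::string& pattern,
                        bool& found) {
    const std::regex re(pattern);
    std::smatch m;
    found = std::regex_search(s, m, re);
    return found ? m[0].str() : std::string();
}
```

For the input "<br/>1", the pattern \d* succeeds immediately with an empty match at the very start of the string, whereas \d+ skips ahead and finds "1". That empty match is why a search loop using \d* keeps returning "" without making progress.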
