Question

I'm trying to retrieve numbers from a string. The string has a format like _0_1_, and I want to get 0 and 1.

Here is my code:

std::tr1::regex rx("_(\\d+)_");
tstring fileName = Utils::extractFileName(docList[i]->c_str());                 
std::tr1::smatch res;
std::tr1::regex_search(fileName, res, rx);

But as a result I get (update: this turned out to be odd output from the debugger's watch window):

res[0] = 3
res[1] = 1

Where does the 3 come from, and what am I doing wrong?

Update: I printed the results to the screen:

for (std::tr1::smatch::iterator it = res.begin(); it < res.end(); ++it){
    std::cout << *it << std::endl;
}

And the program outputs:

_0_
0

Answer 1

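Regular expressions normally return all non-overlapping matches. If you put _ both before and after the digits, you won't get all the numbers, because the underscore that ends the first match cannot also serve as the underscore that starts the second match: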

_123_456_
    ^
    This cannot be used twice

Just use (\\d+) as the expression to get all the numbers (regular expressions are "greedy" by default, so each match will take all available digits anyway).
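To make the non-overlapping behaviour concrete, here is a minimal sketch that collects the first capture group of every match for a given pattern. It assumes the standard std::regex from <regex> rather than std::tr1::regex, and extractGroups is a hypothetical helper name, not something from the question.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture group 1 of every non-overlapping
// match of `pattern` in `text`. The pattern must contain at least one group.
std::vector<std::string> extractGroups(const std::string& text,
                                       const std::string& pattern) {
    const std::regex rx(pattern);
    std::vector<std::string> groups;
    const std::sregex_iterator end;  // default-constructed: end of sequence
    for (std::sregex_iterator it(text.begin(), text.end(), rx); it != end; ++it) {
        groups.push_back((*it)[1].str());  // [0] is the whole match, [1] the group
    }
    return groups;
}
```

Called on "_123_456_", the pattern "_(\\d+)_" should yield only 123, because the middle underscore is consumed by the first match, while "(\\d+)" should yield both 123 and 456.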

Answer 2

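That looks like the expected output. The first element of the result is the entire matched substring, and the second element (and so on) holds the capture groups.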
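As a small illustration of how the match results are laid out, the sketch below runs a single regex_search and copies out element 0 and element 1. It assumes std::regex instead of std::tr1::regex; FirstMatch and firstMatch are hypothetical names introduced only for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the whole match and its first capture group.
struct FirstMatch {
    std::string whole;  // res[0]: everything the pattern matched
    std::string group;  // res[1]: what the first (...) captured
};

// Runs one search with the question's pattern and reports both parts.
// Both fields stay empty if nothing matches.
FirstMatch firstMatch(const std::string& text) {
    const std::regex rx("_(\\d+)_");
    std::smatch res;
    FirstMatch out;
    if (std::regex_search(text, res, rx)) {
        out.whole = res[0].str();
        out.group = res[1].str();
    }
    return out;
}
```

For the input "_0_1_" this gives "_0_" as the whole match and "0" as the group, which is the same pair of lines the question's program printed.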

If you want to go through all the matches, you need to call regex_search multiple times, once per match:

auto it = fileName.cbegin();
while (std::tr1::regex_search(it, fileName.cend(), res, rx)) {
    std::cout << "Found matching group:" << std::endl;
    // res[0] is the whole match; the capture groups start at index 1
    for (std::size_t mm = 1; mm < res.size(); ++mm) {
        std::cout << std::string(res[mm].first, res[mm].second) << std::endl;
    }

    it = res[0].second; // resume just past the end of this match
}

If you really do need only the numbers that are "wrapped" in underscores, you can use a positive assertion (?=_) to make sure that is the case:

// positive assertions are required matches, but are not consumed by the
// matching group.
std::tr1::regex rx("_(\\d+)(?=_)");

When run against "//abc_1_2_3.txt", this retrieves 1 and 2, but not 3.
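Putting the repeated search and the lookahead together, a self-contained sketch might look like the following. It assumes std::regex (whose default ECMAScript grammar supports (?=...)) in place of std::tr1::regex, and wrappedNumbers is a hypothetical helper name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every number that is preceded by '_' and
// followed by '_'. The trailing '_' is only asserted by the lookahead, not
// consumed, so it can still start the next match.
std::vector<std::string> wrappedNumbers(const std::string& text) {
    const std::regex rx("_(\\d+)(?=_)");
    std::vector<std::string> numbers;
    std::smatch res;
    std::string::const_iterator it = text.cbegin();
    while (std::regex_search(it, text.cend(), res, rx)) {
        numbers.push_back(res[1].str());
        it = res[0].second;  // resume just past the end of this match
    }
    return numbers;
}
```

With this pattern, "//abc_1_2_3.txt" should produce 1 and 2 (the 3 is followed by a dot, not an underscore), and the question's "_0_1_" should produce 0 and 1.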

Answer 3

Solution: thanks, everyone. I rewrote it with the help of regex_token_iterator and (\\d+), and now it works:

std::regex_token_iterator<tstring::iterator> rend; // default-constructed: end of sequence
tstring fileName = Utils::extractFileName(docList[i]->c_str());
std::tr1::regex_search(fileName, res, rx);
for (std::regex_token_iterator<tstring::iterator> it(fileName.begin(), fileName.end(), rx); it != rend; ++it) {
    std::cout << " [" << *it << "]";
}

C++11 VS12 regex_search
