Question

So I have a function called split_alpha() that takes a std::string and splits it into words, using any non-alphanumeric character as a delimiter. It also maps each word to its lowercase version.


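The function works correctly 99% of the time, but when I give it the string "Sending query: “SELECT * FROM users”" (not including the quotes around the whole string), it does something very strange. It basically goes into an infinite loop (in the while loop) and never finds the end of the string. Instead, it seems to keep reading random characters/strings from somewhere. My vector ends up with a size of about 200 before it finally segfaults. Does anyone know what could cause this? I tried printing out the string and it looks perfectly fine. Again, the code works for every other string I've tried. Thanks!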

Answer 1

Doesn't the while loop check for the end of the string?

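Yes, but several ++it can execute before the while condition is checked again, and at any of those points the iterator may already be at the end of the string. Incrementing it once more moves it past to_split.end(); in practice a condition such as it != to_split.end() then never becomes false, so the loop keeps reading whatever memory lies beyond the string until it crashes. Most likely the other strings you tried did not fail because they all ended with an alphanumeric character, whereas this one ends with the closing quote ”.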

Reverse the order of the ++it and the check:

if (it == to_split.end()) { break; }
++it;
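The code from the question is not shown, so the following is only a minimal sketch of a split_alpha-style function, assuming the behaviour the question describes (split on any non-alphanumeric character, lowercase every word). It follows the same rule as the fix above: the iterator is compared with to_split.end() before every dereference and every increment, so it can never move past the end of the string.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch only (not the asker's code): split on any non-alphanumeric character
// and lowercase each word. Every inner loop tests `it != end` before it
// dereferences or increments the iterator.
std::vector<std::string> split_alpha(const std::string& to_split)
{
    std::vector<std::string> words;
    auto it = to_split.begin();
    const auto end = to_split.end();

    while (it != end) {
        // Skip a run of delimiters, stopping at the end of the string.
        while (it != end && !std::isalnum(static_cast<unsigned char>(*it))) {
            ++it;
        }

        // Collect one word, again stopping at the end of the string.
        std::string word;
        while (it != end && std::isalnum(static_cast<unsigned char>(*it))) {
            word += static_cast<char>(std::tolower(static_cast<unsigned char>(*it)));
            ++it;
        }

        // Trailing delimiters leave `word` empty; don't store empty words.
        if (!word.empty()) {
            words.push_back(word);
        }
    }
    return words;
}
```

Called on the problem input, "Sending query: “SELECT * FROM users”", this sketch should return sending, query, select, from and users; the trailing ” is skipped without stepping past end().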

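Explanation: the following assertion fails, because after the increment the iterator no longer points to the end of the string but one position past it (strictly speaking, incrementing end() is undefined behavior):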

if (it == to_split.end())
{
    ++it;
    assert(it == to_split.end());
}
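If you would rather not guard each ++it by hand, a swapped-in alternative (not the answerer's code) is to let std::find_if do the scanning: it never advances past the end iterator it is given, so the overshoot described above cannot happen. The name split_alpha_find_if is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Illustrative alternative: std::find_if returns at most `end`, so no
// hand-written ++it can step past the end of the string.
std::vector<std::string> split_alpha_find_if(const std::string& to_split)
{
    auto is_alnum = [](char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; };
    auto is_delim = [&](char c) { return !is_alnum(c); };

    std::vector<std::string> words;
    auto it = to_split.begin();
    const auto end = to_split.end();

    while (true) {
        it = std::find_if(it, end, is_alnum);             // start of next word, or end
        if (it == end) {
            break;
        }
        auto word_end = std::find_if(it, end, is_delim);  // one past the word, or end

        std::string word(it, word_end);
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        words.push_back(word);

        it = word_end;
    }
    return words;
}
```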

Answer 2

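Since the source of the bug in your function has already been pointed out, may I suggest a slightly different approach to splitting words, using regular expressions: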

#include <iostream>
#include <regex>
#include <vector>
#include <string>
#include <cctype>

std::vector<std::string> split_alpha(std::string str)
{
    std::regex RE{ "([a-zA-Z0-9]+)" }; // equivalent to isalnum in the default "C" locale
    std::vector<std::string> result;

    // find every word
    for (std::smatch matches; std::regex_search(str, matches, RE); str = matches.suffix())
    {
        // push the word to the vector
        result.push_back(matches[1].str());

        // transform it to lowercase
        for (char &c : result.back())
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }

    return result;
}

int main()
{
    // test the function
    for (auto &word : split_alpha("Sending query: “SELECT * FROM users”"))
        std::cout << word << std::endl;

    return 0;
}

Output:

sending
query
select
from
users
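The loop above copies the unmatched remainder into str after every match. A variant that avoids those copies, offered as a sketch rather than part of the original answer, walks the matches with std::sregex_iterator; split_alpha_regex is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <regex>
#include <string>
#include <vector>

// Sketch: visit every run of [a-zA-Z0-9] with std::sregex_iterator instead of
// re-searching a copied suffix. `str` must outlive the iterators, which it
// does here because it is a parameter of this function.
std::vector<std::string> split_alpha_regex(const std::string& str)
{
    static const std::regex re{ "[a-zA-Z0-9]+" };
    std::vector<std::string> result;

    for (auto m = std::sregex_iterator(str.begin(), str.end(), re);
         m != std::sregex_iterator(); ++m)
    {
        std::string word = m->str();
        // lowercase the matched word
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        result.push_back(word);
    }
    return result;
}
```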

Error iterating over a string in C++
