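I would like to split my sentence using whitespace as the delimiter, except for escaped whitespace. How can I split it using boost::split and a regex? If that is not possible, how else can I do it?
Example: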
std::string sentence = "My dog Fluffy\\ Cake likes to jump";
Result:
My
dog
Fluffy\ Cake
likes
to
jump
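Answer 0 (score: 3)
Three implementations: Boost Spirit, Boost Regex, and a handwritten parser.
Here's how I'd do this with Boost Spirit. This might seem like overkill, but experience teaches me that once you're splitting input text, you're likely to need more parsing logic.
Boost Spirit shines when you scale from "just splitting tokens" to a real grammar with production rules.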
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";
    using It = std::string::const_iterator;
    It f = sentence.begin(), l = sentence.end();

    std::vector<std::string> words;

    // A word is one or more of: a backslash followed by any character
    // (the backslash itself is not stored), or a printable non-space character.
    bool ok = qi::phrase_parse(f, l,
            *qi::lexeme [ +('\\' >> qi::char_ | qi::graph) ], // words
            qi::space - "\\ ",                                // skipper
            words);

    if (ok) {
        std::cout << "Parsed:\n";
        for (auto& w : words)
            std::cout << "\t'" << w << "'\n";
    } else {
        std::cout << "Parse failed\n";
    }

    // Report any input the grammar did not consume.
    if (f != l)
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
}
With Boost Regex, this looks very succinct, but the pattern is rather obscure:
#include <iostream>
#include <string>
#include <vector>
#include <boost/regex.hpp>
#include <boost/algorithm/string_regex.hpp>

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";

    std::vector<std::string> words;
    // Split on any whitespace character that is not preceded by a backslash.
    boost::algorithm::split_regex(words, sentence, boost::regex("(?<!\\\\)\\s"), boost::match_default);

    for (auto& w : words)
        std::cout << " '" << w << "'\n";
}
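Using C++11 raw string literals, you could write the regular expression slightly less obscurely:
boost::regex(R"((?<!\\)\s)")
meaning "any whitespace character that is not preceded by a backslash".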
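For comparison, the default ECMAScript grammar of std::regex has no lookbehind, so the pattern above cannot be passed to it as is. One workaround is to match the tokens themselves instead of splitting on the delimiters. The following is only a minimal sketch, assuming C++11 and nothing beyond the standard library; the helper name match_tokens is hypothetical and is not part of the original answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): collects tokens by
// matching them, rather than by splitting the input on delimiters.
std::vector<std::string> match_tokens(std::string const& input) {
    // A token is one or more of: a backslash followed by any character
    // except a line terminator, or a single non-whitespace character.
    static std::regex const token(R"((?:\\.|\S)+)");

    std::vector<std::string> words;
    for (std::sregex_iterator it(input.begin(), input.end(), token), end; it != end; ++it)
        words.push_back(it->str());
    return words;
}
```

Calling match_tokens(sentence) on the example sentence keeps the backslash inside the third token, as in the result shown in the question. Because tokens are matched rather than split, runs of consecutive whitespace do not produce empty tokens.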
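A handwritten parser is somewhat more tedious, but like the Spirit grammar it is completely generic and allows good performance.
However, it doesn't scale nearly as gracefully as the Spirit approach once you start adding complexity to your grammar. An advantage is that it takes less time to compile than the Spirit version.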
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

template <typename It, typename Out>
Out tokens(It f, It l, Out out) {
    std::string accum;

    // Emit the accumulated token, if any, and start a new one.
    auto flush = [&] {
        if (!accum.empty()) {
            *out++ = accum;
            accum.resize(0);
        }
    };

    while (f!=l) {
        switch(*f) {
            case '\\':
                if (++f!=l && *f==' ')
                    accum += *f++;  // escaped space: keep the space, drop the backslash
                else
                    accum += '\\';  // lone backslash: keep it; the next character is handled normally
                break;
            case ' ': case '\t': case '\r': case '\n':
                ++f;
                flush();
                break;
            default:
                accum += *f++;
        }
    }
    flush();
    return out;
}

int main() {
    std::string const sentence = "My dog Fluffy\\ Cake likes to jump";

    std::vector<std::string> words;
    tokens(sentence.begin(), sentence.end(), std::back_inserter(words));

    for (auto& w : words)
        std::cout << "\t'" << w << "'\n";
}
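Note that the tokenizer above stores the escaped space without its backslash, whereas the result shown in the question keeps the backslash (Fluffy\ Cake). If that matters, the same loop can be given a switch for it. This is only a sketch under that assumption; the function name split_escaped and the keep_backslash parameter are hypothetical and not part of the original answer.

```cpp
#include <string>
#include <vector>

// Hypothetical variant of the handwritten tokenizer: keep_backslash decides
// whether the backslash of an escaped space stays in the token.
std::vector<std::string> split_escaped(std::string const& input, bool keep_backslash) {
    std::vector<std::string> words;
    std::string accum;

    // Emit the accumulated token, if any, and start a new one.
    auto flush = [&] {
        if (!accum.empty()) {
            words.push_back(accum);
            accum.clear();
        }
    };

    for (auto f = input.begin(), l = input.end(); f != l; ++f) {
        switch (*f) {
            case '\\':
                if (f + 1 != l && *(f + 1) == ' ') {
                    if (keep_backslash)
                        accum += '\\';
                    accum += *++f;  // consume the escaped space
                } else {
                    accum += '\\';  // lone backslash: keep it literally
                }
                break;
            case ' ': case '\t': case '\r': case '\n':
                flush();            // unescaped whitespace ends the token
                break;
            default:
                accum += *f;
        }
    }
    flush();
    return words;
}
```

With keep_backslash set to true the third token of the example sentence is Fluffy\ Cake, and with false it is Fluffy Cake. Which of the two is wanted depends on whether later processing still needs to see the escape.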