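How to split a sentence with an escaped whitespace?

Time: 2015-04-01 00:33:26

Tags: c++ boost split whitespace delimiter

I want to split my sentence using whitespace as the delimiter, except for escaped whitespace. How can I do this using boost::split and a regex? If that is not possible, how else can it be done?

Example: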

std::string sentence = "My dog Fluffy\\ Cake likes to jump";

Result:

My
dog
Fluffy\ Cake
likes
to
jump

1 Answer:

Answer 0: (score: 3)

Three implementations:

  1. With Boost Spirit
  2. With Boost Regex
  3. Handwritten parser

    With Boost Spirit

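    Here is how I would do this with Boost Spirit. It may seem like overkill, but experience tells me that once you are splitting input text, you will likely need more parsing logic.

    Boost Spirit shines when you scale from "just splitting tokens" to a real grammar with production rules.

    Live On Coliru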

    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <string>
    #include <vector>
    namespace qi = boost::spirit::qi;
    
    int main() {
        std::string const sentence = "My dog Fluffy\\ Cake likes to jump";
        using It = std::string::const_iterator;
        It f = sentence.begin(), l = sentence.end();
    
        std::vector<std::string> words;
    
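        bool ok = qi::phrase_parse(f, l,
                // a word: one or more escaped characters (the backslash itself
                // is not stored) or visible, non-space characters
                *qi::lexeme [ +('\\' >> qi::char_ | qi::graph) ], // words
                qi::space - "\\ ", // skipper
                words);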
    
        if (ok) {
            std::cout << "Parsed:\n";
            for (auto& w : words)
                std::cout << "\t'" << w << "'\n";
        } else {
            std::cout << "Parse failed\n";
        }
    
        if (f != l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
    

    With Boost Regex

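    This looks very succinct, but note that it does not process the escape itself: the backslash stays in the token (Fluffy\ Cake).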

    Live On Coliru

    #include <iostream>
    #include <string>
    #include <vector>
    #include <boost/regex.hpp>
    #include <boost/algorithm/string_regex.hpp>
    
    int main() {
        std::string const sentence = "My dog Fluffy\\ Cake likes to jump";
    
        std::vector<std::string> words;
        boost::algorithm::split_regex(words, sentence, boost::regex("(?<!\\\\)\\s"), boost::match_default);
    
        for (auto& w : words)
            std::cout << " '" << w << "'\n";
    }
    
      

    Using C++11 raw string literals, you can write the regex slightly less obscurely: boost::regex(R"((?<!\\)\s)"), meaning "any whitespace not preceded by a backslash".
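
The lookbehind treats any whitespace right after a backslash as escaped, so an input such as `a\\ b` (an escaped backslash followed by a space) is not split at that space. As a hedged alternative, the sketch below matches tokens instead of delimiters. It swaps in `std::regex` for Boost (the ECMAScript grammar of `std::regex` has no lookbehind, so the split pattern cannot be reused there). The helper name `match_tokens` and the token pattern are assumptions for illustration, not part of the original answer. Like the regex split, it leaves the backslashes in the tokens.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): collect tokens by
// matching them, rather than by splitting on delimiters.
std::vector<std::string> match_tokens(std::string const& input) {
    // One token is one or more of:
    //   - a backslash followed by any character (an escape pair), or
    //   - a single character that is neither whitespace nor a backslash.
    // A lone backslash at the very end of the input matches neither
    // alternative and is therefore dropped.
    static std::regex const token(R"((?:\\.|[^\s\\])+)");

    std::vector<std::string> words;
    for (std::sregex_iterator it(input.begin(), input.end(), token), end;
         it != end; ++it) {
        words.push_back(it->str()); // the backslashes stay in the token
    }
    return words;
}
```

Calling `match_tokens` on the example sentence gives six tokens, with `Fluffy\ Cake` kept as one of them. Because escape pairs are consumed two characters at a time, `a\\ b` is split into `a\\` and `b`.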

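    Handwritten parser

    This is somewhat more tedious but, like the Spirit grammar, it is completely generic and allows good performance.

    However, it does not scale as gracefully as the Spirit approach once you start adding complexity to your grammar. An advantage is that it takes less time to compile than the Spirit version.

    Live On Coliru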

    #include <iostream>
    #include <iterator>
    #include <string>
    #include <vector>
    
    template <typename It, typename Out>
    Out tokens(It f, It l, Out out) {
        std::string accum;
        auto flush = [&] { 
            if (!accum.empty()) {
                *out++ = accum;
                accum.resize(0);
            }
        };
    
        while (f!=l) {
            switch(*f) {
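                case '\\': 
                    if (++f!=l && *f==' ') {
                        accum += ' '; // escaped space: keep it in the current word
                        ++f;          // consume it so it does not end the word
                    } else
                        accum += '\\'; // any other backslash stays literal
                    break;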
                case ' ': case '\t': case '\r': case '\n':
                    ++f;
                    flush();
                    break;
                default:
                    accum += *f++;
            }
        }
        flush();
        return out;
    }
    
    int main() {
        std::string const sentence = "My dog Fluffy\\ Cake likes to jump";
    
        std::vector<std::string> words;
    
        tokens(sentence.begin(), sentence.end(), std::back_inserter(words));
    
        for (auto& w : words)
            std::cout << "\t'" << w << "'\n";
    }
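
To illustrate the remark that a handwritten parser grows less gracefully than a grammar, here is a sketch of what one extra rule costs. It assumes two extensions that are not in the original answer: a backslash escapes any following character, and double quotes group whitespace into a single word. The function name `tokens_quoted` and the handling of an unterminated quote are hypothetical choices made for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical extension of the hand-written tokenizer (assumptions: a
// backslash escapes any next character; double quotes group whitespace).
std::vector<std::string> tokens_quoted(std::string const& input) {
    std::vector<std::string> words;
    std::string accum;
    bool in_word = false;   // lets "" produce an empty word
    bool in_quotes = false; // extra state needed only for the quoting rule

    auto flush = [&] {
        if (in_word) {
            words.push_back(accum);
            accum.clear();
            in_word = false;
        }
    };

    for (auto f = input.begin(), l = input.end(); f != l; ++f) {
        char const c = *f;
        if (c == '\\') {
            auto next = f + 1;
            if (next != l) { accum += *next; f = next; } // keep escaped char
            else           { accum += '\\'; }            // trailing backslash
            in_word = true;
        } else if (c == '"') {
            in_quotes = !in_quotes; // the quote itself is not stored
            in_word = true;
        } else if (!in_quotes &&
                   (c == ' ' || c == '\t' || c == '\r' || c == '\n')) {
            flush(); // unquoted whitespace ends the current word
        } else {
            accum += c;
            in_word = true;
        }
    }
    flush(); // assumption: an unterminated quote is flushed as-is
    return words;
}
```

With this sketch, `say "hello world" now` yields three words, and the example sentence still yields six, with `Fluffy Cake` as the third. Each new rule adds another flag and another branch, which is the kind of growth a Spirit grammar would express as one more production.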