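Parsing a string with an optional separator using Boost.Spirit

Time: 2014-01-23 07:19:08

Tags: c++ boost boost-spirit boost-spirit-qi

I'm trying to parse a URL query string with some special rules. So far it works, with one exception described below. The URL is parsed into a collection of key-value pairs using the following: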

const qi::rule<std::string::const_iterator, std::string()> key = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9/%\\-_~\\.");
const qi::rule<std::string::const_iterator, std::string()> value = *(qi::char_ - '=' - '&');
const qi::rule<std::string::const_iterator, std::pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *(('&') >> pair);
So far, so good. One special case is that the ampersand can appear as an XML entity, &amp;, so the query rule was upgraded to

const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *((qi::lit("&amp;")|'&') >> pair);

It works as expected. Then another special case appeared: quoted values can contain unescaped equals signs and ampersands, in the form a=b&d=e&f=$$g=h&i=j$$&x=y&z=def, which should be parsed into

  • a => b
  • d => e
  • f => g=h&i=j
  • x => y
  • z => def

So I added an extra rule for the "quoted" values

const qi::rule<std::string::const_iterator, std::string()> key   =  qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9/%\\-_~\\.");
const qi::rule<std::string::const_iterator, std::string()> escapedValue = qi::omit["$$"] >> *(qi::char_ - '$') >> qi::omit["$$"];
const qi::rule<std::string::const_iterator, std::string()> value = *(escapedValue | (qi::char_ - '=' - '&'));
const qi::rule<std::string::const_iterator, std::pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *((qi::lit("&amp;")|'&') >> pair);

This again worked as expected until the next case: a=b&d=e&f=$$g=h&i=j$$x=y&z=def. Note that there is no ampersand between the closing "$$" and the next key name. I expected this to be easy to solve by adding a Kleene operator, like this:

const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *(*(qi::lit("&amp;")|'&') >> pair);

But for some reason it doesn't do the job. Any suggestions would be appreciated!

EDIT: sample code

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>
#include <string>
#include <unordered_map>

namespace rulez
{
    using namespace boost::spirit::qi;
    using It = std::string::const_iterator;

    const rule<It, std::string()> key                                    = boost::spirit::qi::char_("a-zA-Z_") >> *boost::spirit::qi::char_("a-zA-Z_0-9/%\\-_~\\.");
    const rule<It, std::string()> escapedValue                           = boost::spirit::qi::omit["$$"] >> *(boost::spirit::qi::char_ - '$') >> boost::spirit::qi::omit["$$"];
    const rule<It, std::string()> value                                  = *(escapedValue | (boost::spirit::qi::char_ - '=' - '&'));
    const rule<It, std::pair<std::string, std::string>()> pair           = key >> -('=' >> value);
    const rule<It, std::unordered_map<std::string, std::string>()> query = pair >> *(*(boost::spirit::qi::lit("&amp;")|'&') >> pair);
}

int main()
{
    using namespace std;
    unordered_map<string, string> keyVal;
  //string const paramString = "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def";
    string const paramString = "a=b&d=e&f=$$g=h&i=j$$x=y&z=def";

    boost::spirit::qi::parse(paramString.begin(), paramString.end(), rulez::query, keyVal);

    for (const auto& pair : keyVal)
        cout << "(\"" << pair.first << "\",\"" << pair.second << "\")" << endl;
}

Output for "a=b&d=e&f=$$g=h&i=j$$x=y&z=def" (wrong; it should be the same as for "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def"):

("a","b"),("d","e"),("f","g=h&i=jx")

Output for "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def" (as expected):

("a","b"),("d","e"),("f","g=h&i=j"),("x","y"),("z","def")

EDIT: some simpler parsing rules, just to make things easier to understand

namespace rulez
{
    const rule<std::string::const_iterator, std::string()> key =  +(char_ - '&' - '=');
    const rule<std::string::const_iterator, std::string()> escapedValue = omit["$$"] >> *(char_ - '$') >> omit["$$"];
    const rule<std::string::const_iterator, std::string()> value = *(escapedValue | (char_ - '&' - '='));
    const rule<std::string::const_iterator, pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
    const rule<std::string::const_iterator, unordered_map<std::string, std::string>()> query =  pair >> *(*(lit('&')) >> pair);
}

2 Answers:

Answer 0: (score: 1)

I guess your problem is the value rule

value = *(escapedValue | (char_ - '&' - '='));

When parsing ...$$g=h&i=j$$x=...

$$g=h&i=j$$x=
^---------^

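it parses the marked string $$g=h&i=j$$ as escapedValue, and then the Kleene operator (*) allows the second alternative of the value rule, (char_ - '&' - '='), to parse the x

$$g=h&i=j$$x=
           ^

and the rule stops only at the =.

Maybe something like this would help: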

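Answer 1: (score: 0)

This solved the problem. However, I decided to abandon the idea of using Spirit to parse the query string: every special case makes the grammar more cumbersome, and after a while nobody will remember why it was written the way it was :)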

qi::rule<std::string::const_iterator, std::string()> key =  +(qi::char_ - '=' - '&');
qi::rule<std::string::const_iterator, std::string()> escapedValue = qi::omit["$$"] >> *(qi::char_ - "$$") >> qi::omit["$$"];
qi::rule<std::string::const_iterator, std::string()> nonEscapedValue = !qi::lit("$$") >> *(qi::char_ - '=' - '&');

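// A rule instead of `auto`: a Qi expression stored in `auto` can end up referring to destroyed temporaries.
qi::rule<std::string::const_iterator> sep = qi::lit("&amp;") | '&';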
qi::rule<std::string::const_iterator, std::pair<std::string, boost::optional<std::string>>()> keyValue = 
        key >> -('=' >> nonEscapedValue) >> (sep | qi::eoi);
qi::rule<std::string::const_iterator, std::pair<std::string, boost::optional<std::string>>()> escapedKeyValue =  
        key >> '=' >> escapedValue >> -(sep);
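// A rule instead of `auto`, which can leave a Qi expression with dangling references; the vector attribute is an assumption.
qi::rule<std::string::const_iterator, std::vector<std::pair<std::string, boost::optional<std::string>>>()> query = *(qi::hold[keyValue] | escapedKeyValue);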
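
For comparison with the decision to move away from Spirit, here is a minimal hand-written sketch in plain C++. It is not code from this thread. It assumes the semantics described in the question: pairs are separated by & or &amp;, a value wrapped in $$...$$ is taken literally, and the separator after a closing $$ is optional. The name parseQueryByHand and the KeyValue alias are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Hypothetical hand-written parser for the format described in the question.
std::vector<KeyValue> parseQueryByHand(const std::string& s)
{
    std::vector<KeyValue> out;
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n) {
        // Skip separators; "&amp;" is checked before the bare '&'.
        if (s.compare(i, 5, "&amp;") == 0) { i += 5; continue; }
        if (s[i] == '&') { ++i; continue; }

        // Key: everything up to the next '=' or '&' (or end of input).
        std::size_t keyEnd = s.find_first_of("=&", i);
        if (keyEnd == std::string::npos) keyEnd = n;
        std::string key = s.substr(i, keyEnd - i);
        if (key.empty()) break;  // stray '=': stop, keeping what was parsed
        i = keyEnd;

        std::string value;
        if (i < n && s[i] == '=') {
            ++i;
            if (s.compare(i, 2, "$$") == 0) {
                // Quoted value: taken literally up to the closing "$$".
                const std::size_t close = s.find("$$", i + 2);
                if (close == std::string::npos) break;  // unterminated quote
                value = s.substr(i + 2, close - (i + 2));
                i = close + 2;
            } else {
                // Unquoted value: up to the next '=' or '&'.
                std::size_t valEnd = s.find_first_of("=&", i);
                if (valEnd == std::string::npos) valEnd = n;
                value = s.substr(i, valEnd - i);
                i = valEnd;
            }
        }
        out.emplace_back(std::move(key), std::move(value));
    }
    return out;
}
```

Each special case here is an explicit branch with a comment, which is one way to address the concern that nobody will remember why a grammar was written the way it was. Unlike the Spirit version above it, this sketch silently stops at malformed input instead of reporting where parsing failed.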