Question

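I'm trying to parse a URL query string that has some special rules. So far everything works except for one case, described below. The URL is parsed into a set of key-value pairs using the following: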

const qi::rule<std::string::const_iterator, std::string()> key = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9/%\\-_~\\.");
const qi::rule<std::string::const_iterator, std::string()> value = *(qi::char_ - '=' - '&');
const qi::rule<std::string::const_iterator, std::pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *(('&') >> pair);
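
For reference, here is what these first rules do, written without Spirit. parseSimpleQuery is an illustrative name and not part of any library. It splits on '&' and then on the first '='. The simplifications compared with the grammar are listed in its comments.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical plain-C++ counterpart of
//   pair  = key >> -('=' >> value);
//   query = pair >> *('&' >> pair);
// Simplifications: keys are not checked against the "a-zA-Z_..." character
// class, and a value runs to the next '&' (the Spirit value rule would also
// stop at a second '=').
std::unordered_map<std::string, std::string> parseSimpleQuery(const std::string& s)
{
    std::unordered_map<std::string, std::string> result;
    std::size_t pos = 0;
    while (pos <= s.size()) {
        std::size_t amp = s.find('&', pos);
        if (amp == std::string::npos) amp = s.size();
        std::string item = s.substr(pos, amp - pos);  // one "key[=value]" chunk
        if (!item.empty()) {
            std::size_t eq = item.find('=');
            if (eq == std::string::npos)
                result.emplace(item, "");             // optional "=value" absent: empty value
            else
                result.emplace(item.substr(0, eq), item.substr(eq + 1));
        }
        pos = amp + 1;
    }
    return result;
}
```

For example, parseSimpleQuery("a=b&d=e&flag") maps a to "b", d to "e" and flag to an empty string.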

So far, so good. One special case is that an ampersand may also appear as the XML entity &amp;, so the query rule was upgraded to

const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *((qi::lit("&amp;")|'&') >> pair);
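
In the upgraded rule, the order of the alternatives matters. Spirit's | operator tries its operands from left to right and keeps the first one that matches. For that reason "&amp;" has to come before '&'. If it did not, '&' would match first and leave "amp;" to be read as the start of the next key. The helper below applies the same ordering. separatorLength is a hypothetical name used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring (qi::lit("&amp;") | '&'): returns how many
// characters a separator at `pos` occupies, or 0 if there is none.
// The longer "&amp;" is tried first, just like the left operand of '|'.
std::size_t separatorLength(const std::string& s, std::size_t pos)
{
    if (pos >= s.size()) return 0;
    if (s.compare(pos, 5, "&amp;") == 0) return 5;
    return s[pos] == '&' ? 1 : 0;
}
```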

It works as expected. Then another special case came up: quoted values may contain unescaped equals signs and ampersands, as in a=b&d=e&f=$$g=h&i=j$$&x=y&z=def, which should be parsed into

a => b
d => e
f => g=h&i=j
x => y
z => def

So I added an extra rule for "quoted" values

const qi::rule<std::string::const_iterator, std::string()> key   =  qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9/%\\-_~\\.");
const qi::rule<std::string::const_iterator, std::string()> escapedValue = qi::omit["$$"] >> *(qi::char_ - '$') >> qi::omit["$$"];
const qi::rule<std::string::const_iterator, std::string()> value = *(escapedValue | (qi::char_ - '=' - '&'));
const qi::rule<std::string::const_iterator, std::pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *((qi::lit("&amp;")|'&') >> pair);
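
Writing the body of escapedValue as *(char_ - '$') has a side effect: a quoted value cannot contain a single '$'. The body stops at the first '$' it sees, and the closing "$$" then fails to match. The sketch below contrasts this with a body of *(char_ - "$$"), which is the form used in the rules later in this thread. Both functions are hypothetical plain-C++ imitations, not Spirit code. On failure each one leaves pos untouched, the way a failed Spirit parser backtracks.

```cpp
#include <cstddef>
#include <string>

// Like: omit["$$"] >> *(char_ - '$') >> omit["$$"]
// The body stops at the first single '$', so "$$a$b$$" fails.
bool readQuotedSingle(const std::string& s, std::size_t& pos, std::string& out)
{
    if (pos > s.size() || s.compare(pos, 2, "$$") != 0) return false;  // opening "$$"
    std::size_t end = s.find('$', pos + 2);                           // *(char_ - '$')
    if (end == std::string::npos || s.compare(end, 2, "$$") != 0)
        return false;                                                  // closing "$$" missing
    out = s.substr(pos + 2, end - (pos + 2));  // omit[] drops the delimiters
    pos = end + 2;
    return true;
}

// Like: omit["$$"] >> *(char_ - "$$") >> omit["$$"]
// The body stops only at "$$", so a lone '$' is allowed inside.
bool readQuotedDouble(const std::string& s, std::size_t& pos, std::string& out)
{
    if (pos > s.size() || s.compare(pos, 2, "$$") != 0) return false;  // opening "$$"
    std::size_t end = s.find("$$", pos + 2);                          // *(char_ - "$$")
    if (end == std::string::npos) return false;                       // closing "$$" missing
    out = s.substr(pos + 2, end - (pos + 2));
    pos = end + 2;
    return true;
}
```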

which again worked as expected until the next case, a=b&d=e&f=$$g=h&i=j$$x=y&z=def. Note that there is no ampersand between the closing "$$" and the next key name. It looked easy to fix by adding a Kleene star, like

const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *(*(qi::lit("&amp;")|'&') >> pair);

but for some reason that doesn't do the trick. Any suggestions would be greatly appreciated!

EDIT: sample code

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>        // std::cout, std::endl used in main
#include <string>
#include <unordered_map>

namespace rulez
{
    using namespace boost::spirit::qi;
    using It = std::string::const_iterator;

    const rule<It, std::string()> key                                    = boost::spirit::qi::char_("a-zA-Z_") >> *boost::spirit::qi::char_("a-zA-Z_0-9/%\\-_~\\.");
    const rule<It, std::string()> escapedValue                           = boost::spirit::qi::omit["$$"] >> *(boost::spirit::qi::char_ - '$') >> boost::spirit::qi::omit["$$"];
    const rule<It, std::string()> value                                  = *(escapedValue | (boost::spirit::qi::char_ - '=' - '&'));
    const rule<It, std::pair<std::string, std::string>()> pair           = key >> -('=' >> value);
    const rule<It, std::unordered_map<std::string, std::string>()> query = pair >> *(*(boost::spirit::qi::lit("&amp;")|'&') >> pair);
}

int main()
{
    using namespace std;
    unordered_map<string, string> keyVal;
  //string const paramString = "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def";
    string const paramString = "a=b&d=e&f=$$g=h&i=j$$x=y&z=def";

    boost::spirit::qi::parse(paramString.begin(), paramString.end(), rulez::query, keyVal);

    for (const auto& pair : keyVal)
        cout << "(\"" << pair.first << "\",\"" << pair.second << "\")" << endl;
}

Output for "a=b&d=e&f=$$g=h&i=j$$x=y&z=def" (wrong; it should be the same as for "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def"):

("a","b"), ("d","e"), ("f","g=h&i=jx")

Output for "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def" (as expected):

("a","b"), ("d","e"), ("f","g=h&i=j"), ("x","y"), ("z","def")

EDIT: some simpler parsing rules, just to make things easier to follow

namespace rulez
{
    using namespace boost::spirit::qi;   // brings rule, char_, lit, omit into scope

    const rule<std::string::const_iterator, std::string()> key =  +(char_ - '&' - '=');
    const rule<std::string::const_iterator, std::string()> escapedValue = omit["$$"] >> *(char_ - '$') >> omit["$$"];
    const rule<std::string::const_iterator, std::string()> value = *(escapedValue | (char_ - '&' - '='));
    const rule<std::string::const_iterator, std::pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
    const rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *(*(lit('&')) >> pair);
}

Answer 1

I guess your problem is the value rule

value = *(escapedValue | (char_ - '&' - '='));

When parsing ...$$g=h&i=j$$x=...

$$g=h&i=j$$x=
^---------^

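it parses the marked string $$g=h&i=j$$ as escapedValue. The Kleene operator (*) then lets the second alternative of value, (char_ - '&' - '='), go on and parse x as well,

so the rule only stops at the '=' here:

$$g=h&i=j$$x=
            ^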
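
The greedy behaviour described above can be reproduced with a small simulation in standard C++. greedyValue is a hypothetical name. It imitates the loop of value = *(escapedValue | (char_ - '&' - '=')). On the failing input it keeps consuming "x" after the quoted block and stops at the '='. As a result, any number of optional separators in query can no longer help.

```cpp
#include <cstddef>
#include <string>

// Hypothetical simulation of the question's rule
//   value = *(escapedValue | (char_ - '&' - '='));
// Each loop iteration first tries a "$$...$$" block and otherwise takes one
// character that is not '&' or '='. The loop (the Kleene star) ends only when
// neither alternative matches.
std::string greedyValue(const std::string& s, std::size_t& pos)
{
    std::string out;
    while (pos < s.size()) {
        if (s.compare(pos, 2, "$$") == 0) {
            std::size_t end = s.find('$', pos + 2);       // *(char_ - '$')
            if (end != std::string::npos && s.compare(end, 2, "$$") == 0) {
                out += s.substr(pos + 2, end - pos - 2);  // omit[] drops the "$$"
                pos = end + 2;
                continue;                                 // star: try again
            }
        }
        if (s[pos] == '&' || s[pos] == '=') break;        // second alternative fails
        out += s[pos++];                                  // e.g. swallows the 'x'
    }
    return out;
}
```

Called on "$$g=h&i=j$$x=y&z=def" from position 0, it returns "g=h&i=jx" and leaves pos on the '=' after x. This matches the wrong output shown in the question.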

Maybe something like this would help:

Answer 2

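This solves the problem. However, I decided to drop the idea of using Spirit to parse query strings: every special case makes the query rule more and more cumbersome, and after a while nobody will remember why it was written the way it is :)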

using It = std::string::const_iterator;
using KeyValue = std::pair<std::string, boost::optional<std::string>>;

qi::rule<It, std::string()> key =  +(qi::char_ - '=' - '&');
qi::rule<It, std::string()> escapedValue = qi::omit["$$"] >> *(qi::char_ - "$$") >> qi::omit["$$"];
qi::rule<It, std::string()> nonEscapedValue = !qi::lit("$$") >> *(qi::char_ - '=' - '&');

// sep and query are declared as rules rather than `auto`: a Spirit expression
// stored in an `auto` variable keeps references to temporaries that are
// destroyed at the end of the statement.
qi::rule<It> sep = qi::lit("&amp;") | '&';
qi::rule<It, KeyValue()> keyValue =
        key >> -('=' >> nonEscapedValue) >> (sep | qi::eoi);
qi::rule<It, KeyValue()> escapedKeyValue =
        key >> '=' >> escapedValue >> -(sep);
// Collects the pairs in input order
qi::rule<It, std::vector<KeyValue>()> query = *(qi::hold[keyValue] | escapedKeyValue);
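
The logic of these rules may be easier to follow as plain code, which could also help anyone who, like the asker, would rather not use Spirit. Below is a hypothetical hand-written equivalent in standard C++. parseQuotedQuery and separatorLength are illustrative names. There are a few simplifications: missing values are stored as empty strings instead of an empty boost::optional, and the pairs are kept in input order. As in the grammar, a plain value must be followed by a separator or by the end of input, while a $$-quoted value may be followed directly by the next key.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Length of a separator at `pos` ("&amp;" tried before "&"), or 0 if none.
std::size_t separatorLength(const std::string& s, std::size_t pos)
{
    if (pos >= s.size()) return 0;
    if (s.compare(pos, 5, "&amp;") == 0) return 5;
    return s[pos] == '&' ? 1 : 0;
}

// Hypothetical hand-written mirror of the keyValue / escapedKeyValue rules.
std::vector<std::pair<std::string, std::string>> parseQuotedQuery(const std::string& s)
{
    std::vector<std::pair<std::string, std::string>> result;
    std::size_t pos = 0;
    while (pos < s.size()) {
        // key = +(char_ - '=' - '&')
        std::size_t keyEnd = s.find_first_of("=&", pos);
        if (keyEnd == std::string::npos) keyEnd = s.size();
        if (keyEnd == pos) break;  // empty key: +() fails, parsing stops
        std::string key = s.substr(pos, keyEnd - pos);
        std::size_t p = keyEnd;

        if (s.compare(p, 3, "=$$") == 0) {
            // escapedKeyValue: key '=' "$$" *(char_ - "$$") "$$" -(sep)
            std::size_t close = s.find("$$", p + 3);
            if (close == std::string::npos) break;  // unterminated quote
            result.emplace_back(key, s.substr(p + 3, close - (p + 3)));
            p = close + 2;
            p += separatorLength(s, p);  // separator is optional here
        } else {
            // keyValue: key -('=' nonEscapedValue) (sep | eoi)
            std::string value;
            if (p < s.size() && s[p] == '=') {
                std::size_t valEnd = s.find_first_of("=&", p + 1);
                if (valEnd == std::string::npos) valEnd = s.size();
                value = s.substr(p + 1, valEnd - (p + 1));
                p = valEnd;
            }
            std::size_t sepLen = separatorLength(s, p);
            if (sepLen == 0 && p != s.size()) break;  // neither separator nor end
            // Only a complete match is stored, roughly the job hold[] does above.
            result.emplace_back(key, value);
            p += sepLen;
        }
        pos = p;
    }
    return result;
}
```

For "a=b&d=e&f=$$g=h&i=j$$x=y&z=def" it yields the five pairs (a,b), (d,e), (f,g=h&i=j), (x,y), (z,def). The same five pairs come out when "&amp;" is used as the separator.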

Parsing a string with optional separators using boost spirit
