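I am trying to use Spirit to parse expressions of the form `Module1.Module2.value` (any number of dot-separated capitalized identifiers, then a dot, then a lowercase OCaml-style identifier). My current parser definition is as follows: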
using namespace boost::spirit::qi;

template <typename Iter=std::string::iterator>
struct value_path : grammar<Iter, boost::tuple<std::vector<std::string>, std::string>()> {
    value_path() :
        value_path::base_type(start)
    {
        start = -(module_path<Iter>() >> '.') >> value_name<Iter>();
    }
    rule<Iter, boost::tuple<std::vector<std::string>, std::string>()> start;
};
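where `module_path` and `value_name` are similar template structs deriving from `qi::grammar`, each with a single `start` field that is assigned some Spirit rule in the constructor, possibly using other custom grammars (for example, `value_name` depends on `lowercase_ident` and `operator_name`, which are defined analogously).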
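When I try to `parse_phrase()` with this grammar, the program segfaults somewhere inside Spirit (according to gdb). An equivalent definition, in which the constructor of `value_path` is as follows (I basically unrolled all the custom grammars it depends on, leaving only built-in Spirit parsers, and tried to keep it readable, which in hindsight was a fool's errand):
start =
    -((raw[upper >> *(alnum | char_('_') | char_('\''))] % '.') >> '.')
    >> lexeme[((lower | char_('_')) >> *(alnum | char_('_') | char_('\'')))
        | char_('(') >>
            ( ( (char_('!') >> *char_("-+!$%&*./:<=>?@^|~")
                | (char_("~?") >> +char_("-+!$%&*./:<=>?@^|~"))
                | ( (char_("-+=<>@^|&*/$%") >> *char_("-+!$%&*./:<=>?@^|~"))
                    | string("mod")
                    | string("lor")
                    | string("lsl")
                    | string("lsr")
                    | string("asr")
                    | string("or")
                    | string("-.")
                    | string("!=")
                    | string("||")
                    | string("&&")
                    | string(":=")
                    | char_("*+=<>&-")
                  )
                ) >> char_(')')
              )
            )
      ];
does not segfault and seems to work properly, but I would rather avoid something this verbose and unreadable in my code. It is not extensible at all.
So far I have tried various combinations of `.alias()`, as well as saving `module_path<Iter>()`, `value_name<Iter>()`, and all the intermediate grammars in the dependency chain into their own fields. None of these worked. How can I keep the high level of abstraction of the first example? Is there a standard way of composing grammars in Spirit that does not run into these problems?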
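Answer 0 (score: 3)
You are running into trouble because expression templates keep internal references to temporaries.
Simply aggregate the sub-parser instances as members: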
template <typename Iter=std::string::iterator>
struct value_path : grammar<Iter, boost::tuple<std::vector<std::string>, std::string>()> {
    value_path() : value_path::base_type(start)
    {
        start = -(module_path_ >> '.') >> value_name_;
    }
  private:
    rule<Iter, boost::tuple<std::vector<std::string>, std::string>()> start;
    module_path<Iter> module_path_;
    value_name<Iter> value_name_;
};
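To make the lifetime issue concrete, here is a minimal sketch in plain C++ without Spirit. The names `ident_parser`, `dotted_pair`, `value_path_sketch`, and `run_examples` are hypothetical and invented for illustration; `dotted_pair` only imitates the way an expression-template node refers to its operands and is not how Spirit is implemented. What it shows is the ownership pattern from the fix above: the operands live as members of the enclosing object, so the references held by the composed parser stay valid for as long as that object exists.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a sub-grammar such as module_path or value_name:
// matches one identifier whose first character is upper- or lowercase.
struct ident_parser {
    bool want_upper;

    // Returns the position just past the identifier, or `pos` on failure.
    std::size_t parse(const std::string& in, std::size_t pos) const {
        if (pos >= in.size()) return pos;
        const unsigned char first = static_cast<unsigned char>(in[pos]);
        const bool ok = want_upper ? std::isupper(first) != 0
                                   : std::islower(first) != 0;
        if (!ok) return pos;
        std::size_t end = pos + 1;
        while (end < in.size() &&
               (std::isalnum(static_cast<unsigned char>(in[end])) || in[end] == '_'))
            ++end;
        return end;
    }
};

// Hypothetical stand-in for an expression-template node: it stores its
// operands BY REFERENCE, so the operands must outlive it.
struct dotted_pair {
    const ident_parser& head;
    const ident_parser& tail;

    // Accepts exactly "<head>.<tail>" with nothing left over.
    bool parse(const std::string& in) const {
        const std::size_t p = head.parse(in, 0);
        if (p == 0 || p >= in.size() || in[p] != '.') return false;
        const std::size_t q = tail.parse(in, p + 1);
        return q != p + 1 && q == in.size();
    }
};

// The owning object aggregates the sub-parsers as members. Members are
// initialized in declaration order, so `start` refers to live objects.
struct value_path_sketch {
    ident_parser module_{true};
    ident_parser value_{false};
    dotted_pair  start{module_, value_};

    value_path_sketch() = default;
    // A copy would still refer to the original's members, so forbid copying.
    value_path_sketch(const value_path_sketch&) = delete;
    value_path_sketch& operator=(const value_path_sketch&) = delete;

    bool parse(const std::string& in) const { return start.parse(in); }
};

// Small self-check of the sketch.
inline void run_examples() {
    value_path_sketch g;
    assert(g.parse("Module.value"));
    assert(!g.parse("Module.Value"));
}
```

In the question's version, the sub-grammars were temporaries created inside the constructor expression, so nothing owned them once that statement finished; the member declarations are what change that.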
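Note: I feel that using separate sub-grammars for such small items may be a design smell. Although decomposing a grammar is frequently a good idea, because it keeps build times manageable and code size somewhat lower, it seems from the description here that you may be overdoing it.
"Plastering" parser expressions behind a `qi::rule` (effectively type erasure) comes with a possibly significant runtime overhead. If you then instantiate those rules for more than one iterator type, you may compound this with unnecessary growth of the binary.
Update: regarding the idiomatic way to compose grammars in Spirit, here is my take: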
using namespace ascii;
using qi::raw;

lowercase_ident  = raw[ (lower | '_') >> *(alnum | '_' | '\'') ];
module_path_item = raw[ upper >> *(alnum | '_' | '\'') ];
module_path_     = module_path_item % '.';

auto special_char = boost::proto::deep_copy(char_("-+!$%&*./:<=>?@^|~"));

operator_name = qi::raw [
        ('!' >> *special_char)                          /* branch 1 */
      | (char_("~?") >> +special_char)                  /* branch 2 */
      | (!char_(".:") >> special_char >> *special_char) /* branch 3 */
      | "mod"                                           /* branch 4 */
      | "lor" | "lsl" | "lsr" | "asr" | "or"            /* branch 5-9 */
      | "-."                                            /* branch 10 doesn't match because of branch 3 */
      | "!=" | "||" | "&&" | ":="                       /* branch 11-13 don't match because of branch 1,3; ":=" is rejected by branch 3, so branch 14 can still match */
   // | (special_char - char_("!$%./:?@^|~"))           /* "*+=<>&-" cannot match because of branch 3 */
    ]
    ;

value_name_ =
        lowercase_ident
      | '(' >> operator_name >> ')'
      ;

start = -(module_path_ >> '.') >> value_name_;
where the rules are fields declared as:
qi::rule<Iter, ast::value_path(), Skipper> start;
qi::rule<Iter, ast::module_path(), Skipper> module_path_;
// lexeme: (no skipper)
qi::rule<Iter, std::string()> value_name_, module_path_item, lowercase_ident, operator_name;
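Notes:
- I added a skipper: your `value_path` grammar did not use one, so any skipper you passed to `qi::phrase_parse` was being ignored.
- The lexeme rules simply omit the skipper from the rule declaration, so there is no need to write `qi::lexeme[]`.
- In the lexemes, `qi::raw` copies the parsed text verbatim. This lets us write the grammar more concisely (`'!'` instead of `char_('!')`, `"mod"` instead of `qi::string("mod")`). Note that bare literals are implicitly converted to "non-capturing" `qi::lit(...)` nodes in the context of a Qi parser expression, but since we use `raw[]` anyway, the fact that `lit` exposes no attribute is not a problem.
I think this yields a perfectly cromulent grammar definition that should meet your criteria for "high-level". The grammar itself has some odd spots (whatever parser generator language it is expressed in):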
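- I simplified the `operator_name` rule by flattening the nested alternatives; the flat alternative list has the same effect.
- The repeated lists of special characters are factored out into `special_char`.
- In alternative branch 3, for example, I have spelled out the exceptions with a negative assertion:
(!char_(".:") >> special_char >> *special_char) /* branch 3 */
The `!char_(".:")` assertion says: if the input does not match `'.'` or `':'` at this point, go on to match (any sequence of special characters). In fact, you could equivalently write this as:
((special_char - '.' - ':') >> *special_char) /* branch 3 */
or even, as I ended up writing it:
(!char_(".:") >> +special_char) /* branch 3 */
Simplifying the branches actually raises the level of abstraction! It now becomes clear that some branches can never match, because earlier branches already match the same input by definition:
| "-." /* branch 10 doesn't match because of branch 3 */
| "!=" | "||" | "&&" | ":=" /* branch 11-13 don't match because of branch 1,3; ":=" is rejected by branch 3, so branch 14 can still match */
// | (special_char - char_("!$%./:?@^|~")) /* "*+=<>&-" cannot match because of branch 3 */
I hope you can see why I call this part of the grammar "a little odd" :) I will assume for now that you got confused, or that something went wrong, when you reduced it to a single rule (your "fool's errand").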
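The shadowing described above can be checked without Spirit. The following is a minimal sketch, assuming the alternatives of `operator_name` behave as an ordered choice in which the first alternative that succeeds wins. The helper names (`is_special`, `special_run`, `first_matching_branch`) are hypothetical and exist only for this illustration. Under these assumptions, `-.` is taken by branch 3 and `!=` by branch 1, while `:=` still reaches branch 14 because branch 3 refuses a leading `:`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The special-character set used by the grammar above.
const std::string special = "-+!$%&*./:<=>?@^|~";

inline bool is_special(char c) { return special.find(c) != std::string::npos; }

// Length of the run of special characters starting at `pos`.
inline std::size_t special_run(const std::string& in, std::size_t pos) {
    std::size_t n = 0;
    while (pos + n < in.size() && is_special(in[pos + n])) ++n;
    return n;
}

// Hypothetical helper: returns the number (1-14) of the first alternative
// that matches a prefix of `in`, or 0 if none does. Once an alternative
// succeeds, later ones are never tried.
inline int first_matching_branch(const std::string& in) {
    if (in.empty()) return 0;
    const char c = in[0];
    // branch 1: '!' >> *special_char
    if (c == '!') return 1;
    // branch 2: char_("~?") >> +special_char
    if ((c == '~' || c == '?') && special_run(in, 1) >= 1) return 2;
    // branch 3: !char_(".:") >> +special_char
    if (c != '.' && c != ':' && special_run(in, 0) >= 1) return 3;

    static const char* const literals[] = {
        "mod", "lor", "lsl", "lsr", "asr", "or",  // branches 4-9
        "-.",                                     // branch 10
        "!=", "||", "&&", ":="                    // branches 11-14
    };
    int branch = 4;
    for (const char* lit : literals) {
        // Compare the prefix of `in` with the literal.
        if (in.compare(0, std::string(lit).size(), lit) == 0) return branch;
        ++branch;
    }
    return 0;
}
```

Calling `first_matching_branch` on each literal from branches 10 to 14 is a quick way to see which of them are reachable before deciding whether to delete them from the grammar.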
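Some further improvements worth noting:
- A proper AST struct replaces `boost::tuple<>`, which makes the code more legible.
- `BOOST_SPIRIT_DEBUG` macros are added so that the grammar can be debugged at the rule level.
- The blanket `using namespace` is gone. It is generally a bad idea, and with Spirit it is frequently a very bad idea (it can lead to unresolvable ambiguities, or to errors that are very hard to spot). As you can see, dropping it does not necessarily lead to very verbose code.
Full listing:
#define BOOST_SPIRIT_DEBUG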
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

namespace ast {
    using module_path = std::vector<std::string>;
    struct value_path {
        module_path module;
        std::string value_expr;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::value_path, (ast::module_path, module)(std::string,value_expr))

template <typename Iter, typename Skipper = ascii::space_type>
struct value_path : qi::grammar<Iter, ast::value_path(), Skipper> {
    value_path() : value_path::base_type(start)
    {
        using namespace ascii;
        using qi::raw;

        lowercase_ident  = raw[ (lower | '_') >> *(alnum | '_' | '\'') ];
        module_path_item = raw[ upper >> *(alnum | '_' | '\'') ];
        module_path_     = module_path_item % '.';

        auto special_char = boost::proto::deep_copy(char_("-+!$%&*./:<=>?@^|~"));

        operator_name = qi::raw [
                ('!' >> *special_char)                /* branch 1 */
              | (char_("~?") >> +special_char)        /* branch 2 */
              | (!char_(".:") >> +special_char)       /* branch 3 */
              | "mod"                                 /* branch 4 */
              | "lor" | "lsl" | "lsr" | "asr" | "or"  /* branch 5-9 */
              | "-."                                  /* branch 10 doesn't match because of branch 3 */
              | "!=" | "||" | "&&" | ":="             /* branch 11-13 don't match because of branch 1,3; ":=" is rejected by branch 3, so branch 14 can still match */
           // | (special_char - char_("!$%./:?@^|~")) /* "*+=<>&-" cannot match because of branch 3 */
            ]
            ;

        value_name_ =
                lowercase_ident
              | '(' >> operator_name >> ')'
              ;

        start = -(module_path_ >> '.') >> value_name_;

        BOOST_SPIRIT_DEBUG_NODES((start)(module_path_)(value_name_)(module_path_item)(lowercase_ident)(operator_name))
    }
  private:
    qi::rule<Iter, ast::value_path(), Skipper> start;
    qi::rule<Iter, ast::module_path(), Skipper> module_path_;
    // lexeme: (no skipper)
    qi::rule<Iter, std::string()> value_name_, module_path_item, lowercase_ident, operator_name;
};

int main()
{
    for (std::string const input : {
            "Some.Module.Package.ident",
            "ident",
            "A.B.C_.mod",    // as lowercase_ident
            "A.B.C_.(mod)",  // as operator_name (branch 4)
            "A.B.C_.(!=)",   // as operator_name (branch 1)
            "(!)"            // as operator_name (branch 1)
        })
    {
        std::cout << "--------------------------------------------------------------\n";
        std::cout << "Parsing '" << input << "'\n";

        using It = std::string::const_iterator;
        It f(input.begin()), l(input.end());

        value_path<It> g;
        ast::value_path data;
        bool ok = qi::phrase_parse(f, l, g, ascii::space, data);

        if (ok) {
            std::cout << "Parse succeeded\n";
        } else {
            std::cout << "Parse failed\n";
        }

        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}
Output:
--------------------------------------------------------------
Parsing 'Some.Module.Package.ident'
<start>
<try>Some.Module.Package.</try>
<module_path_>
<try>Some.Module.Package.</try>
<module_path_item>
<try>Some.Module.Package.</try>
<success>.Module.Package.iden</success>
<attributes>[[S, o, m, e]]</attributes>
</module_path_item>
<module_path_item>
<try>Module.Package.ident</try>
<success>.Package.ident</success>
<attributes>[[M, o, d, u, l, e]]</attributes>
</module_path_item>
<module_path_item>
<try>Package.ident</try>
<success>.ident</success>
<attributes>[[P, a, c, k, a, g, e]]</attributes>
</module_path_item>
<module_path_item>
<try>ident</try>
<fail/>
</module_path_item>
<success>.ident</success>
<attributes>[[[S, o, m, e], [M, o, d, u, l, e], [P, a, c, k, a, g, e]]]</attributes>
</module_path_>
<value_name_>
<try>ident</try>
<lowercase_ident>
<try>ident</try>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</lowercase_ident>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[S, o, m, e], [M, o, d, u, l, e], [P, a, c, k, a, g, e]], [i, d, e, n, t]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'ident'
<start>
<try>ident</try>
<module_path_>
<try>ident</try>
<module_path_item>
<try>ident</try>
<fail/>
</module_path_item>
<fail/>
</module_path_>
<value_name_>
<try>ident</try>
<lowercase_ident>
<try>ident</try>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</lowercase_ident>
<success></success>
<attributes>[[i, d, e, n, t]]</attributes>
</value_name_>
<success></success>
<attributes>[[[], [i, d, e, n, t]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.mod'
<start>
<try>A.B.C_.mod</try>
<module_path_>
<try>A.B.C_.mod</try>
<module_path_item>
<try>A.B.C_.mod</try>
<success>.B.C_.mod</success>
<attributes>[[A]]</attributes>
</module_path_item>
<module_path_item>
<try>B.C_.mod</try>
<success>.C_.mod</success>
<attributes>[[B]]</attributes>
</module_path_item>
<module_path_item>
<try>C_.mod</try>
<success>.mod</success>
<attributes>[[C, _]]</attributes>
</module_path_item>
<module_path_item>
<try>mod</try>
<fail/>
</module_path_item>
<success>.mod</success>
<attributes>[[[A], [B], [C, _]]]</attributes>
</module_path_>
<value_name_>
<try>mod</try>
<lowercase_ident>
<try>mod</try>
<success></success>
<attributes>[[m, o, d]]</attributes>
</lowercase_ident>
<success></success>
<attributes>[[m, o, d]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[A], [B], [C, _]], [m, o, d]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.(mod)'
<start>
<try>A.B.C_.(mod)</try>
<module_path_>
<try>A.B.C_.(mod)</try>
<module_path_item>
<try>A.B.C_.(mod)</try>
<success>.B.C_.(mod)</success>
<attributes>[[A]]</attributes>
</module_path_item>
<module_path_item>
<try>B.C_.(mod)</try>
<success>.C_.(mod)</success>
<attributes>[[B]]</attributes>
</module_path_item>
<module_path_item>
<try>C_.(mod)</try>
<success>.(mod)</success>
<attributes>[[C, _]]</attributes>
</module_path_item>
<module_path_item>
<try>(mod)</try>
<fail/>
</module_path_item>
<success>.(mod)</success>
<attributes>[[[A], [B], [C, _]]]</attributes>
</module_path_>
<value_name_>
<try>(mod)</try>
<lowercase_ident>
<try>(mod)</try>
<fail/>
</lowercase_ident>
<operator_name>
<try>mod)</try>
<success>)</success>
<attributes>[[m, o, d]]</attributes>
</operator_name>
<success></success>
<attributes>[[m, o, d]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[A], [B], [C, _]], [m, o, d]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing 'A.B.C_.(!=)'
<start>
<try>A.B.C_.(!=)</try>
<module_path_>
<try>A.B.C_.(!=)</try>
<module_path_item>
<try>A.B.C_.(!=)</try>
<success>.B.C_.(!=)</success>
<attributes>[[A]]</attributes>
</module_path_item>
<module_path_item>
<try>B.C_.(!=)</try>
<success>.C_.(!=)</success>
<attributes>[[B]]</attributes>
</module_path_item>
<module_path_item>
<try>C_.(!=)</try>
<success>.(!=)</success>
<attributes>[[C, _]]</attributes>
</module_path_item>
<module_path_item>
<try>(!=)</try>
<fail/>
</module_path_item>
<success>.(!=)</success>
<attributes>[[[A], [B], [C, _]]]</attributes>
</module_path_>
<value_name_>
<try>(!=)</try>
<lowercase_ident>
<try>(!=)</try>
<fail/>
</lowercase_ident>
<operator_name>
<try>!=)</try>
<success>)</success>
<attributes>[[!, =]]</attributes>
</operator_name>
<success></success>
<attributes>[[!, =]]</attributes>
</value_name_>
<success></success>
<attributes>[[[[A], [B], [C, _]], [!, =]]]</attributes>
</start>
Parse succeeded
--------------------------------------------------------------
Parsing '(!)'
<start>
<try>(!)</try>
<module_path_>
<try>(!)</try>
<module_path_item>
<try>(!)</try>
<fail/>
</module_path_item>
<fail/>
</module_path_>
<value_name_>
<try>(!)</try>
<lowercase_ident>
<try>(!)</try>
<fail/>
</lowercase_ident>
<operator_name>
<try>!)</try>
<success>)</success>
<attributes>[[!]]</attributes>
</operator_name>
<success></success>
<attributes>[[!]]</attributes>
</value_name_>
<success></success>
<attributes>[[[], [!]]]</attributes>
</start>
Parse succeeded