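Spirit parser segfault

Date: 2015-09-26 19:28:42

Tags: c++ parsing segmentation-fault grammar boost-spirit

I'm getting a segfault when I run this. It looks like it happens in the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. If possible, I'd also appreciate any tips/tricks for cleaning up this grammar.

Thanks!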

//code here:
/***
*I-EBNF parser
*
*This defines a grammar for BNF.
*/

//Speeds up compilation times.
//This is a relatively small grammar, this is useful.
#define BOOST_SPIRIT_NO_PREDEFINED_TERMINALS
#define BOOST_SPIRIT_QI_DEBUG

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/fusion/adapted.hpp>
#include <boost/fusion/support.hpp>
#include <vector>
#include <string>
#include <iostream>

namespace Parser
{

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

enum class RHSType
{
    Terminal, Identifier
};
struct RHS
{
    RHSType type;
    std::string value;
};
struct Rule
{
    std::string identifier; //lhs
    std::vector<RHS> rhs;
};
}

//expose our structs to fusion:
BOOST_FUSION_ADAPT_STRUCT(
    Parser::RHS,
    (Parser::RHSType, type)
    (std::string, value)
)
BOOST_FUSION_ADAPT_STRUCT(
    Parser::Rule,
    (std::string, identifier)
    (std::vector<Parser::RHS>, rhs)
)

namespace Parser
{
typedef std::vector<Rule> RuleList;

//our grammar definition
template <typename Iterator>
struct Grammar: qi::grammar<Iterator, std::list<Rule>, ascii::space_type>
{
    Grammar(): Grammar::base_type(rules)
    {
        qi::char_type char_;

        letter = char_("a-zA-Z");
        digit = char_('0', '9');
        symbol = char_('[') | ']' | '[' | ']' | '(' | ')' | '<' | '>'
| '\'' | '\"' | '=' | '|' | '.' | ',' | ';';
        character = letter | digit | symbol | '_';
        identifier = letter >> *(letter | digit | '_');
        terminal = (char_('\'') >> character >> *character >>
char_('\'')) | (char_('\"') >> character >> *character >> char_('\"'));
        lhs = identifier;
        rhs = terminal | identifier | char_('[') >> rhs >> char_(']')
| char_('{') >> rhs >> char_('}') | char_('(') >> rhs >> char_(')') |
rhs >> char_('|') >> rhs | rhs >> char_(',') >> rhs;
        rule = identifier >> char_('=') >> rhs;
        rules = rule >> *rule;
    }

private:
    qi::rule<Iterator, char(), ascii::space_type> letter, digit,
symbol, character;
    qi::rule<Iterator, std::string(), ascii::space_type> identifier,
lhs, terminal;
    qi::rule<Iterator, RHS, ascii::space_type> rhs;
    qi::rule<Iterator, Rule, ascii::space_type> rule;
    qi::rule<Iterator, std::list<Rule>, ascii::space_type> rules;
};

}

int main()
{
    Parser::Grammar<std::string::const_iterator> parser;
    boost::spirit::ascii::space_type space;
    std::string input;
    std::vector<std::string> output;
    bool result;

    while (std::getline(std::cin, input))
        {
            if (input.empty())
                {
                    break;
                }
            std::string::const_iterator it, itEnd;
            it = input.begin();
            itEnd = input.end();
            result = phrase_parse(it, itEnd, parser, space, output);
            if (result && it == itEnd)
                {
                    std::cout << "success" << std::endl;
                }
        }

    return 0;
}

¹ Cross-posted from the [spirit-general] mailing list: http://boost.2283326.n4.nabble.com/parser-segfault-tips-tricks-td4680336.html

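1 answer:

Answer 0 (score: 2)

On September 26, 2015 at 01:45 AM, Littlefield, Tyler wrote:

> Hello all: I'm getting a segfault when I run this. It looks like it happens in the debug prints, but when I debug it I just get an endless loop of backtrace. If anyone can help point me in the right direction, I'd appreciate it. If possible, I'd also appreciate any tips/tricks for cleaning up this grammar.

First of all, it doesn't compile.

It shouldn't compile, because the grammar doesn't expose an attribute (did you mean list<Rule>() instead of list<Rule>?).

But you could never assign that to the output variable anyway (which is a std::vector<std::string>?!)

Similarly, you forgot the parentheses here: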

qi::rule<Iterator, RHS(), ascii::space_type> rhs;
qi::rule<Iterator, Rule(), ascii::space_type> rule;
qi::rule<Iterator, std::list<Rule>(), ascii::space_type> rules;

The rhs rule has infinite left recursion:

    rhs               = terminal
                      | identifier
                      | ('[' >> rhs >> ']')
                      | ('{' >> rhs >> '}')
                      | ('(' >> rhs >> ')')
                      | (rhs >> '|' >> rhs)  // OOPS
                      | (rhs >> ',' >> rhs)  // OOPS
                      ;

This likely explains the crashes, because it leads to a stack overflow.
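To make the failure mode concrete: Qi rules behave like recursive-descent parsers, so a rule that invokes itself before consuming any input never terminates. The following is a hand-written sketch in plain C++ (not Spirit code; `skip_blanks`, `parse_identifier` and `parse_list` are made-up names, and only identifiers are handled) of the usual remedy: parse one element first, then loop over "separator, element" pairs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Skips spaces and tabs only (a stand-in for a skipper).
void skip_blanks(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) ++pos;
}

// identifier = letter >> *(letter | digit | '_')
bool parse_identifier(const std::string& s, std::size_t& pos, std::string& out) {
    std::size_t p = pos;
    if (p >= s.size() || !std::isalpha(static_cast<unsigned char>(s[p]))) return false;
    while (p < s.size() &&
           (std::isalnum(static_cast<unsigned char>(s[p])) || s[p] == '_')) ++p;
    out = s.substr(pos, p - pos);
    pos = p;
    return true;
}

// A left-recursive version ("list = list >> '|' >> list") would have to call
// parse_list() as its very first action, without consuming input, and would
// recurse until the stack overflows.
//
// Iterative version: list = identifier >> *( ('|' | ',') >> identifier )
bool parse_list(const std::string& s, std::size_t& pos, std::vector<std::string>& out) {
    std::string item;
    skip_blanks(s, pos);
    if (!parse_identifier(s, pos, item)) return false;
    out.push_back(item);
    for (;;) {
        std::size_t save = pos;  // backtrack point if no further element follows
        skip_blanks(s, pos);
        if (pos >= s.size() || (s[pos] != '|' && s[pos] != ',')) { pos = save; return true; }
        ++pos;
        skip_blanks(s, pos);
        if (!parse_identifier(s, pos, item)) { pos = save; return true; }
        out.push_back(item);
    }
}
```

Every iteration of the loop consumes at least one separator, so the function always terminates. This is the same shape that a list parser such as `a % b` expresses declaratively.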

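Notes

The live-stream recordings (part #1, part #2) show exactly the steps I took to first clean up the grammar and then make things actually compile-worthy.

There was quite a bit of work there: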

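  • Cleanup: use implicit qi::lit for the interpunction ([], {}, () and =, |, ,)
  • Use kleene + instead of a >> *a (in more than one place)
  • Prefer the % parser to parse... lists
  • I had to "wiggle" the rules around "RHS" a bit; there was infinite recursion in the last two branches (see // OOPS). I fixed it by introducing a "pure" expression rule (which parses just a single RHS structure). I've renamed this type to Expression.

    The "list" parsing (which recognizes lists of expressions separated by , or |) is moved into the original rhs rule, which I renamed to expr_list to be more descriptive:

        expression = qi::attr(Parser::ExprType::Terminal)   >> terminal
                   | qi::attr(Parser::ExprType::Identifier) >> identifier
                   | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                   | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ]
                   | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ]
                   ;

        expr_list  = expression % (char_("|,")) // TODO FIXME?
                   ;

  • In order for the synthesized attributes to actually transform into the RHS (now: Expression) type, we need to actually expose a RHSType (now: ExprType) value for the first adapted member. You can see in the lines above that we used qi::attr() for this purpose.

  • Modern compilers and boost versions can simplify the BOOST_FUSION_ADAPT_STRUCT invocations considerably:

        BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
        BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)

  • I "upgraded" some rules to be lexemes, meaning they don't obey a skipper.

    I guessed that I should probably add the space character to the acceptable character set inside a terminal (string literal). If that's not what you want, just remove the last character from char_("[][]()<>\'\"=|.,;_ ");.

  • I also changed the skipper to blank_type, as that will not skip newline characters. You could use this to easily parse multi-line input directly with the same grammar. Do, e.g.:

        rules = rule % qi::eol;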
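The point of the last bullet can be illustrated without Spirit. The sketch below is a hand-rolled approximation, not library code: `blank_like` and `space_like` are made-up stand-ins for what a blank skipper and a space skipper accept, and `sees_eol_after_skipping` is a hypothetical helper. It shows why a newline can only act as a rule separator when the skipper leaves it alone.

```cpp
#include <cstddef>
#include <string>

// Made-up stand-ins: a "blank" skipper eats spaces and tabs,
// a "space" skipper additionally eats line breaks.
inline bool blank_like(char c) { return c == ' ' || c == '\t'; }
inline bool space_like(char c) { return blank_like(c) || c == '\n' || c == '\r'; }

// Skips everything the skipper accepts, starting at pos, then reports
// whether the next visible character is a newline.
template <typename Pred>
bool sees_eol_after_skipping(const std::string& s, std::size_t pos, Pred skip) {
    while (pos < s.size() && skip(s[pos])) ++pos;
    return pos < s.size() && s[pos] == '\n';
}
```

For the input "a = b  \nc = d" and a position just after the b, the blank-like skipper stops at the newline, so an end-of-line separator can still match there. The space-like skipper consumes the newline together with the other whitespace, and the separator never gets to see it.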
  

See also: Boost spirit skipper issues, for additional information regarding skippers, lexemes and their interaction(s).

Without further ado, here is a working example:

Live On Coliru

    #define BOOST_SPIRIT_DEBUG
    #include <boost/fusion/adapted.hpp>
    #include <boost/spirit/include/phoenix.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <iostream>
    #include <list>
    #include <string>
    #include <vector>

    namespace Parser
    {
        namespace qi = boost::spirit::qi;
        namespace ascii = boost::spirit::ascii;

        enum class ExprType { Terminal, Identifier, Compound };

        static inline std::ostream& operator<<(std::ostream& os, ExprType type) {
            switch (type) {
                case ExprType::Terminal:   return os << "Terminal";
                case ExprType::Identifier: return os << "Identifier";
                case ExprType::Compound:   return os << "Compound";
            }
            return os << "(unknown)";
        }

        struct Expression { // TODO make recursive (see `boost::make_recursive_variant`)
            ExprType    type;
            std::string value;
        };

        using ExprList = std::vector<Expression>;

        struct Rule {
            std::string identifier; // lhs
            ExprList    expr_list;
        };
    }

    //expose our structs to fusion:
    BOOST_FUSION_ADAPT_STRUCT(Parser::Expression, type, value)
    BOOST_FUSION_ADAPT_STRUCT(Parser::Rule, identifier, expr_list)

    namespace Parser
    {
        typedef std::list<Rule> RuleList;

        //our grammar definition
        template <typename Iterator>
        struct Grammar: qi::grammar<Iterator, RuleList(), ascii::blank_type>
        {
            Grammar(): Grammar::base_type(rules)
            {
                qi::char_type char_;

                symbol     = char_("[][]()<>\'\"=|.,;_ ");
                character  = qi::alpha | qi::digit | symbol;
                identifier = qi::alpha >> *(qi::alnum | char_('_'));

                // TODO capture strings including interpunction(?)
                terminal   = ('\'' >> +(character - '\'') >> '\'')
                           | ('\"' >> +(character - '\"') >> '\"');

                expression = qi::attr(Parser::ExprType::Terminal)   >> terminal
                           | qi::attr(Parser::ExprType::Identifier) >> identifier
                           | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '[' >> expr_list >> ']' ]
                           | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '{' >> expr_list >> '}' ]
                           | qi::attr(Parser::ExprType::Compound)   >> qi::raw [ '(' >> expr_list >> ')' ]
                           ;

                expr_list  = expression % (char_("|,")) // TODO FIXME?
                           ;
                // above accepts mixed separators:
                //     a, b, c | d, e
                //
                // original accepted:
                //
                //     a, b, [ c | d ], e
                //     a| b| [ c , d ]| e
                //     a| b| [ c | d ]| e
                //     a, b, [ c , d ], e

                rule = identifier >> '=' >> expr_list;
                //rules = rule % qi::eol; // alternatively, parse multi-line input in one go
                rules = +rule;

                BOOST_SPIRIT_DEBUG_NODES((rules)(rule)(expr_list)(expression)(identifier)(terminal))
            }

        private:
            qi::rule<Iterator, Expression(), ascii::blank_type> expression;
            qi::rule<Iterator, ExprList(),   ascii::blank_type> expr_list;
            qi::rule<Iterator, Rule(),       ascii::blank_type> rule;
            qi::rule<Iterator, RuleList(),   ascii::blank_type> rules;
            // lexemes:
            qi::rule<Iterator, std::string()> terminal, identifier;
            qi::rule<Iterator, char()>        symbol, character;
        };
    }

    int main()
    {
        using It = std::string::const_iterator;

        Parser::Grammar<It> parser;
        boost::spirit::ascii::blank_type blank;

        std::string input;

        while (std::getline(std::cin, input))
        {
            if (input.empty()) {
                break;
            }

            It it = input.begin(), itEnd = input.end();

            Parser::RuleList output;
            bool result = phrase_parse(it, itEnd, parser, blank, output);
            if (result) {
                std::cout << "success\n";
                for (auto& rule : output) {
                    std::cout << "\ntarget: " << rule.identifier << "\n";
                    for (auto& rhs : rule.expr_list) {
                        std::cout << "rhs:    " << boost::fusion::as_vector(rhs) << "\n";
                    }
                }
            } else {
                std::cout << "parse failed\n";
            }

            if (it != itEnd)
                std::cout << "remaining unparsed: '" << std::string(it, itEnd) << "'\n";
        }
    }

Prints this output:

    success

    target: assigned1
    rhs:    (Identifier some_var)
    rhs:    (Terminal a 'string' value)
    rhs:    (Compound [ 'okay', "this", is_another, identifier, { ( "nested" ) } ])
    rhs:    (Terminal done)
    success

    target: assigned2
    rhs:    (Compound { a })

With debug enabled (#define BOOST_SPIRIT_DEBUG), the rules listed in BOOST_SPIRIT_DEBUG_NODES also emit a trace of their parse attempts.

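Loose Ends

There are some TODOs left in the code. Implementing them would require some more effort, but I'm not sure it's actually the right direction to go in, so I'll await feedback :)

One of the elementary TODOs left would be to represent the recursive nature of expressions in the AST proper. Right now, I "copped out" by just inserting the source string for a nested Compound expression.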
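As a rough idea of what a recursive AST could look like, here is a minimal sketch in plain C++. It is an assumption about one possible shape, not the approach hinted at in the code comment (which mentions `boost::make_recursive_variant`): the `children` member and the `render` helper are hypothetical, and wiring this type into the Qi grammar is deliberately left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class ExprType { Terminal, Identifier, Compound };

// Recursive node: a Compound owns its nested expressions instead of
// storing their raw source text.
struct Expression {
    ExprType type;
    std::string value;                 // text of a Terminal or Identifier
    std::vector<Expression> children;  // sub-expressions, used by Compound only
    // (std::vector of an incomplete element type is permitted since C++17)
};

// Renders the tree back to text; every Compound is printed with square
// brackets here, regardless of the bracket kind it was parsed from.
std::string render(const Expression& e) {
    switch (e.type) {
        case ExprType::Terminal:   return "'" + e.value + "'";
        case ExprType::Identifier: return e.value;
        case ExprType::Compound:   break;
    }
    std::string out = "[";
    for (std::size_t i = 0; i < e.children.size(); ++i) {
        if (i) out += ", ";
        out += render(e.children[i]);
    }
    return out + "]";
}
```

With a structure like this, consumers can walk nested expressions directly instead of re-parsing the captured source string of a Compound. A variant-based AST would additionally make it impossible to give a Terminal any children.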