Trouble with recursive Boost.Spirit parsing

Date: 2019-09-11 10:10:22

Tags: c++ boost boost-spirit

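I'm trying to model a parser for a subset of the C language for a school project. However, I seem to be stuck on writing recursive parse rules for Boost.Spirit: my rules either overflow the stack or don't pick up anything at all.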

For example, I want to model the following grammar:

a ::= ... | A[a] | a1 op a2 | ...

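This expression rule has other grammar subsets, but those work fine. For example, if I parse A[3 * 4], it should be understood as a recursive parse, where A[...] (A[a] in the grammar) is the array accessor and 3 * 4 (a1 op a2 in the grammar) is the index.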

I tried defining the following rule objects in my grammar struct:

qi::rule<Iterator, Type(), Skipper> expr_arr;
qi::rule<Iterator, Type(), Skipper> expr_binary_arith;
qi::rule<Iterator, Type(), Skipper> expr_a;

and giving them the following grammar:

expr_arr %= qi::lexeme[identifier >> qi::omit['[']] >> expr_a >> qi::lexeme[qi::omit[']']];
expr_binary_arith %= expr_a >> op_binary_arith >> expr_a;
expr_a %= (expr_binary_arith | expr_arr);

where op_binary_arith is a qi::symbols<> object holding the allowed operators.

This compiles fine, but at runtime it goes into an infinite loop and overflows the stack. I tried following sehe's answer to this question: How to set max recursion in boost spirit

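However, I did not manage to set a maximum recursion depth. In almost all of my attempts I couldn't even get it to compile without errors; the last attempt did build, but the results were very unexpected.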

Can someone point me in the right direction for implementing this grammar correctly?

1 Answer:

Answer 0 (score: 0)

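PEG grammars don't handle left recursion well. Usually you have to split out helper rules so the grammar can be written without left recursion.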
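To see why a left-recursive rule never terminates in a PEG-style parser, here is a minimal sketch in plain C++ rather than Spirit. The Demo class and its methods are hypothetical, written only for illustration, and a depth guard stands in for the stack overflow. The left-recursive rule re-enters itself at the same input position without consuming anything, while the rewritten rule (which recognizes the same language) consumes a number before it loops.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical hand-written parser (not Boost.Spirit) that mimics how a PEG
// runs a left-recursive rule. The depth guard replaces the real stack overflow.
class Demo {
public:
    explicit Demo(std::string src) : src_(std::move(src)) {}
    std::size_t pos() const { return pos_; }

    // Left-recursive: expr := expr '*' number | number
    // The first thing the rule does is call itself at the same position,
    // so it never consumes input and only stops when the guard throws.
    bool left_recursive_expr() {
        if (++depth_ > kMaxDepth)
            throw std::runtime_error("left recursion: no input consumed");
        std::size_t save = pos_;
        bool ok = left_recursive_expr() && lit('*') && number();
        if (!ok) { pos_ = save; ok = number(); }
        --depth_;
        return ok;
    }

    // Same language without left recursion: expr := number ('*' number)*
    bool rewritten_expr() {
        if (!number()) return false;
        while (true) {
            std::size_t save = pos_;
            if (!(lit('*') && number())) { pos_ = save; break; }
        }
        return true;
    }

private:
    static constexpr int kMaxDepth = 1000;
    std::string src_;
    std::size_t pos_ = 0;
    int depth_ = 0;

    bool lit(char c) {
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    bool number() {
        std::size_t start = pos_;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ > start;
    }
};
```

Without the guard, the left-recursive version recurses until the stack overflows, which matches the behaviour described in the question. In Qi the same kind of rewrite is usually expressed by starting each rule with something that consumes input, optionally followed by a Kleene star over the operator part.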

In your specific case, the target production

a ::= ... | A[a] | a1 op a2 | ...

seems a bit off. It would allow foo[bar] or foo + bar, but not foo + bar[qux].

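Usually, the choice between an array element reference and a plain identifier sits at a lower precedence level of the grammar (typically called a "simple expression").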
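Because PEG alternatives are ordered (the first alternative that succeeds wins, and the parser does not come back later to try a longer match), the order inside that "simple expression" choice matters too: the array reference has to be tried before the bare identifier. The following hand-written sketch illustrates this on the input A[3]. The helpers match_identifier, match_array_ref, simple_arr_first and simple_ident_first are hypothetical, and the index is simplified to plain digits.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helpers mimicking PEG ordered choice by hand (not Spirit code).
// Each matcher returns the number of characters consumed, or 0 on failure.

// identifier = alpha >> *alnum
std::size_t match_identifier(const std::string& s, std::size_t pos) {
    std::size_t i = pos;
    if (i < s.size() && std::isalpha(static_cast<unsigned char>(s[i]))) {
        ++i;
        while (i < s.size() && std::isalnum(static_cast<unsigned char>(s[i]))) ++i;
    }
    return i - pos;
}

// identifier '[' digits ']' : a deliberately simplified array reference
std::size_t match_array_ref(const std::string& s, std::size_t pos) {
    std::size_t n = match_identifier(s, pos);
    if (n == 0) return 0;
    std::size_t i = pos + n;
    if (i >= s.size() || s[i] != '[') return 0;
    std::size_t digits_start = ++i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
    if (i == digits_start || i >= s.size() || s[i] != ']') return 0;
    return i + 1 - pos;
}

// Ordered choice: try alternatives in order, commit to the first success.
std::size_t simple_arr_first(const std::string& s) {    // expr_arr | identifier
    if (std::size_t n = match_array_ref(s, 0)) return n;
    return match_identifier(s, 0);
}
std::size_t simple_ident_first(const std::string& s) {  // identifier | expr_arr
    if (std::size_t n = match_identifier(s, 0)) return n;
    return match_array_ref(s, 0);
}
```

With the identifier first, the choice commits after consuming only A and leaves [3] behind, so the surrounding rule either fails or reports trailing input.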

Here is a slight elaboration:

literal           = number_literal | string_literal; // TODO expand?

expr_arr          = identifier >> '[' >> (expr_a % ',') >> ']';
simple_expression = literal | expr_arr | identifier;
expr_binary_arith = simple_expression >> op_binary_arith >> expr_a;
expr_a            = expr_binary_arith | simple_expression;
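To make the behaviour of these four rules concrete (expr_a tries expr_binary_arith first and backtracks to simple_expression when that fails), here is a hand-written recursive-descent mirror in plain C++17. It is a sketch, not the Spirit code: MiniParser is hypothetical, it renders the parse as a string instead of building an AST::Expr, and it only supports decimal and 0x integers plus string literals without escapes. For example, A[8 + F[0x31]] renders as A[8+F[49]].

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical recursive-descent mirror of the four rules (not Spirit code).
class MiniParser {
public:
    explicit MiniParser(std::string src) : s_(std::move(src)) {}
    // start = skip(space)[expr_a]; this sketch also requires full consumption
    std::optional<std::string> parse() {
        auto r = expr_a();
        skip_ws();
        if (!r || pos_ != s_.size()) return std::nullopt;
        return r;
    }
private:
    std::string s_;
    std::size_t pos_ = 0;
    void skip_ws() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }
    bool lit(char c) {
        skip_ws();
        if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    // expr_a = expr_binary_arith | simple_expression
    std::optional<std::string> expr_a() {
        std::size_t save = pos_;
        if (auto b = expr_binary_arith()) return b;
        pos_ = save;  // backtrack before trying the second alternative
        return simple_expression();
    }
    // expr_binary_arith = simple_expression >> op_binary_arith >> expr_a
    std::optional<std::string> expr_binary_arith() {
        auto lhs = simple_expression();
        if (!lhs) return std::nullopt;
        skip_ws();
        if (pos_ >= s_.size() || std::string("+-*/").find(s_[pos_]) == std::string::npos)
            return std::nullopt;
        char op = s_[pos_++];
        auto rhs = expr_a();  // recursion only after input has been consumed
        if (!rhs) return std::nullopt;
        return *lhs + op + *rhs;
    }
    // simple_expression = literal | expr_arr | identifier
    std::optional<std::string> simple_expression() {
        std::size_t save = pos_;
        if (auto l = literal()) return l;
        pos_ = save;
        if (auto a = expr_arr()) return a;
        pos_ = save;
        return identifier();
    }
    // expr_arr = identifier >> '[' >> (expr_a % ',') >> ']'
    std::optional<std::string> expr_arr() {
        auto id = identifier();
        if (!id || !lit('[')) return std::nullopt;
        std::string out = *id + "[";
        bool first = true;
        do {
            auto e = expr_a();
            if (!e) return std::nullopt;
            if (!first) out += ',';
            out += *e;
            first = false;
        } while (lit(','));
        if (!lit(']')) return std::nullopt;
        return out + "]";
    }
    // identifier = alpha >> *alnum
    std::optional<std::string> identifier() {
        skip_ws();
        std::size_t b = pos_;
        if (b >= s_.size() || !std::isalpha(static_cast<unsigned char>(s_[b]))) return std::nullopt;
        while (pos_ < s_.size() && std::isalnum(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        return s_.substr(b, pos_ - b);
    }
    // literal = number_literal | string_literal (simplified: integers, unescaped strings)
    std::optional<std::string> literal() {
        skip_ws();
        std::size_t e = pos_ + 2;
        if (s_.compare(pos_, 2, "0x") == 0) {
            while (e < s_.size() && std::isxdigit(static_cast<unsigned char>(s_[e]))) ++e;
            if (e > pos_ + 2) {
                std::string hex = s_.substr(pos_ + 2, e - pos_ - 2);
                pos_ = e;
                return std::to_string(std::stoll(hex, nullptr, 16));
            }
        }
        for (e = pos_; e < s_.size() && std::isdigit(static_cast<unsigned char>(s_[e])); ++e) {}
        if (e > pos_) {
            std::string dec = s_.substr(pos_, e - pos_);
            pos_ = e;
            return std::to_string(std::stoll(dec));
        }
        if (pos_ < s_.size() && s_[pos_] == '"') {
            std::size_t close = s_.find('"', pos_ + 1);
            if (close == std::string::npos) return std::nullopt;
            std::string str = s_.substr(pos_, close + 1 - pos_);
            pos_ = close + 1;
            return str;
        }
        return std::nullopt;
    }
};
```

Every recursive call to expr_a happens only after simple_expression and an operator have consumed input, which is what removes the infinite descent of the original rules.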

Now you can parse, for example:

for (std::string const& input : {
        "A[3*4]",
        "A[F[3]]",
        "A[8 + F[0x31]]",
        "3 * \"foo\"",
    })
{
    std::cout << std::quoted(input) << " -> ";

    It f=begin(input), l=end(input);
    AST::Expr e;

    if (parse(f,l,g,e)) {
        std::cout << "Parsed: " << e << "\n";
    } else {
        std::cout << "Failed\n";
    }

    if (f!=l) {
        std::cout << "Remaining: " << std::quoted(std::string(f,l)) << "\n";
    }
}

Prints (Live On Coliru):

"A[3*4]" -> Parsed: A[3*4]
"A[F[3]]" -> Parsed: A[F[3]]
"A[8 + F[0x31]]" -> Parsed: A[8+F[49]]
"3 * \"foo\"" -> Parsed: 3*"foo"
  

Note that I'm ignoring efficiency and operator precedence for now.
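One consequence of ignoring precedence: because expr_binary_arith has the shape simple_expression op expr_a, each operator takes everything to its right as its right operand, so 8-2-1 is grouped as 8-(2-1) and 2*3+4 as 2*(3+4). The usual fix, again done by splitting helper rules, is one rule per precedence level with a loop inside each. The sketch below evaluates both shapes so the difference is visible; it is plain C++, and the Eval class is hypothetical, handling only digits and + - * / with no whitespace and no error handling.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical integer evaluator (not Spirit code) contrasting two grammar shapes.
class Eval {
public:
    explicit Eval(std::string src) : s_(std::move(src)) {}

    // Shape of expr_binary_arith: simple op expr. The recursive call swallows
    // the rest of the input, so operators group to the right and share one level.
    long flat() {
        long lhs = number();
        if (pos_ >= s_.size()) return lhs;
        char op = s_[pos_++];
        return apply(lhs, op, flat());
    }

    // One helper rule per precedence level, each with a loop:
    //   sum     = product (('+' | '-') product)*
    //   product = number  (('*' | '/') number)*
    long sum() {
        long v = product();
        while (pos_ < s_.size() && (s_[pos_] == '+' || s_[pos_] == '-')) {
            char op = s_[pos_++];
            v = apply(v, op, product());  // fold left: left-associative
        }
        return v;
    }

private:
    std::string s_;
    std::size_t pos_ = 0;

    long product() {
        long v = number();
        while (pos_ < s_.size() && (s_[pos_] == '*' || s_[pos_] == '/')) {
            char op = s_[pos_++];
            v = apply(v, op, number());
        }
        return v;
    }
    long number() {
        long v = 0;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_])))
            v = v * 10 + (s_[pos_++] - '0');
        return v;
    }
    static long apply(long a, char op, long b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;
        }
    }
};
```

Both shapes accept the same strings; only the grouping differs. In Qi, the layered version is typically written with the Kleene star, along the lines of a rule shaped like product >> *(op >> product), although turning that into an AST requires an AST shape that differs from the BinOpExpr used here.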

     

These are discussed at length in other answers:

and many more

Full demo listing

Live On Coliru

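//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp> // BOOST_FUSION_ADAPT_STRUCT
#include <cstdint>                               // intmax_t
#include <experimental/iterator>                 // std::experimental::make_ostream_joiner
#include <iomanip>                               // std::quoted
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;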

namespace AST {
    using Var = std::string;

    struct String : std::string {
        using std::string::string;
    };

    using Literal = boost::variant<String, intmax_t, double>;

    enum class ArithOp {
        addition, subtraction, division, multplication
    };

    struct IndexExpr;
    struct BinOpExpr;

    using Expr = boost::variant<
        Literal,
        Var,
        boost::recursive_wrapper<IndexExpr>,
        boost::recursive_wrapper<BinOpExpr>
    >;

    struct IndexExpr {
        Expr expr;
        std::vector<Expr> indices;
    };

    struct BinOpExpr {
        Expr lhs, rhs;
        ArithOp op;
    };

    std::ostream& operator<<(std::ostream& os, Literal const& lit) {
        struct {
            std::ostream& os;
            void operator()(String const& s) const { os << std::quoted(s); }
            void operator()(double d) const { os << d; }
            void operator()(intmax_t i) const { os << i; }
        } vis {os};
        boost::apply_visitor(vis, lit);
        return os;
    }
    std::ostream& operator<<(std::ostream& os, ArithOp const& op) {
        switch(op) {
            case ArithOp::addition: return os << '+';
            case ArithOp::subtraction: return os << '-';
            case ArithOp::division: return os << '/';
            case ArithOp::multplication: return os << '*';
        }
        return os << '?';
    }
    std::ostream& operator<<(std::ostream& os, BinOpExpr const& e) {
        return os << e.lhs << e.op << e.rhs;
    }
    std::ostream& operator<<(std::ostream& os, IndexExpr const& e) {
        std::copy(
            begin(e.indices),
            end(e.indices),
            std::experimental::make_ostream_joiner(os << e.expr << '[', ","));

        return os << ']';
    }
}

BOOST_FUSION_ADAPT_STRUCT(AST::IndexExpr, expr, indices)
BOOST_FUSION_ADAPT_STRUCT(AST::BinOpExpr, lhs, op, rhs)

template <typename Iterator, typename Skipper = qi::space_type>
struct G : qi::grammar<Iterator, AST::Expr()> {
    G() : G::base_type(start) {
        using namespace qi;

        identifier        = alpha >> *alnum;

        number_literal    =
            qi::real_parser<double, qi::strict_real_policies<double> >{}
          | "0x" >> qi::uint_parser<intmax_t, 16> {}
          |         qi::int_parser<intmax_t, 10> {}
          ;

        string_literal    = '"' >> *('\\' >> char_escape | ~char_('"')) >> '"';

        literal           = number_literal | string_literal; // TODO expand?

        expr_arr          = identifier >> '[' >> (expr_a % ',') >> ']';
        simple_expression = literal | expr_arr | identifier;
        expr_binary_arith = simple_expression >> op_binary_arith >> expr_a;
        expr_a            = expr_binary_arith | simple_expression;

        start = skip(space) [expr_a];

        BOOST_SPIRIT_DEBUG_NODES(
                (start)
                (expr_a)(expr_binary_arith)(simple_expression)(expr_arr)
                (literal)(number_literal)(string_literal)
                (identifier))
    }

  private:
    struct escape_sym : qi::symbols<char, char> {
        escape_sym() {
            this->add
                ("b", '\b')
                ("f", '\f')
                ("r", '\r')
                ("n", '\n')
                ("t", '\t')
                ("\\", '\\')
                ;
        }
    } char_escape;

    struct op_binary_arith_sym : qi::symbols<char, AST::ArithOp> {
        op_binary_arith_sym() {
            this->add
                ("+", AST::ArithOp::addition)
                ("-", AST::ArithOp::subtraction)
                ("/", AST::ArithOp::division)
                ("*", AST::ArithOp::multplication)
                ;
        }
    } op_binary_arith;

    qi::rule<Iterator, AST::Expr()> start;
    qi::rule<Iterator, AST::IndexExpr(), Skipper> expr_arr;
    qi::rule<Iterator, AST::BinOpExpr(), Skipper> expr_binary_arith;
    qi::rule<Iterator, AST::Expr(), Skipper> simple_expression, expr_a;
    // implicit lexemes
    qi::rule<Iterator, AST::Literal()> literal, string_literal, number_literal;
    qi::rule<Iterator, AST::Var()> identifier;
};

int main() {
    using It = std::string::const_iterator;
    G<It> const g;
    for (std::string const& input : {
            "A[3*4]",
            "A[F[3]]",
            "A[8 + F[0x31]]",
            "3 * \"foo\"",
        })
    {
        std::cout << std::quoted(input) << " -> ";

        It f=begin(input), l=end(input);
        AST::Expr e;

        if (parse(f,l,g,e)) {
            std::cout << "Parsed: " << e << "\n";
        } else {
            std::cout << "Failed\n";
        }

        if (f!=l) {
            std::cout << "Remaining: " << std::quoted(std::string(f,l)) << "\n";
        }
    }
}