How to use boost::spirit::qi with a std::vector<token_type> instead of std::string

Time: 2015-08-23 07:04:13

Tags: c++ c++11 boost boost-spirit-qi boost-spirit-lex

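In an application, I basically want a "pre-parse" stage that adjusts the token stream before the Qi parser sees it.

One way to do this would be some kind of "lexer adaptor": constructed from a lexer and itself a lexer, it wraps the inner lexer and modifies its behavior. However, it would be simpler, and easier to debug, if I could just run the inner lexer over the entire input stream first, store the results in a std::vector<token_type>, modify them as needed, and then pass the result to the parser. (In my application I don't expect this to cause any performance problems.)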
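The two-stage idea described above can be sketched without any Spirit types. This is only an illustration under stated assumptions: `Token`, `TokId`, `lex_all`, `merge_alpha_runs`, and `count_words` are hypothetical names, the "lexer" is a trivial one-character-per-token loop, and the "parser" is a stand-in that just walks an iterator pair.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a lexer token: an id plus the matched text.
enum class TokId { Alpha, Num, Blank, Other };

struct Token {
  TokId id;
  std::string text;
};

// Stage 1: run the "inner lexer" over the whole input and buffer every token.
std::vector<Token> lex_all(const std::string &input) {
  std::vector<Token> out;
  for (char c : input) {
    const unsigned char u = static_cast<unsigned char>(c);
    const TokId id = std::isalpha(u)   ? TokId::Alpha
                     : std::isdigit(u) ? TokId::Num
                     : (c == ' ' || c == '\t' || c == '\r') ? TokId::Blank
                                                            : TokId::Other;
    out.push_back(Token{id, std::string(1, c)});
  }
  return out;
}

// Stage 2: the "pre-parse" adjustment. As an example, merge each run of
// consecutive Alpha tokens into a single Alpha token.
std::vector<Token> merge_alpha_runs(const std::vector<Token> &in) {
  std::vector<Token> out;
  for (const Token &t : in) {
    if (t.id == TokId::Alpha && !out.empty() && out.back().id == TokId::Alpha) {
      out.back().text += t.text;
    } else {
      out.push_back(t);
    }
  }
  return out;
}

// Stage 3: a parser stand-in that consumes a pair of token iterators.
// It only counts Alpha tokens; a real parser would build an AST here.
template <typename Iterator>
std::size_t count_words(Iterator first, Iterator last) {
  std::size_t n = 0;
  for (; first != last; ++first) {
    if (first->id == TokId::Alpha) ++n;
  }
  return n;
}
```

Because the buffer is an ordinary vector, each stage can be inspected or dumped on its own, which is the debugging advantage mentioned above.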

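In an email exchange from a few years ago, someone described this very problem, and Hartmut said it should be trivial: http://comments.gmane.org/gmane.comp.parsers.spirit.general/24899

However, I haven't found any code samples or instructions on how to do this, short of reading the headers in spirit::lex and figuring it out. That would probably take me quite a while, unless you, dear reader, can help.

The specific question is: how do I make a "shim" lexer that wraps a pair of std::vector<token_type>::iterator and looks to spirit::qi just like a standard spirit::lex lexer?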
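To make the shape of such a shim concrete, here is a minimal sketch of a view over a pair of vector iterators. It is not Spirit API: `token_range` and `count_tokens` are hypothetical names, and the sketch only shows the `token_type` / `iterator_type` / `begin` / `end` surface. Whether Qi accepts a type like this in place of a real lexer is exactly the open question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shim: a non-owning view over already-lexed tokens.
// The underlying vector must outlive the view.
template <typename TokenType>
class token_range {
public:
  typedef TokenType token_type;
  typedef typename std::vector<TokenType>::const_iterator iterator_type;

  token_range(iterator_type first, iterator_type last)
      : first_(first), last_(last) {}

  iterator_type begin() const { return first_; }
  iterator_type end() const { return last_; }

private:
  iterator_type first_;
  iterator_type last_;
};

// Stand-in consumer: it takes its iterator type from the lexer-like object,
// so it works with any type exposing iterator_type, begin() and end().
template <typename Lexer>
std::size_t count_tokens(const Lexer &lexer) {
  std::size_t n = 0;
  for (typename Lexer::iterator_type it = lexer.begin(); it != lexer.end();
       ++it) {
    ++n;
  }
  return n;
}
```

The point of the design is that the consumer never names the iterator type itself; it asks the lexer-like object for it, so the two cannot disagree.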

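Edit: To be clear, this is not a duplicate of this question: Using Boost.Spirit.Qi with custom lexer. My token_type is attributed, and the details of the extra things Hartmut says I need to do are the substance of this question.

Edit: OK, I made an SSCCE. This version has the lexer tokens attributed, but even without that I still can't get it to work, and this seems like a good SSCCE to start from.

Highlights:

The "token buffer" type: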

template<typename TokenType>
struct token_buffer {
    std::vector<TokenType> tokens_;

    token_buffer() = default;

    // Take the template parameter, not the global token_type typedef.
    bool operator()(const TokenType & t) {
        tokens_.push_back(t);
        return true;
    }

    void print(std::ostream & o) const { ... }
};

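My first attempt at a "buffer lexer" that looks like a lex::lexer to Qi but actually serves tokens from the buffer. This one derives from lex_basic (defined in the full source); I'm not sure whether that is right.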

template<typename LexerType>
class buffer_lexer : public lex_basic<LexerType> {
public:
    typedef std::vector<token_type> buff_type;
    typedef typename buff_type::const_iterator iterator_type;

private:
    const buff_type & buff_;

public:
    buffer_lexer(const buff_type & b) : lex_basic<LexerType>(), buff_(b) {}

    iterator_type begin() const { return buff_.begin(); }
    iterator_type end() const { return buff_.end(); }

    // for consistency with regular lexer `begin` signature, not sure if this is needed
    template<typename T>
    iterator_type begin(T, T) { return begin(); }
};

My second attempt at a buffer lexer. This one does not derive from lex_basic; instead it tries to follow these instructions from the header boost/spirit/home/lex/lexer/lexertl/lexer.hpp:

///////////////////////////////////////////////////////////////////////////
//
//  Every lexer type to be used as a lexer for Spirit has to conform to
//  the following public interface:
//
//    typedefs:
//        iterator_type   The type of the iterator exposed by this lexer.
//        token_type      The type of the tokens returned from the exposed
//                        iterators.
//
//    functions:
//        default constructor
//                        Since lexers are instantiated as base classes
//                        only it might be a good idea to make this
//                        constructor protected.
//        begin, end      Return a pair of iterators, when dereferenced
//                        returning the sequence of tokens recognized in
//                        the input stream given as the parameters to the
//                        begin() function.
//        add_token       Should add the definition of a token to be
//                        recognized by this lexer.
//        clear           Should delete all current token definitions
//                        associated with the given state of this lexer
//                        object.
//
//    template parameters:
//        Iterator        The type of the iterator used to access the
//                        underlying character stream.
//        Token           The type of the tokens to be returned from the
//                        exposed token iterator.
//        Functor         The type of the InputPolicy to use to instantiate
//                        the multi_pass iterator type to be used as the
//                        token iterator (returned from begin()/end()).
//
///////////////////////////////////////////////////////////////////////////
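The typedef part of the interface quoted above can be checked at compile time. The sketch below is an assumption-laden illustration, not something Spirit provides: `iterator_yields_token`, `demo_token`, `demo_buffer_lexer`, and `mismatched_lexer` are hypothetical names. It only verifies that `iterator_type` dereferences to `token_type`; it says nothing about `begin`, `end`, `add_token`, or `clear`.

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

// Hypothetical token type, used only to exercise the check below.
struct demo_token {
  int id;
};

// A lexer-like type whose typedefs agree with each other.
struct demo_buffer_lexer {
  typedef demo_token token_type;
  typedef std::vector<demo_token>::const_iterator iterator_type;
};

// A lexer-like type whose iterator yields something other than token_type.
struct mismatched_lexer {
  typedef int token_type;
  typedef std::vector<demo_token>::const_iterator iterator_type;
};

// True when dereferencing Lexer::iterator_type yields Lexer::token_type
// (ignoring const/volatile). Both typedefs must be publicly accessible,
// otherwise instantiating this trait fails to compile.
template <typename Lexer>
struct iterator_yields_token
    : std::is_same<
          typename std::remove_cv<typename std::iterator_traits<
              typename Lexer::iterator_type>::value_type>::type,
          typename Lexer::token_type> {};

static_assert(iterator_yields_token<demo_buffer_lexer>::value,
              "iterator_type must dereference to token_type");
```

A check like this would also flag a lexer class that declares its typedefs before any `public:` label, since members of a `class` are private by default.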

Here is the "buffer_lexer_raw" I came up with:

template<typename Iterator,
     typename TokenType,
     typename Functor = lex::lexertl::functor<TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
public:
    // The interface typedefs must be public; class members are private by default.
    typedef TokenType token_type;
    typedef std::vector<token_type> buff_type;
    typedef typename buff_type::const_iterator iterator_type;

    typedef typename boost::detail::iterator_traits<typename token_type::iterator_type>::value_type char_type;

private:
    buff_type buff_;

public:
    buffer_lexer_raw() {}

    void set_buffer(const buff_type & b) { buff_ = b; }

    iterator_type begin() const { return buff_.begin(); }
    iterator_type end() const { return buff_.end(); }

    // for consistency with regular lexer `begin` signature, not sure if this is needed
    template<typename T>
    iterator_type begin(T, T) { return begin(); }

    std::size_t add_token(char_type const* state, char_type tokendef,
            std::size_t token_id, char_type const* targetstate)
    {
        return 1;
    }

    void clear(char_type const* state) {}
};

The test code responds to a macro defined at the top of the file.

// Use the type "buffer_lexer" which derives from lex_basic<Lexer>
//#define WHICH_LEXER_TYPE 1
// Use the type "buffer_lexer_raw" which does not derive from anything
//#define WHICH_LEXER_TYPE 2
// Use the "placebo" lexer, which is just lex_basic<Lexer>, as a sanity test of our lex:: api calls
#define WHICH_LEXER_TYPE 0

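The test code will:

  • Run the lexer on a simple test case and print a detailed dump of the lexed token sequence.
  • Run the lexer and grammar in tandem on several simple test cases using lex::tokenize_and_parse, and dump the resulting AST.
  • Lex and parse again, this time using the lexer selected by the macro to produce the iterators passed to qi::parse. It checks that the resulting AST is identical to the one produced the "easy" way.

Currently, the #define WHICH_LEXER_TYPE 0 option compiles and works for me with gcc-4.8 and clang-3.6.

I can't actually get it to compile with the #define WHICH_LEXER_TYPE 1 or #define WHICH_LEXER_TYPE 2 options. For type 1, clang gives the following error message, which I can't make head or tail of: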

In file included from main.cpp:1:
In file included from /usr/include/boost/spirit/include/lex_lexertl.hpp:16:
In file included from /usr/include/boost/spirit/home/lex/lexer_lexertl.hpp:15:
In file included from /usr/include/boost/spirit/home/lex.hpp:13:
In file included from /usr/include/boost/spirit/home/lex/lexer.hpp:14:
In file included from /usr/include/boost/spirit/home/lex/lexer/token_def.hpp:21:
In file included from /usr/include/boost/spirit/home/lex/reference.hpp:16:
/usr/include/boost/spirit/home/qi/reference.hpp:43:30: error: no matching member function for call to 'parse'
            return ref.get().parse(first, last, context, skipper, attr);
                   ~~~~~~~~~~^~~~~
/usr/include/boost/spirit/home/qi/parse.hpp:86:42: note: in instantiation of function template specialization 'boost::spirit::qi::reference<const
      boost::spirit::qi::rule<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const
      char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
      __gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > >, ast::Body (),
      boost::spirit::locals<std::basic_string<char>, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
      boost::spirit::unused_type, boost::spirit::unused_type> >::parse<__gnu_cxx::__normal_iterator<const
      boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
      mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
      boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
      std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >, boost::spirit::context<boost::fusion::cons<ast::Body &, boost::fusion::nil>,
      boost::spirit::locals<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> >, boost::spirit::unused_type,
      ast::Body>' requested here
        return compile<qi::domain>(expr).parse(first, last, context, unused, attr);
                                         ^
main.cpp:414:12: note: in instantiation of function template specialization 'boost::spirit::qi::parse<__gnu_cxx::__normal_iterator<const
      boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
      mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
      boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
      std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >,
      basic_grammar<boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const
      char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
      __gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > > >, ast::Body>' requested here
                if (!qi::parse(it, fin, bgram, tree2)) {
                         ^
/usr/include/boost/spirit/home/qi/nonterminal/rule.hpp:273:14: note: candidate function [with Context = boost::spirit::context<boost::fusion::cons<ast::Body &,
      boost::fusion::nil>, boost::spirit::locals<mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na> >, Skipper =
      boost::spirit::unused_type, Attribute = ast::Body] not viable: no known conversion from '__gnu_cxx::__normal_iterator<const
      boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>,
      mpl_::bool_<true>, unsigned long> *, std::vector<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >,
      boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>,
      std::allocator<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, boost::mpl::vector<char, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long> > > >' to
      'boost::spirit::lex::lexertl::iterator<boost::spirit::lex::lexertl::functor<boost::spirit::lex::lexertl::token<__gnu_cxx::__normal_iterator<const char *,
      std::basic_string<char> >, boost::mpl::vector<char, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na,
      mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na, mpl_::na>, mpl_::bool_<true>, unsigned long>, lexertl::detail::data,
      __gnu_cxx::__normal_iterator<const char *, std::basic_string<char> >, mpl_::bool_<false>, mpl_::bool_<true> > > &' for 1st argument
        bool parse(Iterator& first, Iterator const& last
             ^
/usr/include/boost/spirit/home/qi/nonterminal/rule.hpp:319:14: note: candidate function template not viable: requires 6 arguments, but 5 were provided
        bool parse(Iterator& first, Iterator const& last
             ^
1 error generated.

The "2" option gives basically the same error message. gcc doesn't seem to give any better error message.

Here is the full source code:
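One reading of the error above: the viable-looking candidate `parse(Iterator& first, Iterator const& last, ...)` expects a `lexertl::iterator<...> &`, while the argument is a `std::vector<token>::const_iterator`. That suggests the grammar was instantiated with one iterator type and `qi::parse` was called with another. The sketch below reproduces that kind of mismatch with hypothetical stand-ins (`mini_rule`, `can_parse`, `demo_tok`); it does not use Spirit and is not a verified fix for the code in this question.

```cpp
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical token type for the demonstration.
struct demo_tok {
  int id;
};

typedef std::string::const_iterator char_it;          // "lexer-side" iterator
typedef std::vector<demo_tok>::const_iterator tok_it; // buffered-token iterator

// Stand-in for a rule: the iterator type is fixed by the class template
// parameter, so parse() only accepts exactly that iterator type.
template <typename Iterator>
struct mini_rule {
  bool parse(Iterator &first, Iterator const &last) const {
    if (first == last) return false;
    ++first; // consume one element
    return true;
  }
};

// Detects whether Rule::parse can be called with (It&, It const&).
template <typename Rule, typename It, typename = void>
struct can_parse : std::false_type {};

template <typename Rule, typename It>
struct can_parse<Rule, It,
                 decltype(void(std::declval<const Rule &>().parse(
                     std::declval<It &>(), std::declval<It const &>())))>
    : std::true_type {};

// A rule built for token iterators accepts token iterators...
static_assert(can_parse<mini_rule<tok_it>, tok_it>::value,
              "matching iterator types are accepted");
// ...but a rule built for another iterator type rejects them, which is the
// same shape as "no known conversion ... for 1st argument".
static_assert(!can_parse<mini_rule<char_it>, tok_it>::value,
              "mismatched iterator types are rejected");
```

If this reading is right, one thing that might be worth trying is instantiating the grammar with `std::vector<token_type>::const_iterator` rather than the lexer's `iterator_type`. Whether the `token(...)` parsers then work on tokens served from a plain vector iterator is an assumption that has not been checked here.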

#include <boost/spirit/include/lex_lexertl.hpp>
#include <boost/spirit/include/qi.hpp>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/std_pair.hpp>
#include <boost/variant/get.hpp>
#include <boost/variant/variant.hpp>
#include <boost/variant/recursive_variant.hpp>
#include <boost/preprocessor/stringize.hpp>

#include <iostream>
#include <string>
#include <vector>

typedef unsigned int uint;

namespace lex = boost::spirit::lex;
namespace qi = boost::spirit::qi;
namespace mpl = boost::mpl;

// Use the type "buffer_lexer" which derives from lex_basic<Lexer>
//#define WHICH_LEXER_TYPE 1
// Use the type "buffer_lexer_raw" which does not derive from anything
//#define WHICH_LEXER_TYPE 2
// Use the "placebo" lexer, which is just lex_basic<Lexer>, as a sanity test of
// our lex:: api calls
#define WHICH_LEXER_TYPE 0

//// Lexer definition

enum tokenids {
  LCARET = lex::min_token_id + 10,
  RCARET,
  BSLASH,
  LBRACE,
  RBRACE,
  LPAREN,
  RPAREN,
  EQUALS,
  USCORE,
  ALPHA,
  NUM,
  EOL,
  BLANK,
  IDANY
};

#define TOKEN_CASE(X)                                                          \
  case X: return #X

const char *token_id_string(size_t id) {
  switch (id) {
    TOKEN_CASE(LCARET);
    TOKEN_CASE(RCARET);
    TOKEN_CASE(BSLASH);
    TOKEN_CASE(LBRACE);
    TOKEN_CASE(RBRACE);
    TOKEN_CASE(LPAREN);
    TOKEN_CASE(RPAREN);
    TOKEN_CASE(EQUALS);
    TOKEN_CASE(USCORE);
    TOKEN_CASE(ALPHA);
    TOKEN_CASE(NUM);
    TOKEN_CASE(EOL);
    TOKEN_CASE(BLANK);
    TOKEN_CASE(IDANY);
  default:
    return "Unknown token";
  }
}

template <typename Lexer> struct lex_basic : lex::lexer<Lexer> {
  lex_basic() {
    this->self.add
        ('<', LCARET)
        ('>', RCARET)
        ('/', BSLASH)
        ('{', LBRACE)
        ('}', RBRACE)
        ('(', LPAREN)
        (')', RPAREN)
        ('=', EQUALS)
        ('_', USCORE)
        ("[A-Za-z]", ALPHA)
        ("[0-9]", NUM)
        ('\n', EOL)
        ("[ \\t\\r]", BLANK)
        (".", IDANY);
  }
};

typedef std::string::const_iterator str_it;
// the token type needs to know the iterator type of the underlying
// input and the set of used token value types
typedef lex::lexertl::token<str_it, mpl::vector<char>> token_type;

template <typename TokenType> struct token_buffer {
  std::vector<TokenType> tokens_;

  token_buffer() = default;

  // Take the template parameter, not the global token_type typedef.
  bool operator()(const TokenType &t) {
    tokens_.push_back(t);
    return true;
  }

  void print(std::ostream &o) const {
    // Write to the stream that was passed in, not unconditionally to std::cout.
    o << "tokens_.size() == " << tokens_.size() << std::endl;
    for (size_t i = 0; i < tokens_.size(); ++i) {
      const TokenType &t = tokens_[i];

      o << "[" << i << "]: -" << token_id_string(t.id()) << "- \"" << t
        << "\" [";

      const auto &v = t.value();
      if (t.id() == EOL) {
        o << "\\n";
      } else {
        o << v;
      }
      o << "]" << std::endl;
    }
  }
};

/***
 * Lexers which serve tokens from a buffer
 */

// Two versions of the same thing, one deriving from lex::lexer, one not
template <typename LexerType> class buffer_lexer : public lex_basic<LexerType> {
public:
  typedef std::vector<token_type> buff_type;
  typedef typename buff_type::const_iterator iterator_type;

private:
  const buff_type &buff_;

public:
  buffer_lexer(const buff_type &b) : lex_basic<LexerType>(), buff_(b) {}

  iterator_type begin() const { return buff_.begin(); }
  iterator_type end() const { return buff_.end(); }

  // for consistency with regular lexer `begin` signature, not sure if this is
  // needed
  template <typename T> iterator_type begin(T, T) { return begin(); }
};

template <typename Iterator, typename TokenType,
          typename Functor = lex::lexertl::functor<
          TokenType, lex::lexertl::detail::data, Iterator>>
class buffer_lexer_raw {
public:
  // The interface typedefs must be public; class members are private by default.
  typedef TokenType token_type;
  typedef std::vector<token_type> buff_type;
  typedef typename buff_type::const_iterator iterator_type;

  typedef typename boost::detail::iterator_traits<
      typename token_type::iterator_type>::value_type char_type;

private:
  buff_type buff_;

public:
  buffer_lexer_raw() {}

  void set_buffer(const buff_type &b) { buff_ = b; }

  iterator_type begin() const { return buff_.begin(); }
  iterator_type end() const { return buff_.end(); }

  // for consistency with regular lexer `begin` signature, not sure if this is
  // needed
  template <typename T> iterator_type begin(T, T) { return begin(); }

  std::size_t add_token(char_type const *state, char_type tokendef,
        std::size_t token_id, char_type const *targetstate) {
    return 1;
  }

  void clear(char_type const *state) {}
};

/***
 * AST
 */

namespace ast {
typedef std::string Str;

struct BraceExpr;

typedef boost::variant<Str, boost::recursive_wrapper<BraceExpr>> BraceExprArg;

struct BraceExpr {
  std::vector<BraceExprArg> args;
};

typedef std::pair<Str, Str> Pair;

struct Body;

typedef boost::variant<Pair, BraceExpr, boost::recursive_wrapper<Body>> Node;

struct Body {
  Str key;
  std::vector<Node> nodes;
};
} // end namespace ast

BOOST_FUSION_ADAPT_STRUCT(ast::BraceExpr,
          (std::vector<ast::BraceExprArg>, args))
BOOST_FUSION_ADAPT_STRUCT(ast::Body,
          (ast::Str, key)(std::vector<ast::Node>, nodes))

namespace ast {
// Stream ops
class printer : public boost::static_visitor<> {
  std::ostream &ss_;
  uint indent_;
  std::string indent(uint extra = 0) const {
    return std::string(indent_ + extra, ' ');
  }
  std::string indent_plus_tab() const { return indent(tab_width); }

public:
  static constexpr uint tab_width = 4;

  explicit printer(std::ostream &s, uint indent = 0)
      : ss_(s), indent_(indent) {}

  void operator()(const Str &s) const { ss_ << s; }
  void operator()(const BraceExpr &b) const {
    ss_ << "{";
    for (size_t i = 0; i < b.args.size(); ++i) {
      if (i) {
        ss_ << " ";
      }
      boost::apply_visitor(*this, b.args[i]);
    }
    ss_ << "}";
  }
  void operator()(const Pair &p) const { ss_ << p.first << " = " << p.second; }

  void operator()(const Body &b) const {
    ss_ << indent() << "<" << b.key << ">\n";
    printer p{ss_, indent_ + tab_width};
    for (const auto &n : b.nodes) {
      ss_ << indent_plus_tab();
      boost::apply_visitor(p, n);
      ss_ << "\n";
    }
    ss_ << indent() << "</" << b.key << ">";
  }
};

std::ostream &operator<<(std::ostream &ss, const BraceExpr &b) {
  printer p{ss};
  p(b);
  return ss;
}

std::ostream &operator<<(std::ostream &ss, const Pair &p) {
  printer pr{ss};
  pr(p);
  return ss;
}

std::ostream &operator<<(std::ostream &ss, const Body &b) {
  printer p{ss};
  p(b);
  return ss;
}

// Equality ops
bool operator==(const Pair &p1, const Pair &p2) {
  return p1.first == p2.first && p1.second == p2.second;
}
bool operator==(const BraceExpr &b1, const BraceExpr &b2) {
  return b1.args == b2.args;
}
bool operator==(const Body &b1, const Body &b2) {
  return b1.key == b2.key && b1.nodes == b2.nodes;
}
bool operator!=(const Pair &p1, const Pair &p2) { return !(p1 == p2); }
bool operator!=(const BraceExpr &b1, const BraceExpr &b2) {
  return !(b1 == b2);
}
bool operator!=(const Body &b1, const Body &b2) { return !(b1 == b2); }
} // end namespace ast

/***
 * Grammar
 */

template <typename Iterator>
struct basic_grammar
    : qi::grammar<Iterator, ast::Body(), qi::locals<ast::Str>> {
  qi::rule<Iterator, ast::Body(), qi::locals<ast::Str>> body;
  qi::rule<Iterator, ast::Node()> node;
  qi::rule<Iterator, ast::Pair()> pair;
  qi::rule<Iterator, ast::BraceExprArg()> brace_expr_arg;
  qi::rule<Iterator, ast::BraceExpr()> brace_expr;
  qi::rule<Iterator, ast::Str()> identifier;
  qi::rule<Iterator, ast::Str()> str;
  qi::rule<Iterator, ast::Str()> open_tag;
  qi::rule<Iterator /*, ast::Str()*/> close_tag;
  qi::rule<Iterator> lbrace;
  qi::rule<Iterator> rbrace;
  qi::rule<Iterator> equals;

  qi::rule<Iterator> ws;

  template <typename TokenDef>
  basic_grammar(const TokenDef &tok)
      : basic_grammar::base_type(body, "body") {
    using namespace qi;

    ws %= token(BLANK) | token(EOL);
    lbrace %= token(LBRACE);
    rbrace %= token(RBRACE);
    equals %= token(EQUALS);
    identifier %= token(ALPHA) >> *(token(ALPHA) | token(NUM) | token(USCORE));
    str %= *(token(LCARET) | token(RCARET) | token(BSLASH) | token(LPAREN) |
         token(RPAREN) | token(ALPHA) | token(NUM) | token(USCORE) |
         token(EQUALS) | token(BLANK) | token(IDANY));
    open_tag %= omit[token(LCARET)] >> identifier >>
        omit[token(RCARET)]; // tok.open_tag;
    close_tag %= omit[token(LCARET) >> token(BSLASH)] >> identifier >>
         omit[token(RCARET)]; // tok.close_tag;

    pair = skip(boost::proto::deep_copy(ws))[identifier >> equals >> str];

    body = skip(boost::proto::deep_copy(ws))[open_tag >> *node >> close_tag];
    node = brace_expr | body | pair;

    brace_expr_arg = brace_expr | identifier;
    brace_expr =
        skip(boost::proto::deep_copy(ws))[lbrace >> *brace_expr_arg >> rbrace];
  }
};

/***
 * Usage / Tests
 */

// use actor_lexer<> here if your token definitions have semantic
// actions
typedef lex::lexertl::lexer<token_type> lexer_type;

// this is the iterator exposed by the lexer, we use this for parsing
typedef lexer_type::iterator_type iterator_type;

token_buffer<token_type> test_lexer(const std::string &input,
        bool silent = false) {
  str_it s = input.begin();
  str_it end = input.end();

  // create a lexer instance
  lex_basic<lexer_type> lex;

  token_buffer<token_type> buff;
  if (!lex::tokenize(s, end, lex, [&](token_type t) { return buff(t); })) {
    if (!silent) {
      std::cout << "\nTokenizing failed!" << std::endl;
    }
  } else {
    if (!silent) {
      std::cout << "\nTokenizing succeeded!" << std::endl;
    }
  }

  if (!silent) {
    buff.print(std::cout);
  }
  return buff;
}

void test_grammar(const std::string &input) {
  lex_basic<lexer_type> lex;
  basic_grammar<iterator_type> gram{lex};
  ast::Body tree;

  {
    str_it s = input.begin();
    str_it end = input.end();

    if (!lex::tokenize_and_parse(s, end, lex, gram, tree)) {
      std::cout << "\nParsing failed!" << std::endl;
    } else {
      std::cout << "\nParsing succeeded!" << std::endl;
    }

    std::cout << tree << std::endl;
  }

  // Now try to do it in two steps, with buffered lexer
  auto buff = test_lexer(input, true); // get buffer, silence output

#if WHICH_LEXER_TYPE == 1
  buffer_lexer<lexer_type> blex{buff.tokens_};
#else
#if WHICH_LEXER_TYPE == 2
  buffer_lexer_raw<str_it, token_type> blex;
  blex.set_buffer(buff.tokens_);
#else
  lex_basic<lexer_type> blex;
#endif
#endif

  basic_grammar<iterator_type> bgram{blex};
  ast::Body tree2;

  {
#if (WHICH_LEXER_TYPE == 1) || (WHICH_LEXER_TYPE == 2)
    auto it = blex.begin();
#else
    str_it s = input.begin();
    str_it end = input.end();
    auto it = blex.begin(s, end);
#endif

    auto fin = blex.end();

    if (!qi::parse(it, fin, bgram, tree2)) {
      std::cout << "\nBuffered parsing failed!" << std::endl;
    } else {
      std::cout << "\nBuffered parsing succeeded!" << std::endl;
    }
  }

  std::cout << tree2 << std::endl;

  if (tree != tree2) {
    std::cout << "\nRegular parsing vs. buffered parsing mismatch!"
          << std::endl;
  }
}

int main() {
  std::string input{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"</asdf>\n"};

  test_lexer(input);

  // Use lexer and grammar at once as demonstrated in tutorials

  std::string input2 = "<asdf></asdf>";
  test_grammar(input2);

  test_grammar(input);

  std::string input3{""
"<asdf>\n"
"foo = bar\n"
"{F foo}\n"
"{G {F foo} {H bar}}\n"
"<jkl>\n"
"baz = gaz\n"
"{H {H H} {{{H} {G} {F foo}} {B ar}} {Q i}}\n"
"</jkl>\n"
"</asdf>\n"};

  test_grammar(input3);

  return 0;
}

0 answers:

No answers