How do I implement a Forth-like reverse Polish notation parser in Boost Spirit?

Time: 2014-04-22 18:01:53

Tags: boost boost-spirit-qi

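I am trying to implement a parser for an old Forth-based grammar in which most functions take the form "num" "num" "command", where command is a string of some kind.

For example: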

    0 1 HSFF
    41 SENSOR ON
    1 12.0 BH 4 LNON

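As you can see, the grammar is [mostly] reverse Polish notation, with some string of arguments preceding each command. The grammar is pseudo-whitespace-dependent, in that: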

    0 1 HSFF 41 SENSOR ON

is just as valid as:

    0 1 HSFF
    41 SENSOR ON

(In other words, '\n' is treated as just another space.)

Extra whitespace is also skipped, so:

    0           1 HSFF              41 SENSOR      ON

is two valid commands with a lot of unnecessary whitespace.
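To make the whitespace rule concrete, here is a minimal sketch that is independent of Spirit. The function name `tokenize` is made up for illustration; it only shows that all three layouts above yield the same token sequence when any run of spaces or newlines counts as one separator.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits the input on any run of whitespace
// (spaces, tabs, newlines), which is what operator>> does for std::string.
std::vector<std::string> tokenize(std::string const& input)
{
    std::istringstream iss(input);
    std::vector<std::string> tokens;
    std::string tok;
    while (iss >> tok)
        tokens.push_back(tok);
    return tokens;
}
```

This sketch does not handle quoted strings, which would need to keep their embedded spaces.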

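All of this seemed simple enough, so I started chugging away at implementing the grammar. Of course, things are never as simple as they seem, and I found that my parser fails on the very first character (in this case an int). So, boiling things down, I tried implementing a single rule: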

    namespace qi = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    qi::rule<Iterator> Cmd_TARGETSENSPAIRCMD = 
        qi::int_ >> (lit("TARGET") | lit("SENSOR") | lit("PAIR") ) 
        >> (lit("ON") | lit("OFF") | lit("ERASE") );

    std::string in("0 TARGET ERASE\n");

    Iterator = in.begin();
    bool success = qi::parse(in.begin(), in.end(), Cmd_TARGETSENSPAIRCMD, ascii::space);

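This block of code always returns false, indicating that parsing failed.

As you can see, the rule is that an int must be followed by two literals, in this case indicating whether the command is for the target, sensor, or pair identified by the int, and whether it is to be turned on, turned off, or erased.

If I look at the iterator to see where parsing stopped, it shows that it failed immediately on the int. So I changed the rule to simply +qi::int_, which succeeds in parsing the int but fails on the literals. Shortening the rule to simply qi::int_ >> lit("TARGET") also fails.

I think the problem may be in the whitespace skipper I'm using, but I have been unable to determine what I'm doing wrong.

Is there a way to tell Spirit that all tokens are separated by whitespace, except for quoted strings (which turn into labels in my grammar)?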

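1 answer:

Answer 0 (score: 3)

I took the liberty of imagining a bit more of the grammar for you.

The first step I usually take is to come up with an AST model: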

namespace Ast
{
    enum Command  { NO_CMD, TARGET, SENSOR, PAIR };
    enum Modifier { NO_MODIFIER, ON, OFF, ERASE };

    struct ModifiedCommand
    {
        Command cmd  = NO_CMD;
        Modifier mod = NO_MODIFIER;
    };

    struct OtherCommand
    {
        std::string token;
        OtherCommand(std::string token = "") : token(std::move(token))
        { }
    };

    typedef boost::variant<int, double> Operand;

    typedef boost::variant<Operand, ModifiedCommand, OtherCommand> RpnMachineInstruction;
    typedef std::vector<RpnMachineInstruction> RpnMachineProgram;
}

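As you can see, I intend to distinguish integer from double operand values, and I treat any "other" command (such as "HSFF") that is not explicitly described in your grammar as a free-form token (uppercase letters).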

Now, we map the rule definitions onto this:

RpnGrammar() : RpnGrammar::base_type(_start)
{
    _start         = *_instruction;
    _instruction   = _operand | _mod_command | _other_command;

    _operand       = _strict_double | qi::int_;

    _mod_command   = _command >> _modifier;
    _other_command = qi::as_string [ +qi::char_("A-Z") ];

    // helpers
    _command.add("TARGET", Ast::TARGET)("SENSOR", Ast::SENSOR)("PAIR", Ast::PAIR);
    _modifier.add("ON", Ast::ON)("OFF", Ast::OFF)("ERASE", Ast::ERASE);
}

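The grammar parses the input into a list of instructions (Ast::RpnMachineProgram), where each instruction is either an operand or an operation (a command with a modifier, or any other free-form command such as "HSFF"). Here are the rule declarations: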

qi::rule<It, Ast::RpnMachineProgram(),     Skipper> _start;
qi::rule<It, Ast::RpnMachineInstruction(), Skipper> _instruction;
qi::rule<It, Ast::ModifiedCommand(),       Skipper> _mod_command;
qi::rule<It, Ast::Operand(),               Skipper> _operand;

// note: omitting the Skipper has the same effect as wrapping with `qi::lexeme`
qi::rule<It, Ast::OtherCommand()> _other_command;

qi::real_parser<double, boost::spirit::qi::strict_real_policies<double> > _strict_double;
qi::symbols<char, Ast::Command>  _command;
qi::symbols<char, Ast::Modifier> _modifier;

You can see it parse the sample from the question:

Parse succeeded, 10 stack instructions
int:0 int:1 'HSFF'
int:41 SENSOR [ON] 
int:1 double:12 'BH'
int:4 'LNON'

The output is produced by a sample visitor, which you could use as inspiration for an interpreter/executor.
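As a rough idea of what such an executor might look like, here is a minimal sketch that avoids Boost. The types `Instr` and `Call` and the function `execute` are made up for illustration, and `Instr` is a simplified stand-in for the variant-based AST. It also assumes that each command consumes every operand pushed since the previous command; the real arity of commands such as HSFF is not given in the question.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for Ast::RpnMachineInstruction:
// either a numeric operand or a command name.
struct Instr {
    bool is_operand;
    double value;      // meaningful when is_operand is true
    std::string name;  // meaningful when is_operand is false
};

Instr operand(double v)              { return Instr{true, v, ""}; }
Instr command(std::string const& n)  { return Instr{false, 0.0, n}; }

// One executed command together with the arguments it received.
struct Call {
    std::string name;
    std::vector<double> args;
};

// ASSUMPTION: a command takes all operands currently on the stack.
std::vector<Call> execute(std::vector<Instr> const& program)
{
    std::vector<double> stack;
    std::vector<Call> calls;
    for (std::size_t i = 0; i < program.size(); ++i) {
        Instr const& in = program[i];
        if (in.is_operand) {
            stack.push_back(in.value);             // operands wait for a command
        } else {
            calls.push_back(Call{in.name, stack}); // command takes pending operands
            stack.clear();
        }
    }
    return calls;
}
```

A real interpreter would dispatch on the command and pop exactly the number of arguments that command expects, which is where a visitor over the variant types fits in.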

See it Live On Coliru

Full listing

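#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <algorithm> // std::for_each
#include <fstream>
#include <iostream>  // std::cout
#include <string>
#include <vector>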

namespace qi = boost::spirit::qi;

namespace Ast
{
    enum Command  { NO_CMD, TARGET, SENSOR, PAIR };
    enum Modifier { NO_MODIFIER, ON, OFF, ERASE };

    struct ModifiedCommand
    {
        Command cmd  = NO_CMD;
        Modifier mod = NO_MODIFIER;
    };

    struct OtherCommand
    {
        std::string token;
        OtherCommand(std::string token = "") : token(std::move(token))
        { }
    };

    typedef boost::variant<int, double> Operand;

    typedef boost::variant<Operand, ModifiedCommand, OtherCommand> RpnMachineInstruction;
    typedef std::vector<RpnMachineInstruction> RpnMachineProgram;

    // for printing, you can adapt this to execute the stack instead
    struct Print : boost::static_visitor<std::ostream&>
    {
        Print(std::ostream& os) : os(os) {}
        std::ostream& os;

        std::ostream& operator()(Ast::Command cmd) const {
            switch(cmd) {
                case TARGET: return os << "TARGET" << " ";
                case SENSOR: return os << "SENSOR" << " ";
                case PAIR:   return os << "PAIR"   << " ";
                case NO_CMD: return os << "NO_CMD" << " ";
                default:     return os << "#INVALID_COMMAND#" << " ";
            }
        }

        std::ostream& operator()(Ast::Modifier mod) const {
            switch(mod) {
                case ON:          return os << "[ON]"          << " ";
                case OFF:         return os << "[OFF]"         << " ";
                case ERASE:       return os << "[ERASE]"       << " ";
                case NO_MODIFIER: return os << "[NO_MODIFIER]" << " ";
                default:    return os << "#INVALID_MODIFIER#" << " ";
            }
        }

        std::ostream& operator()(double d) const { return os << "double:" << d << " "; }
        std::ostream& operator()(int    i) const { return os << "int:"    << i << " "; }

        std::ostream& operator()(Ast::OtherCommand const& cmd) const {
            return os << "'" << cmd.token << "'\n";
        }

        std::ostream& operator()(Ast::ModifiedCommand const& cmd) const {
            (*this)(cmd.cmd);
            (*this)(cmd.mod);
            return os << "\n"; 
        }

        template <typename... TVariant>
        std::ostream& operator()(boost::variant<TVariant...> const& v) const { 
            return boost::apply_visitor(*this, v); 
        }
    };

}

BOOST_FUSION_ADAPT_STRUCT(Ast::ModifiedCommand, (Ast::Command, cmd)(Ast::Modifier, mod))

template <typename It, typename Skipper = qi::space_type>
struct RpnGrammar : qi::grammar<It, Ast::RpnMachineProgram(), Skipper>
{
    RpnGrammar() : RpnGrammar::base_type(_start)
    {
        _command.add("TARGET", Ast::TARGET)("SENSOR", Ast::SENSOR)("PAIR", Ast::PAIR);
        _modifier.add("ON", Ast::ON)("OFF", Ast::OFF)("ERASE", Ast::ERASE);

        _start         = *_instruction;
        _instruction   = _operand | _mod_command | _other_command;

        _operand       = _strict_double | qi::int_;

        _mod_command   = _command >> _modifier;
        _other_command = qi::as_string [ +qi::char_("A-Z") ];
    }

  private:
    qi::rule<It, Ast::RpnMachineProgram(),     Skipper> _start;
    qi::rule<It, Ast::RpnMachineInstruction(), Skipper> _instruction;
    qi::rule<It, Ast::ModifiedCommand(),       Skipper> _mod_command;
    qi::rule<It, Ast::Operand(),               Skipper> _operand;

    // note: omitting the Skipper has the same effect as wrapping with `qi::lexeme`
    qi::rule<It, Ast::OtherCommand()> _other_command;

    qi::real_parser<double, boost::spirit::qi::strict_real_policies<double> > _strict_double;
    qi::symbols<char, Ast::Command>  _command;
    qi::symbols<char, Ast::Modifier> _modifier;
};

int main()
{
    std::ifstream ifs("input.txt");
    typedef boost::spirit::istream_iterator It;
    ifs.unsetf(std::ios::skipws);

    RpnGrammar<It> grammar;

    It f(ifs), l;
    Ast::RpnMachineProgram program;
    bool ok = qi::phrase_parse(f, l, grammar, qi::space, program);

    if (ok)
    {
        std::cout << "Parse succeeded, " << program.size() << " stack instructions\n";
        std::for_each(
                program.begin(),
                program.end(),
                Ast::Print(std::cout));
    }
    else
    {
        std::cout << "Parse failed\n";
    }

    if (f != l)
    {
        std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }
}