Spirit X3, is this error handling approach useful?

Time: 2019-07-15 22:44:02

Tags: c++ boost-spirit boost-spirit-x3

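After reading the Spirit X3 tutorial on error handling and doing some experimentation, I have come to a conclusion.

I believe there is some room for improvement in how X3 handles errors. From my perspective, an important goal is to provide a meaningful error message. First and foremost, adding a semantic action that sets _pass(ctx) to false won't do, because X3 will then try to match something else. Only throwing an x3::expectation_failure leaves the parse function prematurely, i.e. without trying to match anything else. So what remains is the parser directive expect[a], the parser operator>, and manually throwing x3::expectation_failure from a semantic action. I do believe the vocabulary for this kind of error handling is too limited. Please consider the following lines of X3 PEG grammar: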

const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 a |
 b |
 c );

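Now, for expression a I can't use expect[] or operator>, since the other alternatives might still be valid. I may be wrong, but I think X3 requires me to spell out alternative wrong expressions that can match, and if they match they can throw x3::expectation_failure, which is cumbersome.

The question is: is there a good way to check for error conditions in my PEG construct, with the ordered alternatives a, b and c, using the current X3 facilities?

If the answer is no, I would like to present my idea for a reasonable solution. I believe I would need a new parser directive. What should this directive do? It should call the attached semantic action when the parse fails. The attribute is obviously unused, but I would need the _where member to be set to the iterator position of the first parsing mismatch. So if a2 fails, _where should be set to one past the end of a1. Let's call the parser directive neg_sa, meaning "negate the semantic action".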

pseudocode

// semantic actions
auto a_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto b_sa = [&](auto& ctx)
{
  // add _where to vector v
};

auto c_sa = [&](auto& ctx)
{
  // add _where to vector v

  // now we know we have a *real* error.
  // find the peak iterator value in the vector v
  // the position tells whether it belongs to a, b or c.
  // now we can formulate an error message like: "cannot make sense of b up to this position."
  // lastly throw x3::expectation_failure
};

// PEG
const auto a = a1 >> a2 >> a3;
const auto b = b1 >> b2 >> b3;
const auto c = c1 >> c2 >> c3;

const auto main_rule__def =
(
 neg_sa[a][a_sa] |
 neg_sa[b][b_sa] |
 neg_sa[c][c_sa] );
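
The "peak iterator" idea in the pseudocode above can be modelled without X3 at all. The following is only a sketch under stated assumptions: `Alternative`, `Outcome`, `first_mismatch` and `try_alternatives` are hypothetical names, each alternative is reduced to a sequence of literal pieces, and none of this is part of Spirit. It shows the bookkeeping a `neg_sa`-style directive would need: record where each alternative first failed and blame the one that got furthest.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One alternative: a name plus the literal pieces it must match in sequence
// (a stand-in for a1 >> a2 >> a3).
struct Alternative {
    std::string name;
    std::vector<std::string> pieces;
};

struct Outcome {
    bool matched;       // true if some alternative matched completely
    std::string name;   // the matching alternative, or the one that got furthest
    std::size_t where;  // offset of the first mismatch (0 on success)
};

// Returns the offset of the first mismatch, or std::string::npos on success.
std::size_t first_mismatch(Alternative const& alt, std::string const& input) {
    std::size_t pos = 0;
    for (auto const& piece : alt.pieces) {
        if (input.compare(pos, piece.size(), piece) != 0)
            return pos;  // this piece failed: report where it started
        pos += piece.size();
    }
    return std::string::npos;
}

// Ordered choice: the first full match wins; otherwise remember the
// alternative whose first mismatch lies furthest into the input.
Outcome try_alternatives(std::vector<Alternative> const& alts,
                         std::string const& input) {
    Outcome best{false, "", 0};
    for (auto const& alt : alts) {
        std::size_t const where = first_mismatch(alt, input);
        if (where == std::string::npos)
            return {true, alt.name, 0};
        if (best.name.empty() || where > best.where)
            best = {false, alt.name, where};
    }
    return best;
}
```

With alternatives a = x 1 2, b = y 1 2 and c = y 3 4, the input "y1?" fails everywhere, and b is reported because it got as far as offset 2. Note that "furthest failure" is a heuristic: when two alternatives fail at the same offset, this sketch simply keeps the earlier one.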

I hope I have presented the idea clearly. Let me know in the comments if anything needs further explanation.

2 answers:

Answer 0: (score: 1)

Okay, at the risk of conflating too many things in one example, here goes:

namespace square::peg {
    using namespace x3;

    const auto quoted_string = lexeme['"' > *(print - '"') > '"'];
    const auto bare_string   = lexeme[alpha > *alnum] > ';';
    const auto two_ints      = int_ > int_;

    const auto main          = quoted_string | bare_string | two_ints;

    const auto entry_point   = skip(space)[ expect[main] > eoi ];
} // namespace square::peg

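That should do. The key is that the only things that should be expectation points are things that make the respective branch fail beyond the point where it was unambiguously the right branch. (Otherwise, there would literally not be a hard expectation.)

With two minor get_info specializations for prettier messages¹, this can lead to decent error messages, even when catching the exception manually: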

Live On Coliru

int main() {
    using It = std::string::const_iterator;

    for (std::string const input : {
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expectations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        try {
        if (parse(iter, end, square::peg::entry_point)) {
            std::cout << "Parsed successfully\n";
        } else {
            std::cout << "Parsing failed\n";
        }
        } catch (x3::expectation_failure<It> const& ef) {
            auto pos = std::distance(input.begin(), ef.where());
            std::cout << "Expect " << ef.which() << " at "
                << "\n\t" << input
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";
        }
    }
}

Prints

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expect quoted string, bare string or integer number pair at

    ^
====== "   -89 oops  "
Expect integral number at
       -89 oops 
    -------^
====== "   \"-89 0038  "
Expect '"' at
       "-89 0038 
    --------------^
====== "   bareword "
Expect ';' at
       bareword
    ------------^
====== "   -89 3.14  "
Expect eoi at
       -89 3.14 
    --------^
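
The marker lines in this output come from the `std::setw(pos) << std::setfill('-') << "" << "^"` idiom used in the handler. As a minimal standard-library sketch (the helper names `caret_line` and `annotate` are made up for illustration and do not appear in the answer), the same formatting can be isolated like this:

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Builds the marker line: `pos` dashes followed by a caret.
std::string caret_line(std::size_t pos) {
    std::ostringstream oss;
    // Printing an empty string with field width `pos` emits only fill
    // characters; the width resets afterwards, so "^" is printed as-is.
    oss << std::setfill('-') << std::setw(static_cast<int>(pos)) << "" << "^";
    return oss.str();
}

// Hypothetical helper: the input line with the marker line underneath it.
std::string annotate(std::string const& input, std::size_t pos) {
    return input + "\n" + caret_line(pos);
}
```

Here `pos` plays the role of `std::distance(input.begin(), ef.where())`. A position of 0 yields a bare caret, which matches the output for the empty input.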

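This is already beyond what most people expect from their parsers.

But: let's automate it, and make it more flexible

We might not be content with stopping at the first failed expectation and bailing out. Indeed, you can report the error and continue parsing as if there were just a regular mismatch: this is where on_error comes in.

Let's create a tag base: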

struct with_error_handling {
    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const&) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::cout << "Expecting " << ef.which() << " at "
                << "\n\t" << s
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^\n";

            return error_handler_result::fail;
        }
};

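Now all we have to do is derive our rule IDs from with_error_handling and BAM!, we don't have to write any exception handlers; rules will simply "fail" with the appropriate diagnostics. What's more, some inputs can lead to multiple (quite useful) diagnostics: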

auto const eh = [](auto p) {
    struct _ : with_error_handling {};
    return rule<_> {} = p;
};

const auto quoted_string = eh(lexeme['"' > *(print - '"') > '"']);
const auto bare_string   = eh(lexeme[alpha > *alnum] > ';');
const auto two_ints      = eh(int_ > int_);

const auto main          = quoted_string | bare_string | two_ints;
using main_type = std::remove_cv_t<decltype(main)>;

const auto entry_point   = skip(space)[ eh(expect[main] > eoi) ];

Now, main becomes just:

Live On Coliru

for (std::string const input : { 
        "   -89 0038  ",
        "   \"-89 0038\"  ",
        "   something123123      ;",
        // undecidable
        "",
        // violate expectations, no successful parse
        "   -89 oops  ",   // not an integer
        "   \"-89 0038  ", // missing "
        "   bareword ",    // missing ;
        // trailing debris, successful "main"
        "   -89 3.14  ",   // followed by .14
    })
{
    std::cout << "====== " << std::quoted(input) << "\n";

    It iter = input.begin(), end = input.end();
    if (parse(iter, end, square::peg::entry_point)) {
        std::cout << "Parsed successfully\n";
    } else {
        std::cout << "Parsing failed\n";
    }
}

The program prints:

====== "   -89 0038  "
Parsed successfully
====== "   \"-89 0038\"  "
Parsed successfully
====== "   something123123      ;"
Parsed successfully
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

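Attribute propagation, on_success

Parsers aren't very useful when they don't actually parse anything into values, so let's add some constructive value handling, and showcase on_success along the way.

Define some AST types to receive the attributes: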

struct quoted : std::string {};
struct bare   : std::string {};
using  two_i  = std::pair<int, int>;
using Value = boost::variant<quoted, bare, two_i>;

Make sure we can print Value:

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             } 
        void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            } 
        void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; } 
    } vis{os};

    boost::apply_visitor(vis, v);
    return os;
}

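Now, use the old as<> trick to coerce attribute types, this time with error handling (the as<> helper itself appears in the full listing below).

As icing on the cake, let's demonstrate on_success in with_error_handling: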

    template<typename It, typename Ctx>
        void on_success(It f, It l, two_i const& v, Ctx const&) const {
            std::cout << "Parsed " << std::quoted(std::string(f,l)) << " as integer pair " << v.first << ", " << v.second << "\n";
        }

Now, with a largely unmodified main program (it just prints the result value as well):

Live On Coliru

    It iter = input.begin(), end = input.end();
    Value v;
    if (parse(iter, end, square::peg::entry_point, v)) {
        std::cout << "Result value: " << v << "\n";
    } else {
        std::cout << "Parsing failed\n";
    }

Prints

====== "   -89 0038  "
Parsed "-89 0038" as integer pair -89, 38
Result value: two_i(-89, 38)
====== "   \"-89 0038\"  "
Result value: quoted("-89 0038")
====== "   something123123      ;"
Result value: bare(something123123)
====== ""
Expecting quoted string, bare string or integer number pair at 

    ^
Parsing failed
====== "   -89 oops  "
Expecting integral number at 
       -89 oops  
    -------^
Expecting quoted string, bare string or integer number pair at 
       -89 oops  
    ^
Parsing failed
====== "   \"-89 0038  "
Expecting '"' at 
       "-89 0038  
    --------------^
Expecting quoted string, bare string or integer number pair at 
       "-89 0038  
    ^
Parsing failed
====== "   bareword "
Expecting ';' at 
       bareword 
    ------------^
Expecting quoted string, bare string or integer number pair at 
       bareword 
    ^
Parsing failed
====== "   -89 3.14  "
Parsed "-89 3" as integer pair -89, 3
Expecting eoi at 
       -89 3.14  
    --------^
Parsing failed

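Really overdoing things

I don't know about you, but I hate side effects, let alone printing to the console from a parser. Let's use x3::with instead.

We want to append to the diagnostics via the Ctx& argument instead of writing to std::cout in the on_error handler: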

struct with_error_handling {
    struct diags;

    template<typename It, typename Ctx>
        x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
            std::string s(f,l);
            auto pos = std::distance(f, ef.where());

            std::ostringstream oss;
            oss << "Expecting " << ef.which() << " at "
                << "\n\t" << s
                << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

            x3::get<diags>(ctx).push_back(oss.str());

            return error_handler_result::fail;
        }
};

At the call site, we can pass the context:

std::vector<std::string> diags;

if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
    std::cout << "Result value: " << v;
} else {
    std::cout << "Parsing failed";
}

std::cout << " with " << diags.size() << " diagnostics messages: \n";

The full program also prints the diagnostics:

Live On Wandbox²

Full listing

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace x3 = boost::spirit::x3;

struct quoted : std::string {};
struct bare   : std::string {};
using  two_i  = std::pair<int, int>;
using Value = boost::variant<quoted, bare, two_i>;

static inline std::ostream& operator<<(std::ostream& os, Value const& v) {
    struct {
        std::ostream& _os;
        void operator()(quoted const& v) const { _os << "quoted(" << std::quoted(v) << ")";             } 
        void operator()(bare const& v) const   { _os << "bare(" << v << ")";                            } 
        void operator()(two_i const& v) const  { _os << "two_i(" << v.first << ", " << v.second << ")"; } 
    } vis{os};

    boost::apply_visitor(vis, v);
    return os;
}

namespace square::peg {
    using namespace x3;

    struct with_error_handling {
        struct diags;

        template<typename It, typename Ctx>
            x3::error_handler_result on_error(It f, It l, expectation_failure<It> const& ef, Ctx const& ctx) const {
                std::string s(f,l);
                auto pos = std::distance(f, ef.where());

                std::ostringstream oss;
                oss << "Expecting " << ef.which() << " at "
                    << "\n\t" << s
                    << "\n\t" << std::setw(pos) << std::setfill('-') << "" << "^";

                x3::get<diags>(ctx).push_back(oss.str());

                return error_handler_result::fail;
            }
    };

    template <typename T = x3::unused_type> auto const as = [](auto p) {
        struct _ : with_error_handling {};
        return rule<_, T> {} = p;
    };

    const auto quoted_string = as<quoted>(lexeme['"' > *(print - '"') > '"']);
    const auto bare_string   = as<bare>(lexeme[alpha > *alnum] > ';');
    const auto two_ints      = as<two_i>(int_ > int_);

    const auto main          = quoted_string | bare_string | two_ints;
    using main_type = std::remove_cv_t<decltype(main)>;

    const auto entry_point   = skip(space)[ as<Value>(expect[main] > eoi) ];
} // namespace square::peg

namespace boost::spirit::x3 {
    template <> struct get_info<int_type> {
        typedef std::string result_type;
        std::string operator()(int_type const&) const { return "integral number"; }
    };
    template <> struct get_info<square::peg::main_type> {
        typedef std::string result_type;
        std::string operator()(square::peg::main_type const&) const { return "quoted string, bare string or integer number pair"; }
    };
}

int main() {
    using It = std::string::const_iterator;
    using D = square::peg::with_error_handling::diags;

    for (std::string const input : { 
            "   -89 0038  ",
            "   \"-89 0038\"  ",
            "   something123123      ;",
            // undecidable
            "",
            // violate expectations, no successful parse
            "   -89 oops  ",   // not an integer
            "   \"-89 0038  ", // missing "
            "   bareword ",    // missing ;
            // trailing debris, successful "main"
            "   -89 3.14  ",   // followed by .14
        })
    {
        std::cout << "====== " << std::quoted(input) << "\n";

        It iter = input.begin(), end = input.end();
        Value v;
        std::vector<std::string> diags;

        if (parse(iter, end, x3::with<D>(diags) [square::peg::entry_point], v)) {
            std::cout << "Result value: " << v;
        } else {
            std::cout << "Parsing failed";
        }

        std::cout << " with " << diags.size() << " diagnostics messages: \n";

        for(auto& msg: diags) {
            std::cout << " - " << msg << "\n";
        }
    }
}

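¹ You could use named rules instead, which avoids this more complicated trick.

² In older versions of the library you might have to fight to get reference semantics on the with<> data: Live On Coliru

Answer 1: (score: 0)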

  

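> Now, for expression a I can't use expect[] or operator>, since the other alternatives might still be valid. I may be wrong, but I think X3 requires me to spell out alternative wrong expressions that can match, and if they match they can throw x3::expectation_failure, which is cumbersome.

That's very simple: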

const auto main_rule__def = x3::expect [
 a |
 b |
 c ];

Or, even:

const auto main_rule__def = x3::eps > (
 a |
 b |
 c );
  

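> If the answer is no, I would like to present my idea for a reasonable solution. I believe I would need a new parser directive. What should this directive do? It should call the attached semantic action when the parse fails.

The existing x3::on_error feature already knows how to do this. Mind you: it is a little intricate, but by the same token it is also pretty flexible.

Basically, you need to implement a static interface on the ID type (the ID in x3::rule<ID, Attr>, likely main_rule_class in your chosen convention). There are compiler examples in the repository that show how to use it.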

  
    

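Side note: both on_success and on_error use this paradigm.

The on_error member will be called on a default-constructed copy of the ID type, with the parameters ID().on_error(first, last, expectation_failure_object, context).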
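
As a rough mental model of that dispatch, the sketch below uses only the standard library. It is not X3's implementation: `handler_result`, `expectation_error`, `reporting_id` and `run_rule` are hypothetical stand-ins, the handler signature is simplified, and only the fail and rethrow outcomes are modelled.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for x3::error_handler_result (only two outcomes modelled).
enum class handler_result { fail, rethrow };

// Stand-in for x3::expectation_failure: what() names the expected thing.
struct expectation_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// An ID type that carries the error handler as a member function.
struct reporting_id {
    handler_result on_error(expectation_error const& e,
                            std::vector<std::string>& diags) const {
        diags.push_back(std::string("Expecting ") + e.what());
        return handler_result::fail;
    }
};

// Simplified model of a rule: run the body; if an expectation fails,
// default-construct the ID and let its on_error decide what happens.
template <typename ID, typename Body>
bool run_rule(Body body, std::vector<std::string>& diags) {
    try {
        return body();
    } catch (expectation_error const& e) {
        if (ID().on_error(e, diags) == handler_result::rethrow)
            throw;     // pass the failure on to an enclosing rule
        return false;  // fail: the rule simply does not match
    }
}
```

The `diags` reference here stands in for the context argument; because the ID is default-constructed on every call, any state the handler needs has to arrive that way rather than live in the ID object.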

const auto main_rule__def =
(
 neg_sa[a][a_sa] |
 neg_sa[b][b_sa] |
 neg_sa[c][c_sa] );

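To be honest, I think you are papering over your own confusion here. What good does it do you to have 3 separate error actions? How would you decide which error happened?

Really, there are only two possibilities:

  • You do know that a specific branch was required and it failed (that is an expectation failure, and you can by definition code it as an expectation point inside a, b or c).
  • Or you don't know which branch was implied (say, when branches can start with similar productions and the failure occurred inside those). In that case nobody can tell which error handler should be invoked, so having more than one is beside the point.

    Actually, the right thing to do there is to fail main_rule at the higher level, which means "none of the possible branches succeeded".

    This is what the expect[ a | b | c ] approach shown above does.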