Validating expressions with Shunting-Yard

Date: 2015-04-14 18:38:43

Tags: c# algorithm parsing expression shunting-yard

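We are using the Shunting-Yard algorithm to evaluate expressions. We can validate an expression simply by applying the algorithm: it fails on missing operands, mismatched parentheses, and similar problems. However, the Shunting-Yard algorithm accepts a larger syntax than human-readable infix. For example,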

1 + 2
+ 1 2
1 2 +

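are all acceptable ways to provide "1 + 2" as input to the Shunting-Yard algorithm. '+ 1 2' and '1 2 +' are not valid infix, but the standard Shunting-Yard algorithm can handle them. The algorithm does not really care about the order; it applies operators in order of precedence, grabbing the "nearest" operands.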
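To make the problem concrete, here is a minimal Python sketch of a textbook Shunting-Yard loop with no check on token order. The names `PRECEDENCE` and `naive_to_rpn` are illustrative, not from any library, and the sketch assumes space-separated tokens and left-associative binary operators only.

```python
# Hypothetical precedence table for the sketch (all operators left-associative).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def naive_to_rpn(tokens):
    """Convert tokens to postfix without checking that they form valid infix."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence before pushing.
            while stack and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            # Anything that is not an operator is treated as an operand.
            output.append(tok)
    while stack:
        output.append(stack.pop())
    return output

for text in ("1 + 2", "+ 1 2", "1 2 +"):
    print(text, "->", naive_to_rpn(text.split()))
```

All three inputs yield the same postfix sequence, which is why running the algorithm alone does not reject the prefix and postfix forms.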

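We would like to restrict our input to valid human-readable infix. I am looking for a way either to modify the Shunting-Yard algorithm so that it fails on invalid infix, or to validate the infix before running Shunting-Yard.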

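Is anyone aware of any published techniques for doing this? We must support basic operators, custom operators, parentheses, and functions (with multiple arguments). I have not found anything online that handles more than the basic operators.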

Thanks

2 answers:

Answer 0 (score: 4)

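The solution to my problem was to enhance the algorithm posted on Wikipedia with the state machine recommended by Rici. I am posting the pseudocode here because it may be useful to others.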

Support two states, ExpectOperand and ExpectOperator.

Set State to ExpectOperand
While there are tokens to read:
    If token is a constant (number)
        Error if state is not ExpectOperand.
        Push token to output queue.
        Set state to ExpectOperator.
    If token is a variable.
        Error if state is not ExpectOperand.
        Push token to output queue.
        Set state to ExpectOperator.
    If token is an argument separator (a comma).
        Error if state is not ExpectOperator.
        Until the top of the operator stack is a left parenthesis (don't pop the left parenthesis).
            Pop the top token of the operator stack and push it onto the output queue.
            If no left parenthesis is encountered then error.  Either the separator was misplaced or the parentheses were mismatched.
        Set state to ExpectOperand.
    If token is a unary operator.
        Error if the state is not ExpectOperand.
        Push the token to the operator stack.
        Set the state to ExpectOperand.
    If the token is a binary operator.
        Error if the state is not ExpectOperator.
        While there is an operator token at the top of the operator stack and either the current token is left-associative and of lower than or equal precedence to the operator on the stack, or the current token is right-associative and of lower precedence than the operator on the stack.
            Pop the operator from the operator stack and push it onto the output queue.
        Push the current operator onto the operator stack.
        Set the state to ExpectOperand. 
    If the token is a Function.
        Error if the state is not ExpectOperand.  
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is an open parenthesis.
        Error if the state is not ExpectOperand.
        Push the token onto the operator stack.
        Set the state to ExpectOperand.
    If the token is a close parenthesis.
        Error if the state is not ExpectOperator.
        Until the token at the top of the operator stack is a left parenthesis.
            Pop the token off of the operator stack and push it onto the output queue.
            If the stack runs out without finding a left parenthesis then error.  There are mismatched parentheses.
        Pop the left parenthesis off of the operator stack and discard.
        If the token at the top of the operator stack is a function then pop it and push it onto the output queue.
        Set the state to ExpectOperator.
At this point you have processed all the input tokens.
Error if the state is not ExpectOperator.  The expression is empty or ends with an operator.
While there are tokens on the operator stack.
    If the token at the top of the operator stack is a parenthesis then error.  There are mismatched parentheses.
    Pop the token from the operator stack and push it onto the output queue.
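As a rough illustration, the two-state approach might look like this in Python. This is a sketch under assumptions, not the code actually used: the tokenizer, the precedence table `PREC`, the `FUNCTIONS` set, and the `u-` marker for unary minus are all hypothetical choices made for the example, and it also rejects input that ends while an operand is still expected.

```python
import re

# Hypothetical operator table; "u-" is the marker used here for unary minus.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "u-": 3, "^": 4}
RIGHT_ASSOC = {"^"}
FUNCTIONS = {"max", "min", "sin"}  # assumed function names for the sketch
OPERAND, OPERATOR = "ExpectOperand", "ExpectOperator"
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^(),])")

def tokenize(text):
    """Split text into tokens, rejecting characters the sketch does not know."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognised input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def expect(state, wanted, tok):
    if state != wanted:
        raise ValueError(f"unexpected token {tok!r}")

def to_rpn(text):
    """Shunting-Yard that raises ValueError on input that is not valid infix."""
    output, stack, state = [], [], OPERAND
    for tok in tokenize(text):
        if tok in FUNCTIONS or tok == "(":
            expect(state, OPERAND, tok)
            stack.append(tok)              # state stays ExpectOperand
        elif tok in (",", ")"):
            expect(state, OPERATOR, tok)
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("misplaced separator or mismatched parentheses")
            if tok == ",":
                state = OPERAND
            else:
                stack.pop()                # discard the "("
                if stack and stack[-1] in FUNCTIONS:
                    output.append(stack.pop())
                state = OPERATOR
        elif tok == "-" and state == OPERAND:
            stack.append("u-")             # prefix minus; still expecting an operand
        elif tok in PREC:
            expect(state, OPERATOR, tok)
            while stack and stack[-1] in PREC and (
                PREC[tok] < PREC[stack[-1]]
                or (tok not in RIGHT_ASSOC and PREC[tok] == PREC[stack[-1]])
            ):
                output.append(stack.pop())
            stack.append(tok)
            state = OPERAND
        else:                              # constant or variable
            expect(state, OPERAND, tok)
            output.append(tok)
            state = OPERATOR
    if state != OPERATOR:
        raise ValueError("expression is empty or ends with an operator")
    while stack:
        if stack[-1] == "(":
            raise ValueError("mismatched parentheses")
        output.append(stack.pop())
    return output

print(to_rpn("max(1, 2 * 3) + -a"))
```

With this sketch, `to_rpn("+ 1 2")` and `to_rpn("1 2 +")` both raise `ValueError`, while ordinary infix such as `1 + 2` is converted as usual.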

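You can easily distinguish between unary and binary operators (I'm specifically talking about the negative prefix and subtraction operators) by looking at the previous token. If there is no previous token, the previous token is an open parenthesis, or the previous token is an operator, then you have encountered a unary prefix operator; otherwise you have encountered a binary operator.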
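The previous-token rule can be sketched in a few lines of Python. The names `OPERATORS`, `is_unary_position`, and `classify_minus` are made up for this example, and treating a comma like an open parenthesis is an extra assumption that the rule above does not state.

```python
OPERATORS = {"+", "-", "*", "/", "^"}  # hypothetical operator set

def is_unary_position(previous):
    """True when an operator following `previous` must be a prefix operator."""
    return (
        previous is None          # no previous token
        or previous == "("        # previous token is an open parenthesis
        or previous in OPERATORS  # previous token is an operator
        or previous == ","        # assumption: an argument separator behaves the same
    )

def classify_minus(tokens):
    """Relabel each unary '-' as 'u-' and leave binary '-' unchanged."""
    labelled, previous = [], None
    for tok in tokens:
        if tok == "-" and is_unary_position(previous):
            labelled.append("u-")
        else:
            labelled.append(tok)
        previous = tok
    return labelled

print(classify_minus(["-", "a", "-", "(", "-", "b", ")"]))
```

Running the classification before the main loop means the Shunting-Yard code only ever sees tokens that are already known to be unary or binary.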

Answer 1 (score: 2)

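A good discussion of the Shunting Yard algorithm is http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm. The algorithm presented there uses the key idea of the operator stack, but adds some grammar so that it knows what to expect next. It has two main functions: E(), which expects an expression, and P(), which expects either a prefix operator, a variable, a number, brackets, or a function. Prefix operators always bind tighter than binary operators, so you want to deal with them first.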

If we say P stands for some prefix sequence and B is a binary operator, then any expression will be of the form

P B P B P

i.e. at any point you are expecting either a prefix sequence or a binary operator. Formally, the grammar is

E -> P (B P)*

and P will be

P -> -P | variable | constant | etc.

This translates to pseudocode as

E() {
    P()
    while next token is a binary op:
        read next op
        push onto stack and do the shunting yard logic
        P()
    if any tokens remain report error
    pop remaining operators off the stack
}

P() {
    if next token is constant or variable:
        add to output
    else if next token is unary minus:
        push uminus onto operator stack
        P()
    else:
        report error
}
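The E() and P() pseudocode above could be sketched in Python roughly as follows. It covers only constants, variables, unary minus, and four left-associative binary operators; the names `parse`, `PREC`, `UMINUS`, and `OPERAND` are assumptions made for the example, and tokens are assumed to arrive already split.

```python
import re

PREC = {"+": 1, "-": 1, "*": 2, "/": 2}   # hypothetical binary operators
UMINUS = "u-"                             # binds tighter than any binary operator
OPERAND = re.compile(r"\d+(\.\d+)?|[A-Za-z_]\w*")

def parse(tokens):
    """Return postfix output, or raise ValueError if tokens are not valid infix."""
    output, stack = [], []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def P():
        nonlocal pos
        tok = peek()
        if tok is not None and OPERAND.fullmatch(tok):
            pos += 1
            output.append(tok)            # constant or variable
        elif tok == "-":
            pos += 1
            stack.append(UMINUS)          # prefix minus, then another P
            P()
        else:
            raise ValueError(f"expected an operand, found {tok!r}")

    def E():
        nonlocal pos
        P()
        while peek() in PREC:
            op = tokens[pos]
            pos += 1
            # Shunting-yard step: pop tighter or equal operators first.
            while stack and (stack[-1] == UMINUS or PREC[stack[-1]] >= PREC[op]):
                output.append(stack.pop())
            stack.append(op)
            P()
        if peek() is not None:
            raise ValueError(f"unexpected token {peek()!r}")
        while stack:
            output.append(stack.pop())

    E()
    return output

print(parse("- a * b + c".split()))
```

Because P() is only called where an operand is expected and the loop in E() only accepts a binary operator, inputs such as `+ 1 2` and `1 2 +` are rejected by construction.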

You can expand this to handle other unary operators, functions, brackets, and postfix operators.