Operator precedence in megaparsec

Time: 2018-08-31 01:57:03

Tags: haskell megaparsec

I'm having trouble using Megaparsec 6's makeExprParser helper. I can't figure out how to make binary ^ and unary - bind at the precedence levels I expect.

Using this makeExprParser expression parser:

expressionParser :: Parser Expression
expressionParser =
    makeExprParser termParser
      [
        [InfixR $ BinOp BinaryExp <$ symbol "^"],
        [
          Prefix $ MonOp MonoMinus <$ symbol "-",
          Prefix $ MonOp MonoPlus <$ symbol "+"
        ],
        [
          InfixL $ BinOp BinaryMult <$ symbol "*",
          InfixL $ BinOp BinaryDiv <$ symbol "/"
        ],
        [
          InfixL $ BinOp BinaryPlus <$ symbol "+",
          InfixL $ BinOp BinaryMinus <$ symbol "-"
        ]
      ]

I would expect these tests to pass:

testEqual expressionParser "1^2" "(1)^(2)"
testEqual expressionParser "-1^2" "-(1^2)"
testEqual expressionParser "1^-2" "1^(-2)"
testEqual expressionParser "-1^-2" "-(1^(-2))"

That is, -1^-2 should parse the same as -(1^(-2)). This is how Python, for example, parses it:

>>> 2**-2
0.25
>>> -2**-2
-0.25
>>> -2**2
-4

And Ruby:

irb(main):004:0> 2**-2
=> (1/4)
irb(main):005:0> -2**-2
=> (-1/4)
irb(main):006:0> -2**2
=> -4

But this Megaparsec parser fails to parse 1^-2 at all, and gives me this helpful error instead:

(TrivialError (SourcePos {sourceName = \"test.txt\", sourceLine = Pos 1, sourceColumn = Pos 3} :| []) (Just (Tokens ('-' :| \"\"))) (fromList [Tokens ('(' :| \"\"),Label ('i' :| \"nteger\")]))")

I read this as saying "I could have accepted any of these characters here, but that - has me flummoxed".

If I adjust the precedence order in the operator table like this (moving exponentiation after unary -):

expressionParser =
    makeExprParser termParser
      [
        [
          Prefix $ MonOp MonoMinus <$ symbol "-",
          Prefix $ MonOp MonoPlus <$ symbol "+"
        ],
        [InfixR $ BinOp BinaryExp <$ symbol "^"],
        [
          InfixL $ BinOp BinaryMult <$ symbol "*",
          InfixL $ BinOp BinaryDiv <$ symbol "/"
        ],
        [
          InfixL $ BinOp BinaryPlus <$ symbol "+",
          InfixL $ BinOp BinaryMinus <$ symbol "-"
        ]
      ]

then the parse no longer fails, but -1^2 incorrectly parses as (-1)^2 instead of the correct -(1^2).

Here is a complete, self-contained parser that shows the problem (it requires HUnit and, of course, megaparsec):

module Hascas.Minimal where

import Data.Void (Void)
import Test.HUnit hiding (test)
import Text.Megaparsec hiding (ParseError)
import Text.Megaparsec.Char
import Text.Megaparsec.Expr
import qualified Text.Megaparsec as MP
import qualified Text.Megaparsec.Char.Lexer as L

data Expression
    = Literal Integer
    | MonOp MonoOperator Expression
    | BinOp BinaryOperator Expression Expression
  deriving (Read, Show, Eq, Ord)

data BinaryOperator
    = BinaryPlus
    | BinaryMinus
    | BinaryDiv
    | BinaryMult
    | BinaryExp
  deriving (Read, Show, Eq, Ord)

data MonoOperator
    = MonoPlus
    | MonoMinus
  deriving (Read, Show, Eq, Ord)

type Parser a = Parsec Void String a
type ParseError = MP.ParseError (Token String) Void

spaceConsumer :: Parser ()
spaceConsumer = L.space space1 lineComment blockComment
  where
    lineComment  = L.skipLineComment "//"
    blockComment = L.skipBlockComment "/*" "*/"

lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer

symbol :: String -> Parser String
symbol = L.symbol spaceConsumer

expressionParser :: Parser Expression
expressionParser =
    makeExprParser termParser
      [
        [InfixR $ BinOp BinaryExp <$ symbol "^"],
        [
          Prefix $ MonOp MonoMinus <$ symbol "-",
          Prefix $ MonOp MonoPlus <$ symbol "+"
        ],
        [
          InfixL $ BinOp BinaryMult <$ symbol "*",
          InfixL $ BinOp BinaryDiv <$ symbol "/"
        ],
        [
          InfixL $ BinOp BinaryPlus <$ symbol "+",
          InfixL $ BinOp BinaryMinus <$ symbol "-"
        ]
      ]

termParser :: Parser Expression
termParser = (
        (try $ Literal <$> L.decimal)
    <|> (try $ parens expressionParser))

parens :: Parser a -> Parser a
parens x = between (symbol "(") (symbol ")") x

main :: IO ()
main = do
    -- just to show that it does work in the + case:
    test expressionParser "1+(-2)" $
      BinOp BinaryPlus (Literal 1) (MonOp MonoMinus $ Literal 2)
    test expressionParser "1+-2" $
      BinOp BinaryPlus (Literal 1 ) (MonOp MonoMinus $ Literal 2)

    -- but not in the ^ case
    test expressionParser "1^-2" $
      BinOp BinaryExp (Literal 1) (MonOp MonoMinus $ Literal 2)
    test expressionParser "-1^2" $
      MonOp MonoMinus $ BinOp BinaryExp (Literal 1) (Literal 2)
    test expressionParser "-1^-2" $
      MonOp MonoMinus $ BinOp BinaryExp (Literal 1) (MonOp MonoMinus $ Literal 2)

    -- exponent precedence is weird
    testEqual expressionParser "1^2" "(1)^(2)"
    testEqual expressionParser "-1^2" "-(1^2)"
    testEqual expressionParser "1^-2" "1^(-2)"
    testEqual expressionParser "-1^-2" "-(1^(-2))"
    testEqual expressionParser "1^2^3^4" "1^(2^(3^(4)))"
  where
    test :: (Eq a, Show a) => Parser a -> String -> a -> IO ()
    test parser input expected = do
      assertEqual input (Right expected) $ parse (spaceConsumer >> parser <* eof) "test.txt" input

    testEqual :: (Eq a, Show a) => Parser a -> String -> String -> IO ()
    testEqual parser input expected = do
        assertEqual input (p expected) (p input)
      where
        p i = parse (spaceConsumer >> parser <* eof) "test.txt" i

Is it possible to get Megaparsec to parse these operators with the precedence that other languages use?

1 answer:

Answer 0: (score: 3)

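makeExprParser termParser [precN, ..., prec1] produces a precedence-climbing parser in which each precedence level calls the next-higher level for its operands. So if you defined it by hand, you would have a rule for infix + and -, which uses the mult-and-div rule for its operands. That rule in turn uses the prefix rule for its operands, and the prefix rule uses the ^ rule for its operand. Finally, the ^ rule uses termParser for its operands.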

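The important thing to note here is that the ^ rule (or, more generally, any rule with a higher precedence than the prefix operators) calls a parser that does not accept a prefix operator at its start. So prefix operators cannot appear to the right of such operators, except inside parentheses.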
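The layering described above can be written out by hand. The following is a minimal sketch, not Megaparsec code: it uses ReadP from base so that it is self-contained, and the names Expr, tok, term, powLevel, prefixLevel, mulLevel, addLevel and parseAll are all made up for illustration. It only mirrors the shape of the question's operator table (with unary + left out), under the assumption that each level takes its operands from the next tighter level.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Simplified AST for illustration (not the question's Expression type).
data Expr = Lit Integer | Neg Expr | Bin Char Expr Expr
  deriving (Show, Eq)

-- A single-character token followed by optional whitespace.
tok :: Char -> ReadP Char
tok c = char c <* skipSpaces

-- Tightest level: integer literals and parenthesised expressions.
term :: ReadP Expr
term = (Lit . read <$> munch1 isDigit <* skipSpaces)
   +++ between (tok '(') (tok ')') addLevel

-- Right-associative ^; its operands come from `term` only,
-- so a leading '-' is never accepted on either side.
powLevel :: ReadP Expr
powLevel = chainr1 term (Bin '^' <$ tok '^')

-- At most one prefix minus; its operand comes from `powLevel`.
prefixLevel :: ReadP Expr
prefixLevel = (Neg <$> (tok '-' *> powLevel)) +++ powLevel

-- Left-associative * and /; operands come from `prefixLevel`.
mulLevel :: ReadP Expr
mulLevel = chainl1 prefixLevel ((Bin '*' <$ tok '*') +++ (Bin '/' <$ tok '/'))

-- Loosest level: left-associative + and -; operands come from `mulLevel`.
addLevel :: ReadP Expr
addLevel = chainl1 mulLevel ((Bin '+' <$ tok '+') +++ (Bin '-' <$ tok '-'))

-- Succeeds only when the whole input is consumed by exactly one parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (skipSpaces *> p <* eof) s of
  [(x, "")] -> Just x
  _         -> Nothing
```

In this sketch, parseAll addLevel "1+-2" succeeds because the right operand of + is parsed by mulLevel, which reaches prefixLevel. parseAll addLevel "1^-2" returns Nothing because the right operand of ^ is parsed by term, which has no way to get back up to prefixLevel. Writing "1^(-2)" works, since the parentheses restart at addLevel.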

This basically means that your use case is not supported by makeExprParser.

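To work around this, you can use makeExprParser only for the infix operators with lower precedence than the prefix operators, and then handle the prefix operators and ^ manually, so that the right operand of ^ "loops back" to the prefix operators. Like this: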

expressionParser =
    makeExprParser prefixParser
      [
        [
          InfixL $ BinOp BinaryMult <$ symbol "*",
          InfixL $ BinOp BinaryDiv <$ symbol "/"
        ],
        [
          InfixL $ BinOp BinaryPlus <$ symbol "+",
          InfixL $ BinOp BinaryMinus <$ symbol "-"
        ]
      ]

prefixParser :: Parser Expression
prefixParser =
  do
    prefixOps <- many prefixOp
    expr <- exponentiationParser
    -- Apply the collected prefix operators, rightmost (innermost) first
    return $ foldr ($) expr prefixOps
  where
    prefixOp = MonOp MonoMinus <$ symbol "-" <|> MonOp MonoPlus <$ symbol "+"

exponentiationParser :: Parser Expression
exponentiationParser =
  do
    lhs <- termParser
    -- Loop back up to prefix instead of going down to term
    rhs <- optional (symbol "^" >> prefixParser)
    return $ maybe lhs (BinOp BinaryExp lhs) rhs

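Note that, unlike makeExprParser, this also allows multiple consecutive prefix operators (such as --x for double negation). If you don't want that, replace many with optional in the definition of prefixParser.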
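To check the loop-back idea in isolation, here is a minimal standalone sketch of the same structure. It is written against ReadP from base rather than Megaparsec so that it has no dependencies; the names Expr, tok, term, prefixed, power, mulLevel, expr and parseAll are made up for illustration, and unary + is left out for brevity. It is meant to show the shape of the technique, not to replace the Megaparsec code above.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP

-- Simplified AST for illustration (not the question's Expression type).
data Expr = Lit Integer | Neg Expr | Bin Char Expr Expr
  deriving (Show, Eq)

-- A single-character token followed by optional whitespace.
tok :: Char -> ReadP Char
tok c = char c <* skipSpaces

-- Integer literals and parenthesised expressions.
term :: ReadP Expr
term = (Lit . read <$> munch1 isDigit <* skipSpaces)
   +++ between (tok '(') (tok ')') expr

-- Zero or more prefix minuses, then an exponentiation.
prefixed :: ReadP Expr
prefixed = do
  negs <- many (Neg <$ tok '-')
  e    <- power
  return (foldr ($) e negs)

-- The left operand is a plain term; the right operand of ^
-- loops back up to `prefixed`, so it may start with '-'.
power :: ReadP Expr
power = do
  lhs <- term
  rhs <- option Nothing (Just <$> (tok '^' *> prefixed))
  return (maybe lhs (Bin '^' lhs) rhs)

-- The lower-precedence infix levels are layered on top of `prefixed`.
mulLevel :: ReadP Expr
mulLevel = chainl1 prefixed ((Bin '*' <$ tok '*') +++ (Bin '/' <$ tok '/'))

expr :: ReadP Expr
expr = chainl1 mulLevel ((Bin '+' <$ tok '+') +++ (Bin '-' <$ tok '-'))

-- Succeeds only when the whole input is consumed by exactly one parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (skipSpaces *> p <* eof) s of
  [(x, "")] -> Just x
  _         -> Nothing
```

With these definitions, parseAll expr "-1^-2" yields Just (Neg (Bin '^' (Lit 1) (Neg (Lit 2)))), which corresponds to -(1^(-2)), and "1^2^3" nests to the right because each right operand re-enters prefixed and from there power.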