Parsing non-binary operators with Parsec

Date: 2015-11-29 19:33:28

Tags: parsing haskell parsec

Traditionally, arithmetic operators are treated as binary (left- or right-associative), so most tools handle only binary operators.

Is there an easy way to use Parsec to parse arithmetic operators that can take an arbitrary number of arguments?

For example, the following expression should be parsed into the tree shown below:

(a + b) + c + d * e + f

[Image: parsed expression]

1 answer:

Answer 0 (score: 1)

Yes! The key is to first solve a simpler problem: model + and * as tree nodes with exactly two children. To add four things, we just use + three times.

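This is a convenient problem to solve, because the Text.Parsec.Expr module exists for exactly this purpose. Your example can in fact be parsed by the example code in the documentation. I have simplified it slightly here: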

module Lib where

import Text.Parsec
import Text.Parsec.Language
import qualified Text.Parsec.Expr as Expr
import qualified Text.Parsec.Token as Tokens

data Expr =
    Identifier String
  | Multiply Expr Expr
  | Add Expr Expr

instance Show Expr where
  show (Identifier s) = s
  show (Multiply l r) = "(* " ++ (show l) ++ " " ++ (show r) ++ ")"
  show (Add l r) = "(+ " ++ (show l) ++ " " ++ (show r) ++ ")"

-- Some sane parser combinators that we can plagiarize from the Haskell parser.
parens = Tokens.parens haskell
identifier = Tokens.identifier haskell
reserved = Tokens.reservedOp haskell

-- Infix parser.
infix_ operator func =
  Expr.Infix (reserved operator >> return func) Expr.AssocLeft

parser =
  Expr.buildExpressionParser table term <?> "expression"
  where
    table = [[infix_ "*" Multiply], [infix_ "+" Add]]

term =
  parens parser
  <|> (Identifier <$> identifier)
  <?> "term"

Running it in GHCi:

λ> runParser parser () "" "(a + b) + c + d * e + f"
Right (+ (+ (+ (+ a b) c) (* d e)) f)

There are many ways to convert this tree into the desired form. Here is a simple but rather slow one:

data Expr' =
    Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show)

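-- Flatten a chain of nested nodes that satisfy the predicate into a list of operands.
collect :: Expr -> (Expr -> Bool) -> [Expr]
collect e f | not (f e) = [e]
collect (Add l r) f = collect l f ++ collect r f
collect (Multiply l r) f = collect l f ++ collect r f
collect e _ = [e]  -- Identifier: nothing to flatten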

isAdd :: Expr -> Bool
isAdd (Add _ _) = True
isAdd _ = False

isMultiply :: Expr -> Bool
isMultiply (Multiply _ _) = True
isMultiply _ = False

optimize :: Expr -> Expr'
optimize (Identifier s) = Identifier' s
optimize e@(Add _ _) = Add' (map optimize (collect e isAdd))
optimize e@(Multiply _ _) = Multiply' (map optimize (collect e isMultiply))
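
The version above is slow mainly because of the repeated ++ on left-nested trees. As a rough sketch of one possible alternative (not part of the original answer), the same flattening can be done in a single pass with an accumulator. The names flatten, addOperands, mulOperands and example are hypothetical, and the types are repeated only so that the block compiles on its own:

```haskell
module Main where

data Expr
  = Identifier String
  | Multiply Expr Expr
  | Add Expr Expr

data Expr'
  = Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show, Eq)

-- Hypothetical single-pass counterpart of optimize.
flatten :: Expr -> Expr'
flatten (Identifier s) = Identifier' s
flatten e@(Add _ _) = Add' (addOperands e [])
flatten e@(Multiply _ _) = Multiply' (mulOperands e [])

-- Prepend the flattened operands of a chain of Add nodes to acc.
addOperands :: Expr -> [Expr'] -> [Expr']
addOperands (Add l r) acc = addOperands l (addOperands r acc)
addOperands e acc = flatten e : acc

-- Prepend the flattened operands of a chain of Multiply nodes to acc.
mulOperands :: Expr -> [Expr'] -> [Expr']
mulOperands (Multiply l r) acc = mulOperands l (mulOperands r acc)
mulOperands e acc = flatten e : acc

-- The binary tree the parser produced for "(a + b) + c + d * e + f".
example :: Expr
example =
  Add
    (Add
       (Add (Add (Identifier "a") (Identifier "b")) (Identifier "c"))
       (Multiply (Identifier "d") (Identifier "e")))
    (Identifier "f")

main :: IO ()
main = print (flatten example)
```

Like optimize, this merges the parenthesized (a + b) into the surrounding sum, because the binary tree no longer records where the parentheses were.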

I will note, though, that for the purposes of a parser or compiler, Expr is almost always Good Enough™.
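
If the n-ary tree really is needed straight from the parser, one possible approach, offered here only as a sketch and not as part of the answer above, is to skip Text.Parsec.Expr and parse each precedence level with sepBy1. It assumes every operator is associative, so grouping within one level carries no information. The names nary, sumExpr and productExpr are hypothetical:

```haskell
module Main where

import Text.Parsec
import Text.Parsec.Language (haskell)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tokens

data Expr'
  = Identifier' String
  | Add' [Expr']
  | Multiply' [Expr']
  deriving (Show, Eq)

-- Wrap the operands in a node only when there is more than one of them.
nary :: ([Expr'] -> Expr') -> [Expr'] -> Expr'
nary _ [x] = x
nary node xs = node xs

-- sum ::= product ('+' product)*
sumExpr :: Parser Expr'
sumExpr = nary Add' <$> (productExpr `sepBy1` Tokens.reservedOp haskell "+")

-- product ::= term ('*' term)*
productExpr :: Parser Expr'
productExpr = nary Multiply' <$> (term `sepBy1` Tokens.reservedOp haskell "*")

-- term ::= '(' sum ')' | identifier
term :: Parser Expr'
term =
  Tokens.parens haskell sumExpr
    <|> (Identifier' <$> Tokens.identifier haskell)

main :: IO ()
main = print (parse (sumExpr <* eof) "" "(a + b) + c + d * e + f")
```

Unlike flattening a binary tree after the fact, this keeps the parenthesized (a + b) as its own Add' node, so the result is a four-operand sum whose first operand is itself a two-operand sum.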