Trouble parsing letters adjacent to operators with Parsec

Date: 2018-02-26 17:45:54

Tags: parsing haskell parsec

I'm trying to parse a simplified expression language with Parsec in Haskell to solve the Tiny Three-Pass Compiler kata on Codewars. My parser misbehaves when there is no whitespace between an identifier and an operator: "a * a" parses to the complete expression, but "a*a" yields only the first "a".

Here is a Stack script that demonstrates the problem:

#!/usr/bin/env stack
-- stack --resolver lts-10.7 script

import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

langDef :: Tok.LanguageDef ()
langDef = Tok.LanguageDef
  { Tok.commentStart    = ""
  , Tok.commentEnd      = ""
  , Tok.commentLine     = ""
  , Tok.nestedComments  = False
  , Tok.identStart      = letter
  , Tok.identLetter     = letter
  , Tok.opStart         = oneOf "+-*/"
  , Tok.opLetter        = oneOf "+-*/"
  , Tok.reservedNames   = []
  , Tok.reservedOpNames = []
  , Tok.caseSensitive   = True
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef

identifier :: Parser String
identifier = Tok.identifier lexer

reserved :: String -> Parser ()
reserved = Tok.reserved lexer

data AST = Var String
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Eq, Show)

expression :: Parser AST
expression = term `chainl1` addSubOp

addSubOp :: Parser (AST -> AST -> AST)
addSubOp =  (reserved "+" >> return Add)
        <|> (reserved "-" >> return Sub)

term :: Parser AST
term = factor `chainl1` multDivOp

multDivOp :: Parser (AST -> AST -> AST)
multDivOp =  (reserved "*" >> return Mul)
         <|> (reserved "/" >> return Div)

factor :: Parser AST
factor = variable

variable :: Parser AST
variable = do
  varName <- identifier
  return $ Var varName

main :: IO ()
main = do
  putStrLn $ show $ parse expression "" "a + a"
  putStrLn $ show $ parse expression "" "a+a"
  putStrLn $ show $ parse expression "" "a - a"
  putStrLn $ show $ parse expression "" "a-a"
  putStrLn $ show $ parse expression "" "a * a"
  putStrLn $ show $ parse expression "" "a*a"
  putStrLn $ show $ parse expression "" "a / a"
  putStrLn $ show $ parse expression "" "a/a"
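To isolate the operator parsers from the rest of the grammar, the sketch below runs Tok.reserved and Tok.reservedOp directly on a few inputs. It assumes parsec 3 (the version in the resolver above); emptyDef from Text.Parsec.Language, updated with the same identifier and operator fields as langDef, stands in for the full definition, and accepts is a hypothetical helper. In Parsec's Text.Parsec.Token, reserved is meant for keywords: after matching the name it checks that the next character is not an identLetter, whereas reservedOp checks that the next character is not an opLetter.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

-- Same identifier/operator settings as the question's langDef
-- (comment settings are empty in both, so they do not matter here).
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

-- Hypothetical helper: True if the parser succeeds on a prefix of the
-- input (any unconsumed leftovers are ignored).
accepts :: Parser () -> String -> Bool
accepts p input = either (const False) (const True) (parse p "" input)

main :: IO ()
main = do
  -- reserved "*" requires that "*" is NOT followed by an identLetter (a letter here).
  print (accepts (Tok.reserved lexer "*") "* a")   -- True
  print (accepts (Tok.reserved lexer "*") "*a")    -- False: 'a' is an identLetter
  -- reservedOp "*" instead requires that "*" is NOT followed by an opLetter.
  print (accepts (Tok.reservedOp lexer "*") "*a")  -- True
  print (accepts (Tok.reservedOp lexer "*") "**")  -- False: '*' is an opLetter
```

With identLetter = letter, the "*" in "a*a" is rejected by reserved because an "a" follows it. Since reserved is wrapped in try, chainl1 sees an operator failure that consumed no input and just returns the Var "a" it already parsed.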

Running this outputs:

$ ./AdjacentParseIssue.hs 
Right (Add (Var "a") (Var "a"))
Right (Var "a")
Right (Sub (Var "a") (Var "a"))
Right (Var "a")
Right (Mul (Var "a") (Var "a"))
Right (Var "a")
Right (Div (Var "a") (Var "a"))
Right (Var "a")
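The "a*a" lines come out as Right (Var "a") rather than as errors because parse only requires expression to succeed, not to consume the whole input, so the unparsed "*a" is silently dropped. A minimal sketch, reduced to multiplication only and assuming emptyDef with the question's identifier and operator settings, shows how requiring eof turns that silent truncation into a reported error. parseAll is a hypothetical helper name.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

data AST = Var String | Mul AST AST
  deriving (Eq, Show)

-- Same identifier/operator settings as the question's langDef.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

-- Unchanged in spirit from the question: the operator still uses reserved.
term :: Parser AST
term = variable `chainl1` mulOp
  where
    variable = Var <$> Tok.identifier lexer
    mulOp    = Tok.reserved lexer "*" >> return Mul

-- Hypothetical helper: skip leading whitespace and require end of input,
-- so leftover text becomes a parse error instead of being ignored.
parseAll :: String -> Either ParseError AST
parseAll = parse (Tok.whiteSpace lexer *> term <* eof) ""

main :: IO ()
main = do
  print (parse term "" "a*a")  -- Right (Var "a"): "*a" is left unconsumed
  print (parseAll "a * a")     -- Right (Mul (Var "a") (Var "a"))
  print (parseAll "a*a")       -- Left ...: the leftover "*a" is now an error
```

This does not make "a*a" parse on its own, but it surfaces the failure instead of returning a partial AST.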

How can I write my parser so that "a * a" and "a*a" both parse to the same result?
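One possible fix, sketched under the assumption that the grammar otherwise stays as in the question: parse the operators with Tok.reservedOp instead of Tok.reserved. reservedOp only rejects an operator that is directly followed by another opLetter, so a "*" followed by "a" is accepted. The sketch also skips leading whitespace and requires eof, so any leftover input is reported. emptyDef stands in for langDef with the same identifier and operator fields, and op and parseExpr are hypothetical helper names.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok

data AST = Var String
         | Add AST AST
         | Sub AST AST
         | Mul AST AST
         | Div AST AST
         deriving (Eq, Show)

-- Same identifier/operator settings as the question's langDef.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef
  { Tok.identStart  = letter
  , Tok.identLetter = letter
  , Tok.opStart     = oneOf "+-*/"
  , Tok.opLetter    = oneOf "+-*/"
  }

-- reservedOp only rejects an operator followed by another opLetter (e.g. "**"),
-- so "*a" is accepted; reserved rejected it because 'a' is an identLetter.
op :: String -> (AST -> AST -> AST) -> Parser (AST -> AST -> AST)
op name f = Tok.reservedOp lexer name >> return f

expression, term, factor :: Parser AST
expression = term `chainl1` (op "+" Add <|> op "-" Sub)
term       = factor `chainl1` (op "*" Mul <|> op "/" Div)
factor     = Var <$> Tok.identifier lexer

-- Skip leading whitespace and demand end of input so leftovers are errors.
parseExpr :: String -> Either ParseError AST
parseExpr = parse (Tok.whiteSpace lexer *> expression <* eof) ""

main :: IO ()
main = mapM_ (print . parseExpr)
  ["a + a", "a+a", "a - a", "a-a", "a * a", "a*a", "a / a", "a/a"]
```

Tok.symbol lexer "*" would also accept "*a", since symbol just matches the string and skips trailing whitespace; reservedOp additionally refuses to split a longer operator such as "**".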
