I'm trying to parse a simplified expression language with Parsec in Haskell to solve the Tiny Three-Pass Compiler kata on CodeWars. I'm running into issues where my parser won't parse correctly if there's no whitespace between an identifier and an operator; a * a
parses to the complete expression, but a*a
only yields the first a
.
Stack script to demonstrate the problem:
#!/usr/bin/env stack
-- stack --resolver lts-10.7 script
import Text.Parsec
import Text.Parsec.String (Parser)
import qualified Text.Parsec.Token as Tok
langDef :: Tok.LanguageDef ()
langDef = Tok.LanguageDef
{ Tok.commentStart = ""
, Tok.commentEnd = ""
, Tok.commentLine = ""
, Tok.nestedComments = False
, Tok.identStart = letter
, Tok.identLetter = letter
, Tok.opStart = oneOf "+-*/"
, Tok.opLetter = oneOf "+-*/"
, Tok.reservedNames = []
, Tok.reservedOpNames = []
, Tok.caseSensitive = True
}
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser langDef
identifier :: Parser String
identifier = Tok.identifier lexer
reserved :: String -> Parser ()
reserved = Tok.reserved lexer
data AST = Var String
| Add AST AST
| Sub AST AST
| Mul AST AST
| Div AST AST
deriving (Eq, Show)
expression :: Parser AST
expression = term `chainl1` addSubOp
addSubOp :: Parser (AST -> AST -> AST)
addSubOp = (reserved "+" >> return Add)
<|> (reserved "-" >> return Sub)
term :: Parser AST
term = factor `chainl1` multDivOp
multDivOp :: Parser (AST -> AST -> AST)
multDivOp = (reserved "*" >> return Mul)
<|> (reserved "/" >> return Div)
factor :: Parser AST
factor = variable
variable :: Parser AST
variable = do
varName <- identifier
return $ Var varName
main = do
putStrLn $ show $ parse expression "" "a + a"
putStrLn $ show $ parse expression "" "a+a"
putStrLn $ show $ parse expression "" "a - a"
putStrLn $ show $ parse expression "" "a-a"
putStrLn $ show $ parse expression "" "a * a"
putStrLn $ show $ parse expression "" "a*a"
putStrLn $ show $ parse expression "" "a / a"
putStrLn $ show $ parse expression "" "a/a"
Running this outputs:
$ ./AdjacentParseIssue.hs
Right (Add (Var "a") (Var "a"))
Right (Var "a")
Right (Sub (Var "a") (Var "a"))
Right (Var "a")
Right (Mul (Var "a") (Var "a"))
Right (Var "a")
Right (Div (Var "a") (Var "a"))
Right (Var "a")
How can I write my parser so that both a * a
and a*a
parse to the same result?