Using a custom data type as tokens in a parser

Date: 2019-11-27 16:40:26

Tags: haskell parser-combinators

I am trying to write a parser with parser combinators that uses my own tokens, defined by a custom data type, instead of a languageDef as in https://wiki.haskell.org/Parsing_a_simple_imperative_language

So far I have a basic setup for testing parsers, taken from https://jakewheat.github.io/intro_to_parsing/#very-simple-expression-parsing:

import Data.Char (isLetter, isDigit)
import Control.Monad
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Char (digit)

regularParse :: Parser a -> String -> Either ParseError a
regularParse p = parse p ""

-- regularParse num "143"
num :: Parser Integer
num = do
    n <- many1 digit
    return (read n)

-- regularParse var "hello, world!"
var :: Parser String
var = do
    fc <- firstChar
    rest <- many nonFirstChar
    return (fc:rest)
  where
    firstChar = satisfy (\a -> isLetter a || a == '_')
    nonFirstChar = satisfy (\a -> isDigit a || isLetter a || a == '_')

In the first example on wiki.haskell.org, the language and its tokens are defined using emptyDef from Text.ParserCombinators.Parsec.Language. That lets them write the language like this:

languageDef =
    emptyDef { Token.commentStart    = "/*"
             , ...
             , Token.reservedNames   = [ "if"
                                       , "while"
                                       , ...

lexer = Token.makeTokenParser languageDef
reserved = Token.reserved lexer

whileStmt :: Parser Stmt
whileStmt =
  do reserved "while"
     cond <- bExpression
     reserved "do"
     stmt <- statement
     return $ While cond stmt
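
For reference, here is a minimal runnable sketch of how little of the language definition that style actually needs. emptyDef supplies a default for every field, so only the reserved words are overridden here. The `whileNames` parser and its toy grammar ("while <name> do <name>") are hypothetical and exist only to make the snippet self-contained; they are not from the wiki.

```haskell
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Language (LanguageDef, emptyDef)
import qualified Text.ParserCombinators.Parsec.Token as Token

-- Only reservedNames is overridden; all other fields keep the emptyDef defaults.
languageDef :: LanguageDef ()
languageDef = emptyDef { Token.reservedNames = ["while", "do"] }

lexer :: Token.TokenParser ()
lexer = Token.makeTokenParser languageDef

-- Matches a reserved word and skips trailing whitespace.
reserved :: String -> Parser ()
reserved = Token.reserved lexer

-- Matches an identifier that is not one of the reserved words.
identifier :: Parser String
identifier = Token.identifier lexer

-- Hypothetical toy grammar: "while <name> do <name>", returning both names.
whileNames :: Parser (String, String)
whileNames = do
  reserved "while"
  cond <- identifier
  reserved "do"
  body <- identifier
  return (cond, body)
```

With this, `parse whileNames "" "while x do y"` should succeed with both names, while `"while do do y"` should be rejected because `identifier` refuses reserved words.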

What should I do if I want to write my own custom data type and use my own tokens, like the following?

data Token = Zero
           | One [Char]
           | Two [Char] [Char]

token :: Parser Token
token = twoParser
     <|> oneParser
     <|> zeroParser

twoParser :: Parser Token
twoParser = do
  two <- tExpression
  a <- var
  b <- var
  return $ two a b

tExpression :: Parser Token
tExpression = do
  ...

var :: Parser String
var = do
    fc <- firstChar
    rest <- many nonFirstChar
    return (fc:rest)
  where
    firstChar = satisfy (\a -> isLetter a || a == '_')
    nonFirstChar = satisfy (\a -> isDigit a || isLetter a || a == '_')

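I am not sure whether this is the right way to approach the problem, because the tExpression function depends on many things from the language definition. Thanks.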
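
One possible shape for this, as a rough sketch rather than a definitive answer: the token parsers can build `Token` values directly with plain Parsec combinators, with no languageDef involved. The concrete syntax below is an assumption, since the question does not say what text maps to each constructor: the hypothetical keywords `two`, `one` and `zero` are followed by two, one and no identifiers respectively. The helpers `lexeme`, `keyword`, `identChar`, `tokenP` and `tokensP` are names invented for this illustration. The top-level parser is called `tokenP` rather than `token` because Parsec itself exports a function named `token`, which would make uses of the name ambiguous. Note also that in the question's `twoParser`, `two` is a `Token` value and so cannot be applied to `a` and `b`; here the `Two` constructor is applied instead.

```haskell
import Data.Char (isDigit, isLetter)
import Text.ParserCombinators.Parsec

data Token = Zero
           | One [Char]
           | Two [Char] [Char]
  deriving (Show, Eq)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A character that may appear after the first character of an identifier.
identChar :: Parser Char
identChar = satisfy (\c -> isDigit c || isLetter c || c == '_')

-- Hand-written stand-in for Token.reserved (assumption: keywords are plain
-- words that must not be followed by another identifier character).
keyword :: String -> Parser ()
keyword w = lexeme (try (string w *> notFollowedBy identChar))

-- Same identifier rule as `var` in the question, plus whitespace skipping.
var :: Parser String
var = lexeme ((:) <$> firstChar <*> many identChar)
  where
    firstChar = satisfy (\c -> isLetter c || c == '_')

-- Assumed syntax: "two a b", "one a", "zero".
twoParser, oneParser, zeroParser :: Parser Token
twoParser  = Two <$> (keyword "two" *> var) <*> var
oneParser  = One <$> (keyword "one" *> var)
zeroParser = Zero <$ keyword "zero"

tokenP :: Parser Token
tokenP = twoParser <|> oneParser <|> zeroParser

-- A whole input is a whitespace-separated sequence of tokens.
tokensP :: Parser [Token]
tokensP = spaces *> many tokenP <* eof
```

With `regularParse` from above, `regularParse tokensP "two a b zero one c"` should produce the three corresponding tokens. One limitation of this sketch is that `var` does not exclude keywords, so `two a zero` is read as `Two "a" "zero"`; rejecting keywords inside `var` would need an extra check.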

0 answers:

No answers yet.