Haskell读取变量名

时间:2015-09-24 06:31:54

标签: parsing haskell

我需要编写一个解析某种语言的代码。我坚持解析变量名 - 它可以是至少1个字符长的任何东西,以小写字母开头并且可以包含下划线'_'字符。我想我用以下代码开了个好头:

identToken :: Parser String
identToken = do 
                       c <- letter
                       cs <- letdigs
                       return (c:cs)
             where letter = satisfy isLetter
                   letdigs = munch isLetter +++ munch isDigit +++ munch underscore
                   num = satisfy isDigit
                   underscore = \x -> x == '_'
                   lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?

ident :: Parser Ident
ident = do 
          _ <- skipSpaces
          s <- identToken
          skipSpaces; return $ s

idents :: Parser Command
idents = do 
          skipSpaces; ids <- many1 ident
          ...

然而,这个功能给了我一个奇怪的结果。如果我打电话给我的测试功能

test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p = 
  case readP_to_S prog p of
    [(j, "")] -> Right j
    [] -> Left InvalidParse
    multipleRes -> Left (AmbiguousIdents multipleRes)
  where
    prog :: Parser [Ident]
    prog = do
      result <- many ident
      eof
      return result
像这样:

test_parseIdents  "test"

我明白了:

Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
    (["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
    (["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])

请注意,Parser只是ReadP a的同义词。

我还想在解析器中编码变量名称应该以小写字符开头。

感谢您的帮助。

1 个答案:

答案 0 :(得分:3)

部分问题在于您使用+++运算符。以下代码适用于我:

import Data.Char
import Text.ParserCombinators.ReadP

type Parser a = ReadP a
type Ident = String

identToken :: Parser String
identToken = do c <- satisfy lowerCase
                cs <- letdigs
                return (c:cs)
  where lowerCase = \x -> x `elem` ['a'..'z']
        underscore = \x -> x == '_'
        letdigs = munch (\c -> isLetter c || isDigit c || underscore c)

ident :: Parser Ident
ident = do _ <- skipSpaces
           s <- identToken
           skipSpaces
           return s

test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
    [(j, "")]   -> Right j
    []          -> Left "Invalid parse"
    multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
  where prog :: Parser [Ident]
        prog = do result <- many ident
                  eof
                  return result

main = print $ test_parseIdents "test_1349_zefz"

出了什么问题:

  • +++对其参数强制执行命令,并允许多个替代方法成功(symmetric choice)。 <++左偏,因此只有最左边的选项成功 - &gt;这将消除解析中的歧义,但仍然会留下下一个问题。

  • 您的解析器正在寻找字母首先然后数字,最后下划线。例如,下划线失败后的数字。必须将解析器修改为 字母,数字或下划线的munch个字符。

我还删除了一些未使用的函数,并对数据类型的定义做了有根据的猜测。