我需要编写一个解析某种语言的代码。我坚持解析变量名 - 它可以是至少1个字符长的任何东西,以小写字母开头并且可以包含下划线'_'字符。我想我用以下代码开了个好头:
identToken :: Parser String
identToken = do
c <- letter
cs <- letdigs
return (c:cs)
where letter = satisfy isLetter
letdigs = munch isLetter +++ munch isDigit +++ munch underscore
num = satisfy isDigit
underscore = \x -> x == '_'
lowerCase = \x -> x `elem` ['a'..'z'] -- how to add this function to current code?
ident :: Parser Ident
ident = do
_ <- skipSpaces
s <- identToken
skipSpaces; return $ s
idents :: Parser Command
idents = do
skipSpaces; ids <- many1 ident
...
然而,这个功能给了我一个奇怪的结果。如果我打电话给我的测试功能
test_parseIdents :: String -> Either Error [Ident]
test_parseIdents p =
case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left InvalidParse
multipleRes -> Left (AmbiguousIdents multipleRes)
where
prog :: Parser [Ident]
prog = do
result <- many ident
eof
return result
像这样:
test_parseIdents "test"
我明白了:
Left (AmbiguousIdents [(["test"],""),(["t","est"],""),(["t","e","st"],""),
(["t","e","st"],""),(["t","est"],""),(["t","e","st"],""),(["t","e","st"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],""),
(["t","e","s","t"],""),(["t","e","s","t"],""),(["t","e","s","t"],"")])
请注意,Parser
只是ReadP a
的同义词。
我还想在解析器中编码变量名称应该以小写字符开头。
感谢您的帮助。
答案 0 :(得分:3)
部分问题在于您使用+++
运算符。以下代码适用于我:
import Data.Char
import Text.ParserCombinators.ReadP
type Parser a = ReadP a
type Ident = String
identToken :: Parser String
identToken = do c <- satisfy lowerCase
cs <- letdigs
return (c:cs)
where lowerCase = \x -> x `elem` ['a'..'z']
underscore = \x -> x == '_'
letdigs = munch (\c -> isLetter c || isDigit c || underscore c)
ident :: Parser Ident
ident = do _ <- skipSpaces
s <- identToken
skipSpaces
return s
test_parseIdents :: String -> Either String [Ident]
test_parseIdents p = case readP_to_S prog p of
[(j, "")] -> Right j
[] -> Left "Invalid parse"
multipleRes -> Left ("Ambiguous idents: " ++ show multipleRes)
where prog :: Parser [Ident]
prog = do result <- many ident
eof
return result
main = print $ test_parseIdents "test_1349_zefz"
出了什么问题:
+++
对其参数强制执行命令,并允许多个替代方法成功(symmetric choice)。 <++
左偏,因此只有最左边的选项成功 - &gt;这将消除解析中的歧义,但仍然会留下下一个问题。
您的解析器正在寻找字母首先,然后数字,最后下划线。例如,下划线失败后的数字。必须将解析器修改为 字母,数字或下划线的munch
个字符。
我还删除了一些未使用的函数,并对数据类型的定义做了有根据的猜测。