parsec: unexpected character when parsing nested comments

Date: 2015-07-27 06:39:01

Tags: haskell parsec

I am trying to parse nested C-like block comments:

import Text.ParserCombinators.Parsec
import Control.Monad (liftM)

flat :: Monad m => m [[a]] -> m [a]
flat = liftM concat

comment :: Parser String
comment = between (string "/*") (string "*/") (try nested <|> content)
  where
    content = many (try (noneOf "*/")
                   <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
                   <|> try (char '/' >> notFollowedBy (char '*') >> return '/'))
    nested  = flat $ many comment

"1234567890" parses fine, but when I try

parse comment "" "/*123/*456*/789*/"

I get

Left (line 1, column 3):
unexpected "1"
expecting "/*" or "*/"

I can't figure out why; I have put try everywhere I could think of. Please help.

1 answer:

Answer 0 (score: 4)

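In an expression like a <|> b, if a can succeed without consuming any input (that is, it can match the empty string), then b is never tried. That is what happens with try nested <|> content: nested is many comment, which succeeds with an empty result whenever the input does not start with "/*". So content never runs, and the closing string "*/" then fails on "1".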
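
The behaviour of <|> with a left-hand side that can match nothing can be reproduced in isolation. The following is a minimal sketch; the names emptyFirst, nonEmptyFirst and demo are made up for illustration and are not part of the question's code.

```haskell
import Text.ParserCombinators.Parsec

-- Hypothetical names, for illustration only.
-- `many (char 'a')` succeeds on "bbb" with an empty result and consumes
-- nothing, so the right-hand alternative is never attempted.
emptyFirst :: Parser String
emptyFirst = many (char 'a') <|> many1 (char 'b')

-- Requiring at least one match makes the left side fail (without
-- consuming input) on "bbb", so <|> falls through to the right side.
nonEmptyFirst :: Parser String
nonEmptyFirst = many1 (char 'a') <|> many1 (char 'b')

demo :: IO ()
demo = do
  print (parse emptyFirst "" "bbb")     -- Right ""
  print (parse nonEmptyFirst "" "bbb")  -- Right "bbb"
```

Wrapping the left side in try does not change this: try only affects what happens when a parser fails after consuming input, not when it succeeds with an empty result.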

You can fix your approach by requiring each step to match either a nested comment or one other character:

comment :: Parser String
comment = between (string "/*") (string "*/") ( flat $ many $ (try comment <|> liftM toString other ) )
  where
    toString x = [x]
    other = try (noneOf "*/")
            <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
            <|> try (char '/' >> notFollowedBy (char '*') >> return '/')
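
To check the fix against the input from the question, the parser can be put in a self-contained module together with the imports and the flat helper from the question. This is only a sketch; checkFix is a hypothetical name added for the demonstration.

```haskell
import Text.ParserCombinators.Parsec
import Control.Monad (liftM)

-- Helper from the question: concatenate the pieces a parser returns.
flat :: Monad m => m [[a]] -> m [a]
flat = liftM concat

-- The fixed parser: every iteration of `many` must consume either a
-- whole nested comment or exactly one other character.
comment :: Parser String
comment = between (string "/*") (string "*/")
                  (flat $ many (try comment <|> liftM toString other))
  where
    toString x = [x]
    other = try (noneOf "*/")
            <|> try (char '*' >> notFollowedBy (char '/') >> return '*')
            <|> try (char '/' >> notFollowedBy (char '*') >> return '/')

-- Hypothetical driver, not part of the original answer.
checkFix :: IO ()
checkFix = print (parse comment "" "/*123/*456*/789*/")
-- prints: Right "123456789"
```

Note that the delimiters of the inner comment are not part of the result, because between discards what its opening and closing parsers match.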

FWIW, this is how Text.Parsec.Token does it:

https://github.com/aslatter/parsec/blob/master/Text/Parsec/Token.hs#L698-714

For your specific case, the equivalent code is:

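import Text.ParserCombinators.Parsec
import Data.List (nub)

commentStart, commentEnd :: String
commentStart = "/*"
commentEnd = "*/"

-- Skips one (possibly nested) block comment; the comment text is discarded.
multiLineComment :: Parser ()
multiLineComment =
    do { try (string commentStart)
       ; inComment
       }

inComment :: Parser ()
inComment = inCommentMulti

-- Consumes input up to and including the matching commentEnd.
inCommentMulti :: Parser ()
inCommentMulti
    =   do{ try (string commentEnd) ; return () }
    <|> do{ multiLineComment                     ; inCommentMulti }
    <|> do{ skipMany1 (noneOf startEnd)          ; inCommentMulti }
    <|> do{ oneOf startEnd                       ; inCommentMulti }
    <?> "end of comment"
    where
      startEnd   = nub (commentEnd ++ commentStart)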
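
Unlike the parser in the question, this version returns () and only skips the comment. A compact, self-contained sketch of how it might be exercised follows; skipWholeComment and checkSkip are hypothetical names, and the explicit Parser () signatures are an assumption added so that the module compiles on its own.

```haskell
import Text.ParserCombinators.Parsec
import Data.List (nub)

commentStart, commentEnd :: String
commentStart = "/*"
commentEnd   = "*/"

-- Skips one block comment, including any comments nested inside it.
multiLineComment :: Parser ()
multiLineComment = try (string commentStart) >> inCommentMulti

-- Consumes input up to and including the matching end delimiter.
inCommentMulti :: Parser ()
inCommentMulti
    =   (try (string commentEnd) >> return ())
    <|> (multiLineComment >> inCommentMulti)
    <|> (skipMany1 (noneOf startEnd) >> inCommentMulti)
    <|> (oneOf startEnd >> inCommentMulti)
    <?> "end of comment"
  where
    startEnd = nub (commentEnd ++ commentStart)

-- Hypothetical wrapper: the whole input must be exactly one comment.
skipWholeComment :: Parser ()
skipWholeComment = multiLineComment >> eof

-- Hypothetical driver, not part of the original answer.
checkSkip :: IO ()
checkSkip = do
  print (parse skipWholeComment "" "/*123/*456*/789*/")  -- Right ()
  -- An unterminated comment is rejected with a Left parse error.
  print (parse skipWholeComment "" "/*123/*456*/789")
```

Each alternative in inCommentMulti either finishes the comment or consumes at least one character before recursing, which is what avoids the empty-match problem described above.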