Using Text.Parsec.Layout

Time: 2016-05-07 07:56:05

Tags: parsing haskell indentation parsec parser-combinators

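I'm trying to use parsec-layout to parse a small language with Haskell-like syntax. The two key features that don't seem to interact well with each other are:

  • Function application is written by juxtaposition, i.e. if F and E are terms, F E is the syntax for F applied to E.
  • Indentation can be used to denote nesting, i.e. the following two are equivalent: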

    X = case Y of
      A -> V
      B -> W
    
    X = case Y of A -> V; B -> W
    

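I haven't managed to find a combination of skipping and keeping whitespace that lets me parse a list of such definitions. Here is my simplified code: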

import Text.Parsec hiding (space, runP)
import Text.Parsec.Layout
import Control.Monad (void)

type Parser = Parsec String LayoutEnv

data Term = Var String
          | App Term Term
          | Case Term [(String, Term)]
          deriving Show

name :: Parser String
name = spaced $ (:) <$> upper <*> many alphaNum

kw :: String -> Parser ()
kw = void . spaced . string

reserved :: String -> Parser ()
reserved s = try $ spaced $ string s >> notFollowedBy alphaNum

term :: Parser Term
term = foldl1 App <$> part `sepBy1` space
  where
    part = choice [ caseBlock
                  , Var <$> name
                  ]

    caseBlock = Case <$> (reserved "case" *> term <* reserved "of") <*> laidout alt

    alt = (,) <$> (name <* kw "->") <*> term

binding :: Parser (String, Term)
binding = (,) <$> (name <* kw "=") <*> term

-- https://github.com/luqui/parsec-layout/issues/1
trim :: String -> String
trim = reverse . dropWhile (== '\n') . reverse

runP :: Parser a -> String -> Either ParseError a
runP p = runParser (p <* eof) defaultLayoutEnv "" . trim

If I try to run it on an input such as

s = unlines [ "A = case B of"
            , " X -> Y Z"
            , "C = D"
            ]

via runP (laidout binding) s, it fails on the application Y Z:

(line 2, column 10):
expecting space or semi-colon

However, if I change the definition of term to

term = foldl1 App <$> many1 part

then it doesn't stop parsing the term at the start of the (unindented!) third line, which leads to

(line 3, column 4):
expecting semi-colon
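
For reference, here is a hand-written sketch of the value that runP (laidout binding) s is presumably meant to return for the input s above. It reuses the Term type from the question (with Eq added only so values can be compared); the names application and expected are illustrative and not part of the original code, and the tree is written out by hand rather than produced by the parser.

```haskell
-- Same Term type as in the question; Eq is added here only for comparisons.
data Term = Var String
          | App Term Term
          | Case Term [(String, Term)]
          deriving (Show, Eq)

-- Juxtaposition is left-associative: foldl1 App turns [F, E, G] into (F E) G.
-- (Illustrative helper; partial on the empty list, just like foldl1.)
application :: [Term] -> Term
application = foldl1 App

-- Hand-written target for the three-line input s: two bindings,
-- the first holding a case block with a single alternative.
expected :: [(String, Term)]
expected =
  [ ("A", Case (Var "B") [("X", application [Var "Y", Var "Z"])])
  , ("C", Var "D")
  ]

main :: IO ()
main = mapM_ print expected
```

The point of the layout rule is that the unindented C on the third line should close the case block and start a second binding, rather than being absorbed into the application Y Z.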

1 answer:

Answer 0 (score: 1)

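I think the problem is that name has already consumed the whitespace that follows it, so the sepBy1 in the definition of term never gets to see a separator.

Consider these simplified versions of term: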

term0 = foldl1 App <$> (Var <$> name) `sepBy1` space

term1 = foldl1 App <$> (Var <$> name') `sepBy1` space

name' = (:) <$> upper <*> many alphaNum

term3 = foldl1 App <$> many1 (Var <$> name)

Then:

runP term0 "A B C"   -- fails
runP term1 "A B C"   -- succeeds
runP term3 "A B C"   -- succeeds
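
The same effect can be reproduced with plain parsec, without parsec-layout. The sketch below is only an analogy: lexeme is a hypothetical stand-in for spaced that skips trailing blanks, and char ' ' stands in for the layout-aware space, so it shows the mechanism rather than the exact behaviour of Text.Parsec.Layout.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for `spaced`: run p, then skip trailing blanks.
lexeme :: Parser a -> Parser a
lexeme p = p <* skipMany (char ' ')

-- A name that leaves the following whitespace alone (like name' above).
rawName :: Parser String
rawName = (:) <$> upper <*> many alphaNum

-- A name that eats the following whitespace (like name in the question).
lexName :: Parser String
lexName = lexeme rawName

-- Like term0: the separator can never match, lexName already consumed it.
viaSepLexeme :: Parser [String]
viaSepLexeme = lexName `sepBy1` char ' '

-- Like term1: the blank is still in the input when sepBy1 looks for it.
viaSepRaw :: Parser [String]
viaSepRaw = rawName `sepBy1` char ' '

-- Like term3: no separator at all, each name cleans up after itself.
viaMany :: Parser [String]
viaMany = many1 lexName

-- Require the whole input to be consumed; Nothing means a parse error.
accepted :: Parser a -> String -> Maybe a
accepted p = either (const Nothing) Just . parse (p <* eof) ""

main :: IO ()
main = do
  print (accepted viaSepLexeme "A B C")  -- Nothing
  print (accepted viaSepRaw    "A B C")  -- Just ["A","B","C"]
  print (accepted viaMany      "A B C")  -- Just ["A","B","C"]
```

In the first case sepBy1 stops after A because no separator is left, and eof then fails at B. Either the token parser or the separator should own the whitespace, not both.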

I think part of the solution is to define

part = choice [ caseBlock, Var <$> name' ]

where name' is as above. However, some problems still remain.