我试图在Idris中正式化基于正则表达式的字符串搜索工具 (当前状态here)。但我正在解决正则表达式的问题。我已经尝试构建一个小的解析库,但放弃了这个,转而使用Lightyear,这是一个用于Idris的解析组合库。
由于我已经习惯了Haskell,我尝试使用类似的策略,而不是使用Parsec。我的主要问题是如何在Lightyear解析器上处理左递归?我尝试了几种编码,但几乎所有解析器最终都会循环并在生成的代码中导致分段错误。
答案 0 :(得分:4)
我不认识Lightyear,但我在将Parsec移植到Idris方面取得了一些成功:
module Parser
data Parser : Type -> Type where
P : (String -> List (a, String)) -> Parser a
unP : Parser a -> String -> List (a, String)
unP (P f) = f
total stripPrefix : (Eq a) => List a -> List a -> Maybe (List a)
stripPrefix [] ys = Just ys
stripPrefix (x::xs) (y::ys) = if (x == y) then stripPrefix xs ys else Nothing
stripPrefix _ _ = Nothing
total token : String -> Parser ()
token tk = P $ \s => case stripPrefix (unpack tk) (unpack s) of
Just s' => [((), pack s')]
Nothing => []
total skip : Parser ()
skip = P $ \s => case unpack s of
[] => []
(_::s') => [((), pack s')]
instance Functor Parser where
map f p = P $ \s => map (\(x, s') => (f x, s')) (unP p s)
instance Applicative Parser where
pure x = P $ \s => [(x, s)]
(P pf) <*> (P px) = P $ \s => concat (map (\(f, s') => map (\(x, s'') => (f x, s'')) (px s')) (pf s))
instance Alternative Parser where
empty = P $ \s => []
(P p1) <|> (P p2) = P $ \s => case p1 s of
[] => p2 s
results => results
instance Monad Parser where
px >>= f = P $ \s => concat (map (\(x, s') => unP (f x) s') (unP px s))
total runParser : Parser a -> String -> Maybe a
runParser (P p) s = case p s of
[(x, "")] => Just x
_ => Nothing
这允许chainl
:
chainl1 : Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
where
rest x = do { f <- op; y <- p; rest $ f x y } <|> return x
chainl : Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op x = chainl1 p op <|> return x
然后我们可以直接从chainl
文档中对表达式解析器进行音译(我太懒了,无法实现正确的integer
解析器,所以我们只使用一元) :
parens : Parser a -> Parser a
parens p = token "(" *> p <* token ")"
symbol : String -> Parser ()
symbol = token
integer : Parser Nat
integer = P $ \s => case unpack s of
('Z'::s') => [(Z, pack s')]
('S'::s') => map (\(n, s'') => (S n, s'')) $ unP integer (pack s')
_ => []
mutual
expr : Parser Nat
expr = term `chainl1` addop
term : Parser Nat
term = factor `chainl1` mulop
factor : Parser Nat
factor = parens expr <|> integer
mulop : Parser (Nat -> Nat -> Nat)
mulop = (symbol "*" *> pure (*)) <|>
(symbol "/" *> pure div)
addop : Parser (Nat -> Nat -> Nat)
addop = (symbol "+" *> pure (+)) <|>
(symbol "-" *> pure (-))
现在,如果你试试这个:
main : IO ()
main = do
s <- getLine
printLn $ runParser expr s
然后它将具有你所观察到的相同的不同行为。但是,我们可以做两个小的改动:
介绍一个懒惰的替代组合子:
orElse : Parser a -> Lazy (Parser a) -> Parser a
orElse p1 p2 = P $ \s => case unP p1 s of
[] => unP p2 s
results => results
通过翻转两个备选方案,确保factor
的递归部分(即parens expr
部分)处于此惰性位置:
factor = integer `orElse` parens expr
然后按预期工作:
13:06:07 [cactus@galaxy brainfuck]$ idris Expr.idr -o Expr
13:06:27 [cactus@galaxy brainfuck]$ echo "SZ+(SSZ*SSSZ)" | ./Expr
Just 7
答案 1 :(得分:0)
chainl
和chainl1
组合器可与Lightyear包一起使用。但是,它们是默认提供的。我已将组合器添加到我自己的模块中,我需要它们:
chainl1 : Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
where rest a1 = (do f <- op
a2 <- p
rest (f a1 a2)) <|> pure a1
chainl : Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> pure a
似乎工作正常。希望有所帮助。