Question

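How can I use Parsec to report an error at a specific position when a semantic rule is violated? I know we usually don't want to do such things, but consider the example grammar.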

<foo> ::= <bar> | ...
<bar> ::= a positive integer power of two

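The <bar> rule is a finite set (the limit in my example is arbitrary). The pure approach would be a careful application of the choice combinator, but that may be impractical in both space and time. In recursive-descent or toolkit-generated parsers, the standard trick is to parse an integer (a more relaxed grammar) and then check the harder constraints semantically. With Parsec, I could use the natural parser and check the result, calling unexpected (or fail, or whatever) when it doesn't match. But if we do that, the default error location is wrong: the error is reported after the number instead of at its start. Somehow I need to raise the error at the earlier state.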
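
To make the problem concrete, here is a minimal Parsec sketch of the "parse loosely, then check" approach. It assumes parsec 3.1.x. The names natural, isPowerOfTwo, naivePowerOfTwo and errorColumn are hypothetical helpers for illustration. This natural is just many1 digit, not the one from Text.Parsec.Token. Because fail runs after the digits have been consumed, the error for "4095" lands at column 5, just past the number.

```haskell
import Text.Parsec

type Parser = Parsec String ()

-- Hypothetical relaxed parser: any run of decimal digits.
natural :: Parser Integer
natural = read <$> many1 digit

isPowerOfTwo :: Integer -> Bool
isPowerOfTwo = (`elem` [2 ^ i | i <- [1 .. 20 :: Int]])

-- Naive semantic check: parse the number first, then reject it with fail.
naivePowerOfTwo :: Parser Integer
naivePowerOfTwo = do
  n <- natural
  if isPowerOfTwo n then return n else fail "power of two"

-- Column of the reported error, or Nothing if parsing succeeded.
errorColumn :: Either ParseError a -> Maybe Int
errorColumn = either (Just . sourceColumn . errorPos) (const Nothing)

main :: IO ()
main = do
  -- "4095" is rejected, but the error points just past the digits (column 5).
  print (errorColumn (runParser naivePowerOfTwo () "in" "4095"))
  -- "4096" passes the check.
  print (runParser naivePowerOfTwo () "in" "4096")
```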

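I tried a brute-force solution and wrote a combinator that uses getPosition and setPosition, as illustrated in this very similar question. Of course, I wasn't successful either (the error location is, of course, still wrong). I've run into this pattern many times. I'm looking for a combinator of this kind: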

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred lbl p = do
  ok <- lookAhead $ fmap pred (try p) <|> return False -- peek ahead
  if ok then p         -- consume the input if the value passed the predicate
   else fail lbl       -- otherwise raise the error at the *start* of this token

pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

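The above does not work. (I tried variants of it as well.) Somehow the parser backtracks and says it is expecting a digit. I assume it is returning the error that got the furthest. Even {get,set}ParserState fails to erase that memory.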
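
The following is a minimal sketch of why rewinding the position alone does not help in Parsec, assuming parsec 3.1.x. withPredicateSetPos is a hypothetical reconstruction of the getPosition/setPosition attempt, not the code from the linked question. many1 digit leaves behind an "expecting digit" error at the position where it stopped. When two errors disagree on position, Parsec's mergeError keeps the one further along, which matches the guess above. The laterWins check shows this on two hand-built errors.

```haskell
import Text.Parsec
import Text.Parsec.Error (Message (..), mergeError, newErrorMessage)
import Text.Parsec.Pos (newPos)

type Parser = Parsec String ()

-- Hypothetical relaxed parser: any run of decimal digits.
natural :: Parser Integer
natural = read <$> many1 digit

isPowerOfTwo :: Integer -> Bool
isPowerOfTwo = (`elem` [2 ^ i | i <- [1 .. 20 :: Int]])

-- Hypothetical brute-force combinator: remember where the token started,
-- parse it, and rewind the position before failing.
withPredicateSetPos :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicateSetPos ok msg p = do
  start <- getPosition
  x <- p
  if ok x
    then return x
    else setPosition start >> fail msg

-- Two errors like the ones in play here: the residual "expecting digit"
-- error at column 5 and the rewound error at column 1. mergeError keeps
-- the one at the later position.
laterWins :: Bool
laterWins =
  let residual = newErrorMessage (Expect "digit") (newPos "in" 1 5)
      rewound = newErrorMessage (Message "power of two") (newPos "in" 1 1)
   in sourceColumn (errorPos (mergeError rewound residual)) == 5

-- Column of the reported error, or Nothing if parsing succeeded.
errorColumn :: Either ParseError a -> Maybe Int
errorColumn = either (Just . sourceColumn . errorPos) (const Nothing)

main :: IO ()
main = do
  -- Still column 5: the rewound error loses to the one many1 digit left behind.
  print (errorColumn (runParser (withPredicateSetPos isPowerOfTwo "power of two" natural) () "in" "4095"))
  print laterWins
```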

Am I approaching this syntactic pattern the wrong way? How do Parsec users solve these kinds of problems?

Thanks!

Answer 1

I think both of your ideas are OK. The other two answers deal with Parsec, but I'd like to note that in both cases Megaparsec just does the right thing:

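{-# LANGUAGE TypeApplications #-}

-- Written against the Megaparsec 6 API: getNextTokenPosition, setPosition
-- and parseTest' are not available in Megaparsec 7 and later.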

module Main (main) where

import Control.Monad
import Data.Void
import Text.Megaparsec
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void String

withPredicate1 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate1 f msg p = do
  r <- lookAhead p
  if f r
    then p
    else fail msg

withPredicate2 :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicate2 f msg p = do
  mpos <- getNextTokenPosition -- †
  r    <- p
  if f r
    then return r
    else do
      forM_ mpos setPosition
      fail msg

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  parseTest' (withPredicate1 @Integer (> 42) msg L.decimal) "11"
  parseTest' (withPredicate2 @Integer (> 42) msg L.decimal) "22"

If I run it:

The next big Haskell project is about to start!
λ> :main
1:1:
  |
1 | 11
  | ^
I only like numbers greater than 42!
1:1:
  |
1 | 22
  | ^
I only like numbers greater than 42!
λ>

Try it yourself! It works as expected.

† For streams whose tokens carry both their start and end positions, getNextTokenPosition is more correct than getPosition. This may or may not matter in your case.
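
For comparison, here is a sketch of withPredicate1 written against Parsec instead of Megaparsec. It assumes a recent parsec 3.1.x, in which lookAhead, on success, restores the original state and does not pass on the inner parser's errors. Under that assumption the failure is reported at column 1. withPredicateLA and errorColumn are hypothetical names, and natural is again a plain many1 digit stand-in. This only checks the combinator in isolation. It does not explain the failure the question reports inside its larger grammar.

```haskell
import Text.Parsec

type Parser = Parsec String ()

-- Hypothetical relaxed parser: any run of decimal digits.
natural :: Parser Integer
natural = read <$> many1 digit

-- Parsec counterpart of withPredicate1: peek at the value with lookAhead,
-- and only run p for real when the value passes the predicate.
withPredicateLA :: (a -> Bool) -> String -> Parser a -> Parser a
withPredicateLA ok msg p = do
  r <- lookAhead p
  if ok r then p else fail msg

-- Column of the reported error, or Nothing if parsing succeeded.
errorColumn :: Either ParseError a -> Maybe Int
errorColumn = either (Just . sourceColumn . errorPos) (const Nothing)

main :: IO ()
main = do
  let msg = "I only like numbers greater than 42!"
  -- Rejected before anything is consumed, so the error sits at column 1.
  print (errorColumn (runParser (withPredicateLA (> 42) msg natural) () "in" "11"))
  print (runParser (withPredicateLA (> 42) msg natural) () "in" "43")
```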

Answer 2

This is not a solution I like, but you can hypnotize Parsec into believing it has had a failure with consumption:

failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

Here is a complete example:

import Control.Monad
import Text.Parsec
import Text.Parsec.Error
import Text.Parsec.Prim

-- Report an error at `pos` while pretending input was consumed, so it is not
-- merged with (and lost to) the error p left at the end of the number.
failAt :: Monad m => SourcePos -> String -> ParsecT s u m a
failAt pos msg = mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

type P a = Parsec String () a

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate pred msg p = do
    pos <- getPosition
    x <- p
    unless (pred x) $ failAt pos msg
    return x

natural :: P Integer
natural = read <$> many1 digit

pPowerOfTwo :: P Integer
pPowerOfTwo = withPredicate isPowerOfTwo "power of two" natural
  where isPowerOfTwo = (`elem` [2^i | i<-[1..20]])

main :: IO ()
main = print $ runParser pPowerOfTwo () "myinput" "4095"

When run, it produces:

Left "myinput" (line 1, column 1):
expecting power of two
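
Because failAt reports a consumed failure, a surrounding <|> will not try its other branches unless the call is wrapped in try. The following small sketch assumes parsec 3.1.x. noFallback, withFallback and the fallback parser 0 <$ many1 digit are hypothetical. The fallback is chosen so that the second branch would succeed on "4095" if it were ever tried.

```haskell
import Control.Monad (unless)
import Text.Parsec
import Text.Parsec.Error (Message (..), newErrorMessage)
import Text.Parsec.Prim (Consumed (..), Reply (..), mkPT)

type P a = Parsec String () a

-- failAt from the answer: an error at `pos` that claims input was consumed.
failAt :: Monad m => SourcePos -> String -> ParsecT s u m a
failAt pos msg =
  mkPT (\_ -> return (Consumed (return $ Error $ newErrorMessage (Expect msg) pos)))

withPredicate :: (a -> Bool) -> String -> P a -> P a
withPredicate ok msg p = do
  pos <- getPosition
  x <- p
  unless (ok x) $ failAt pos msg
  return x

natural :: P Integer
natural = read <$> many1 digit

pPowerOfTwo :: P Integer
pPowerOfTwo = withPredicate (`elem` [2 ^ i | i <- [1 .. 20 :: Int]]) "power of two" natural

-- Hypothetical fallback that would accept "4095" if it were ever tried.
noFallback, withFallback :: P Integer
noFallback = pPowerOfTwo <|> (0 <$ many1 digit)
withFallback = try pPowerOfTwo <|> (0 <$ many1 digit)

-- Column of the reported error, or Nothing if parsing succeeded.
errorColumn :: Either ParseError a -> Maybe Int
errorColumn = either (Just . sourceColumn . errorPos) (const Nothing)

main :: IO ()
main = do
  -- The consumed failure commits <|>, so the column-1 error is reported.
  print (errorColumn (runParser noFallback () "in" "4095"))
  -- try turns it into an empty failure, so the fallback runs and returns 0.
  print (runParser withFallback () "in" "4095")
```

Whether that commitment is desirable depends on the grammar. In the <foo> ::= <bar> | ... case, it means the other alternatives of <foo> are skipped unless the call is wrapped in try.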

Answer 3

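With lookAhead we can run a parser without consuming any input or registering any new errors, while still recording the state it ends in. We can then apply a guard to the parser's result. If the value does not pass the semantic check, the guard can fail in whatever way it wishes, and the error is located at the initial position. If the guard succeeds, we reset the parser to the recorded state, which avoids having to re-execute p.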

guardP :: Stream s m t => (a -> ParsecT s u m ()) -> ParsecT s u m a -> ParsecT s u m a
guardP guard p = do
  (a, s) <- try . lookAhead $ do
    a <- p
    s <- getParserState
    return (a, s)
  guard a
  setParserState s
  return a

We can now implement pPowerOfTwo:

pPowerOfTwo :: Stream s m Char => ParsecT s u m Integer
pPowerOfTwo = guardP guardPowerOfTwo natural <?> "power of two"
  where guardPowerOfTwo s = unless (s `elem` [2^i | i <- [1..20]]) . unexpected $ show s
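
The answer's snippet leaves natural undefined, presumably a token parser. The sketch below assembles it into a self-contained program under a few assumptions. It targets parsec 3.1.x. natural is replaced by a hypothetical many1 digit stand-in. guard is renamed to check to avoid shadowing Control.Monad.guard. FlexibleContexts is enabled for the Stream s m Char constraints. The rejected "4095" is reported at column 1.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Control.Monad (unless)
import Text.Parsec

-- guardP as in the answer (parameter renamed from guard to check).
guardP :: Stream s m t => (a -> ParsecT s u m ()) -> ParsecT s u m a -> ParsecT s u m a
guardP check p = do
  -- Run p without consuming input, remembering the state it ends in.
  (a, s) <- try . lookAhead $ do
    a <- p
    s <- getParserState
    return (a, s)
  -- A failing check is reported at the position where p started.
  check a
  -- On success, jump to the state p ended in instead of re-running p.
  _ <- setParserState s
  return a

-- Hypothetical stand-in for the answer's natural: any run of decimal digits.
natural :: Stream s m Char => ParsecT s u m Integer
natural = read <$> many1 digit

pPowerOfTwo :: Stream s m Char => ParsecT s u m Integer
pPowerOfTwo = guardP guardPowerOfTwo natural <?> "power of two"
  where
    guardPowerOfTwo n = unless (n `elem` [2 ^ i | i <- [1 .. 20 :: Int]]) . unexpected $ show n

-- Column of the reported error, or Nothing if parsing succeeded.
errorColumn :: Either ParseError a -> Maybe Int
errorColumn = either (Just . sourceColumn . errorPos) (const Nothing)

main :: IO ()
main = do
  -- Rejected, with the error at column 1.
  print (parse pPowerOfTwo "in" "4095")
  print (parse pPowerOfTwo "in" "4096")
```

Check this before reusing guardP inside a repetition. As far as I can tell, lookAhead and setParserState both report success without consuming input, so Parsec sees a successful guardP as an empty parse. As a result, many applied to it may stop with Parsec's "accepts an empty string" error.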

Parsec: error message at a specific location
