Question

I'm new to Parsec (and to parsers in general), and I'm having some trouble with this parser I wrote:

list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'

The idea is to parse lists in this format (I'm working up to s-expressions):

(firstElement secondElement thirdElement and so on)

I wrote this code to test it:

import Control.Applicative
import Text.ParserCombinators.Parsec hiding (many)

list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'

test s = do
  putStrLn $ "Testing " ++ show s ++ ":"
  parseTest list s
  putStrLn ""

main = do
  test "()"
  test "(hello)"
  test "(hello world)"
  test "( hello world)"
  test "(hello world )"
  test "( )"

This is the output I get:

Testing "()":
[]

Testing "(hello)":
["hello"]

Testing "(hello world)":
["hello","world"]

Testing "( hello world)":
["hello","world"]

Testing "(hello world )":
parse error at (line 1, column 14):
unexpected ")"
expecting space or letter

Testing "( )":
parse error at (line 1, column 3):
unexpected ")"
expecting space or letter

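As you can see, it fails when there is whitespace between the last element of the list and the closing ). I don't understand why that whitespace isn't consumed by the spaces I put just before <* char ')'. What silly mistake have I made?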

Answer 1

The problem is that the final whitespace is consumed by the spaces inside the argument to many,

list = char '(' *> many (spaces *> some letter) <* spaces <* char ')'
                     --  ^^^^^^ that one

and then the parser expects some letter but finds a closing parenthesis, so it fails.

To solve it, consume whitespace only after tokens:

list = char '(' *> spaces *> many (some letter <* spaces) <* char ')'

This works as desired:

$ runghc lisplists.hs
Testing "()":
[]

Testing "(hello)":
["hello"]

Testing "(hello world)":
["hello","world"]

Testing "( hello world)":
["hello","world"]

Testing "(hello world )":
["hello","world"]

Testing "( )":
[]
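The "whitespace only after tokens" rule is often packaged as a small helper, so that every token cleans up after itself. A minimal sketch of that idea follows; the names lexeme, word, and parseList are my own and not part of the question, and I use many1 from Text.Parsec where the question uses some.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A word is a non-empty run of letters, followed by optional whitespace.
word :: Parser String
word = lexeme (many1 letter)

-- Each token consumes the whitespace after it, so by the time we reach
-- the closing paren there is never a half-consumed item to recover from.
list :: Parser [String]
list = lexeme (char '(') *> many word <* char ')'

-- Hypothetical wrapper returning the result instead of printing it.
parseList :: String -> Either ParseError [String]
parseList = parse list "(input)"

main :: IO ()
main = mapM_ (print . parseList)
  ["()", "(hello)", "(hello world)", "( hello world)", "(hello world )", "( )"]
```

With this layout the only place whitespace is not handled is before the very first token, which is usually skipped once at the top level of the grammar.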

Answer 2

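The problem is that once the parser many (spaces *> some letter) sees a space, it commits itself to parsing another item, since by default Parsec only looks ahead one character and does not backtrack.

The sledgehammer solution is to use try to enable backtracking, but problems like this are best avoided by parsing optional whitespace after each token instead, as shown in Daniel's answer.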

Answer 3

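It's a bit tricky. Parsers are greedy by default. What does that mean in your case? When you parse (hello world ), you start by parsing (, and then you try to match some spaces followed by an identifier. There are no spaces, but there is an identifier, so that item is done. The same happens again for world. Now only " )" remains (a space followed by the closing parenthesis), and you run the parser (spaces *> some letter) once more. Being greedy, it matches the space and then expects a letter, but it gets ) instead. At this point the parser fails, but it has already consumed the space, so you are doomed. You can make the parser backtrack with the try combinator. It has to wrap each item, so that a failed item gives back the space it consumed: many (try (spaces *> some letter))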
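For comparison, here is a minimal sketch of the backtracking approach. The names item, listWithTry, and parseListWithTry are my own, and many1 from Text.Parsec stands in for some. Here try wraps each individual item, so when an item fails after eating a space, the input is rewound and many can stop cleanly.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One list item: optional leading whitespace, then a run of letters.
-- `try` rewinds the input if the item fails partway through, e.g. after
-- consuming a space that turns out to precede ')'.
item :: Parser String
item = try (spaces *> many1 letter)

-- Same shape as the parser in the question, with backtracking per item.
listWithTry :: Parser [String]
listWithTry = char '(' *> many item <* spaces <* char ')'

-- Hypothetical wrapper returning the result instead of printing it.
parseListWithTry :: String -> Either ParseError [String]
parseListWithTry = parse listWithTry "(input)"

main :: IO ()
main = mapM_ (print . parseListWithTry)
  ["()", "(hello world)", "(hello world )", "( )"]
```

Note that the placement of try matters: many stops only when its argument fails without consuming input, so the rewind has to happen inside the loop rather than around it.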

Trouble getting a Parsec parser to skip whitespace correctly
