Question

I am trying to split a string on ", ", ", and" and " and", and then return whatever is in between. An example of what I have so far:

import Data.Attoparsec.Text

sepTestParser = nameSep ((takeWhile1 $ inClass "-'a-zA-Z") <* space)
nameSep p = p `sepBy` (string " and " <|> string ", and" <|> ", ")

main = do
  print $ parseOnly sepTestParser "This test and that test, this test particularly."

I would like the output to be ["This test", "that test", "this test particularly."]. I have a vague sense that what I am doing is wrong, but I cannot work out why.

Answer 1

Note: this answer is written in literate Haskell. Save it as Example.lhs and load it in GHCi or similar.

The problem is that sepBy is implemented as:

sepBy p s = liftA2 (:) p ((s *> sepBy1 p s) <|> pure []) <|> pure []

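This means that the second parser s is only called after the first parser p has succeeded. It also means that if you were to add whitespace to the character class, you would end up with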

["This test and that test","this test particularly"]

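since "and" is now parseable by p. This is not easy to fix: as soon as you hit a space you would need to look ahead and check whether, after any number of spaces, an "and" follows, and stop parsing if it does. Only then will a parser written with sepBy work.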
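The two behaviours described above can be reproduced with a small sketch. The names separator, original, widened and input are made up for this illustration; original is the parser from the question, and widened is the hypothetical variant with a space added to the character class.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative ((<|>))
import Data.Attoparsec.Text
import Data.Text (Text)

-- The three separators from the question.
separator :: Parser Text
separator = string " and " <|> string ", and" <|> string ", "

-- The question's item parser: a single word followed by exactly one space.
original :: Parser [Text]
original = (takeWhile1 (inClass "-'a-zA-Z") <* space) `sepBy` separator

-- Hypothetical variant: the space is part of the character class instead.
widened :: Parser [Text]
widened = takeWhile1 (inClass "-'a-zA-Z ") `sepBy` separator

input :: Text
input = "This test and that test, this test particularly."

main :: IO ()
main = do
  -- Stops after the first word, because "test and ..." is not a separator.
  print (parseOnly original input)   -- Right ["This"]
  -- The item parser now swallows " and " before the separator gets a chance.
  print (parseOnly widened input)
  -- Right ["This test and that test","this test particularly"]
```

Note that parseOnly does not require the whole input to be consumed, which is why both runs report success even though text is left over.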

So let's write a parser that takes words instead (the remainder of this answer is literate Haskell):

> {-# LANGUAGE OverloadedStrings #-}
> import Control.Applicative
> import Data.Attoparsec.Text
> import qualified Data.Text as T
> import Control.Monad (mzero)
>
> word = takeWhile1 . inClass $ "-'a-zA-Z"
>
> wordsP = fmap (T.intercalate " ") $ k `sepBy` many space
>   where k = do
>           a <- word
>           if (a == "and") then mzero
>                           else return a

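wordsP now takes multiple words until it hits either something that is not a word, or a word that equals "and". The returned mzero signals a parsing failure, at which point another parser can take over: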
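To see where wordsP stops, it can help to return the unconsumed input alongside the result. The following is only a sketch: word and wordsP are restated with type signatures so the block stands on its own, and withRest is a hypothetical helper introduced for this illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative (many)
import Control.Monad (mzero)
import Data.Attoparsec.Text
import Data.Text (Text)
import qualified Data.Text as T

word :: Parser Text
word = takeWhile1 (inClass "-'a-zA-Z")

-- Words separated by any amount of whitespace; the word "and" is rejected.
wordsP :: Parser Text
wordsP = T.intercalate " " <$> (k `sepBy` many space)
  where
    k = do
      a <- word
      if a == "and" then mzero else return a

-- Hypothetical helper: run wordsP and also return whatever input is left.
withRest :: Text -> Either String (Text, Text)
withRest = parseOnly ((,) <$> wordsP <*> takeText)

main :: IO ()
main = do
  -- The failure on "and" backtracks to just before the preceding space.
  print (withRest "This test and that test")
  -- Right ("This test"," and that test")
  -- Runs of spaces between words collapse to a single space.
  print (withRest "so   many   spaces")
  -- Right ("so many spaces","")
```

The first result shows that the space in front of "and" is left in the input, so a separator parser that starts with optional whitespace can pick up from there.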

> andP = many space *> "and" *> many1 space *> pure()
>
> limiter = choice [
>     "," *> andP,
>     "," *> many1 space *> pure (),
>     andP
>   ]

limiter is mostly the same as the parser you have already written; it corresponds roughly to the regular expression /,\s*and\s+|,\s+|\s*and\s+/.
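A quick way to check which separators limiter accepts is to run it and look at what remains. This is a sketch only: andP and limiter are restated with explicit string calls and type signatures, and afterLimiter is a hypothetical helper that is not part of the answer.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Control.Applicative (many)
import Data.Attoparsec.Text
import Data.Text (Text)

andP :: Parser ()
andP = many space *> string "and" *> many1 space *> pure ()

limiter :: Parser ()
limiter = choice
  [ string "," *> andP                     -- comma followed by "and"
  , string "," *> many1 space *> pure ()   -- comma followed by whitespace
  , andP                                   -- a bare "and"
  ]

-- Hypothetical helper: consume one separator, return the remaining input.
afterLimiter :: Text -> Either String Text
afterLimiter = parseOnly (limiter *> takeText)

main :: IO ()
main = do
  print (afterLimiter ", and x")  -- Right "x"
  print (afterLimiter ", x")      -- Right "x"
  print (afterLimiter " and x")   -- Right "x"
  print (afterLimiter ",and x")   -- Right "x": andP allows zero spaces before "and"
  print (afterLimiter ",x")       -- a Left: a comma alone is not a separator
```

The order of the alternatives in choice matters less than it might seem here, because attoparsec backtracks when an alternative fails: ", x" is first tried as a comma followed by "and", and only then falls through to the plain comma case.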

Now we can actually use sepBy, since our first parser no longer overlaps with the second:

> test = "This test and that test, this test particular, and even that test"
>
> main = print $ parseOnly (wordsP `sepBy` limiter) test

The result is ["This test","that test","this test particular","even that test"] (wrapped in Right by parseOnly), just as we wanted. Note that this particular parser does not preserve whitespace.

So whenever you want to build a parser with sepBy, make sure that the two parsers do not overlap.

Using sepBy with strings in Attoparsec
