Question

How can I use Parsec to parse all matching input in a string and discard the rest?

Example: I have a simple number parser, and I can find all the numbers if I know what separates them:

num :: Parser Int
num = read <$> many1 digit  -- many1, so that read never sees an empty string

parse (num `sepBy` space) "" "111 4 22"

But what if I don't know what separates the numbers?

"I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."

many anyChar can't be used as the separator, because it consumes everything.

So how can I get the things that match an arbitrary parser when they are surrounded by things I want to ignore?

EDIT: Note that in the real problem, my parser is more complex:

optionTag :: Parser Fragment
optionTag = do
    string "<option"
    manyTill anyChar (string "value=")
    n <- many1 digit
    manyTill anyChar (char '>')
    chapterPrefix
    text <- many1 (noneOf "<>")
    return $ Option (read n) text
  where
    chapterPrefix = many digit >> char '.' >> many space

Answer 1

For an arbitrary parser myParser, it's quite simple:

solution = many (let one = myParser <|> (anyChar >> one) in one)

It may be clearer written this way:

solution = many loop
    where 
        loop = myParser <|> (anyChar >> loop)

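Basically, this defines a recursive parser (called loop) that keeps searching for the first thing myParser can parse. many then simply repeats that search exhaustively until it fails, i.e. at EOF.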
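One caveat with the loop above: in Parsec, a parser that fails after consuming input makes the enclosing <|> or many fail as well, so non-matching text after the last match (or a myParser that fails halfway through a match) can turn the whole parse into an error. The following is a minimal sketch of a variant that wraps the search in try, assuming Text.Parsec on String input. The names findNext, findAll and allNums are made up for this illustration and are not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Example item parser: one or more digits, read as an Int.
num :: Parser Int
num = read <$> many1 digit

-- Skip characters until p matches. The outer try makes the search fail
-- without consuming input when p matches nowhere in the remaining text;
-- the inner try lets p backtrack after a partial match.
-- (findNext is a hypothetical helper name.)
findNext :: Parser a -> Parser a
findNext p = try loop
  where
    loop = try p <|> (anyChar >> loop)

-- Collect every match of p, then discard whatever text is left over.
-- Assumes p consumes input whenever it succeeds; otherwise many raises
-- an error about a parser that accepts an empty string.
findAll :: Parser a -> Parser [a]
findAll p = many (findNext p) <* many anyChar

-- Convenience wrapper for the number example.
allNums :: String -> Either ParseError [Int]
allNums = parse (findAll num) ""
```

In GHCi, allNums "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22." should give Right [111,4,22], and an input with no digits at all gives Right [].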

Answer 2

You can use

 many ( noneOf "0123456789")

I'm not sure about the types of "noneOf" and "digit", but you could try

many $ noneOf ['0'..'9']  -- noneOf takes a list of characters, not a parser such as digit

Answer 3

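To find an item in a string, either the item is at the beginning of the string, or you consume one character and look for the item in the now-shorter string. If the item is not at the beginning of the string, you need to un-consume the characters that were used up while looking, so you need a try block.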

hasItem = prefixItem <* (many anyChar)
prefixItem = (try item) <|> (anyChar >> prefixItem)
item = <parser for your item here>

This code finds only one occurrence of item in the string.
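To make the role of try concrete, here is a small self-contained sketch of the same definitions, assuming Text.Parsec on String input. The item parser (the literal word "old") and the wrapper firstItem are placeholders chosen only for this illustration; any parser could stand in for item.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical item for this sketch: the literal word "old".
item :: Parser String
item = string "old"

-- Either the item starts right here, or we drop one character and
-- look again. try undoes a partial match (for example the "o" in
-- "oak") so that anyChar can step past it one character at a time.
prefixItem :: Parser String
prefixItem = try item <|> (anyChar >> prefixItem)

-- Return the first occurrence and ignore everything after it.
hasItem :: Parser String
hasItem = prefixItem <* many anyChar

-- Hypothetical wrapper that runs the search on a plain String.
firstItem :: String -> Either ParseError String
firstItem = parse hasItem ""
```

For example, firstItem "an oak is <b>old</b>" should yield Right "old", while an input that never contains the word yields a Left with a parse error.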

(AJFarmar almost had it.)

Answer 4

With the replace-megaparsec package, you can use the sepCap parser combinator to split a string into the parts that match a pattern and the parts that don't.

import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char

let num :: Parsec Void String Int
    num = read <$> many digitChar

>>> parseTest (sepCap num) "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
[Left "I will live to be "
,Right 111
,Left " years <b>old</b> if I work out "
,Right 4
,Left " days a week starting at "
,Right 22
,Left "."
]

How can Parsec find "matches" in a string?
