How to parse a list of tokens with FParsec

Time: 2014-05-16 23:28:51

Tags: f# fparsec

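I'm trying to parse a list of tokens with FParsec, where each token is either a block of text or a tag, for example:

This is a {type of test} test, and it {succeeds or fails}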

Here is the parser:

open FParsec

type Parser<'t> = Parser<'t, unit>

type Token =
| Text of string
| Tag of string

let escape fromString toString : Parser<_> =
    pstring fromString |>> (fun c -> toString)

let content : Parser<_> =
    let contentNormal = many1Satisfy (fun c -> c <> '{' && c <> '}')
    let openBraceEscaped = escape "{{" "{"
    let closeBraceEscaped = escape "}}" "}"
    let contentEscaped = openBraceEscaped <|> closeBraceEscaped
    stringsSepBy contentNormal contentEscaped

let ident : Parser<_> =
    let isIdentifierFirstChar c = isLetter c || c = '_'
    let isIdentifierChar c = isLetter c || isDigit c || c = '_'
    spaces >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier" .>> spaces

let text = content |>> Text

let tag = 
    ident |> between (skipString "{") (skipString "}")
    |>> Tag

let token = text <|> tag
let tokens = many token .>>. eof   

The following tests work:

> run token "abc def" ;;
val it : ParserResult<Token,unit> = Success: Text "abc def"

> run token "{abc def}" ;;
val it : ParserResult<Token,unit> = Success: Tag "abc def"

But trying to run tokens results in an exception:

> run tokens "{abc} def" ;;
System.InvalidOperationException: (Ln: 1, Col: 10): The combinator 'many' was 
    applied to a parser that succeeds without consuming input and without
    changing the parser state in any other way. (If no exception had been raised,
    the combinator likely would have entered an infinite loop.)

I've gone through this stackoverflow question, but nothing I've tried works. I even added the following, and I still get the same exception:

let tokenFwd, tokenRef = createParserForwardedToRef<Token, unit>()
do tokenRef := choice [tag; text]
let readEndOfInput : Parser<unit, unit> = spaces >>. eof
let readExprs = many tokenFwd
let readExprsTillEnd = readExprs .>> readEndOfInput

run readExprsTillEnd "{abc} def"  // System.InvalidOperationException ... The combinator 'many' was applied  ...

I think the problem is the stringsSepBy in content, but I can't figure out any other way to get a string with escaped items.

Any help would be greatly appreciated. I've been going through this for a couple of days now and can't figure it out.

2 answers:

Answer 0: (score: 2)

stringsSepBy accepts zero strings, so token accepts the empty string, which is what many complains about.

I changed it to the following to verify that this is the line you need to work on.

many1 (contentNormal <|> contentEscaped) |>> fun l -> String.concat "" l
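
To make the difference concrete, here is a minimal sketch that runs both definitions on input starting with a tag. It assumes FParsec is already referenced (in a script, something like #r "nuget: FParsec"); stringReturn stands in for the question's escape helper, and show is a hypothetical helper added only for printing.

```fsharp
open FParsec

// Pieces from the question, with explicit types to avoid value-restriction errors.
let contentNormal : Parser<string, unit> =
    many1Satisfy (fun c -> c <> '{' && c <> '}')
let contentEscaped : Parser<string, unit> =
    stringReturn "{{" "{" <|> stringReturn "}}" "}"

// Original definition: zero occurrences still count as success.
let contentOriginal = stringsSepBy contentNormal contentEscaped

// Revised definition: at least one piece must be parsed.
let contentRevised =
    many1 (contentNormal <|> contentEscaped) |>> fun l -> String.concat "" l

// Hypothetical helper that prints the outcome of running a parser.
let show name p input =
    match run p input with
    | Success (v, _, _) -> printfn "%s on %s: Success %A" name input v
    | Failure _ -> printfn "%s on %s: Failure" name input

show "original" contentOriginal "{abc}"  // succeeds with "" without consuming input
show "revised" contentRevised "{abc}"    // fails, so text <|> tag can go on to try tag
```

The empty success of the original definition is what many detects at the end of the input and reports as the exception.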

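Also, I would stay away from stringsSepBy contentNormal contentEscaped, because it says that contentNormal pieces must alternate with contentEscaped separators between them. So a{{b}}c is fine, but input that begins or ends with an escaped brace, such as {{b}}, {{b}}c, or a{{b}}, fails.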
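
A quick way to check the alternation behaviour is to require the whole input to be consumed and compare the two definitions side by side. This is only a sketch: strict, relaxed, and succeeds are names made up for the illustration, and stringReturn replaces the question's escape helper.

```fsharp
open FParsec

let contentNormal : Parser<string, unit> =
    many1Satisfy (fun c -> c <> '{' && c <> '}')
let contentEscaped : Parser<string, unit> =
    stringReturn "{{" "{" <|> stringReturn "}}" "}"

// Adding eof makes a partial match show up as a failure.
let strict = stringsSepBy contentNormal contentEscaped .>> eof
let relaxed =
    many1 (contentNormal <|> contentEscaped) .>> eof
    |>> fun l -> String.concat "" l

// Hypothetical helper: true when the parser accepts the whole input.
let succeeds p input =
    match run p input with
    | Success _ -> true
    | Failure _ -> false

for input in [ "a{{b}}c"; "{{b}}"; "{{b}}c"; "a{{b}}" ] do
    printfn "%s: stringsSepBy=%b many1=%b"
        input (succeeds strict input) (succeeds relaxed input)
```

Only a{{b}}c should be accepted by the stringsSepBy version, while the many1 version accepts all four inputs.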

Answer 1: (score: 1)

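notEmpty can be used to require that input is consumed. If a parser succeeds without consuming any input, the parser's "current position" does not move forward, so when it is run inside many it would loop forever if the exception were not raised. stringsSepBy is succeeding with zero elements; you can use notEmpty to make it fail when it gets zero elements: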

stringsSepBy contentNormal contentEscaped |> notEmpty
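
As a small check of that behaviour, the sketch below runs the notEmpty version on plain text and on input that starts with a tag. It assumes FParsec is referenced; describe is a hypothetical printing helper, and stringReturn stands in for the question's escape function.

```fsharp
open FParsec

let contentNormal : Parser<string, unit> =
    many1Satisfy (fun c -> c <> '{' && c <> '}')
let contentEscaped : Parser<string, unit> =
    stringReturn "{{" "{" <|> stringReturn "}}" "}"

// notEmpty turns a success that consumed nothing into a failure.
let content = stringsSepBy contentNormal contentEscaped |> notEmpty

// Hypothetical helper that prints the outcome for one input.
let describe input =
    match run content input with
    | Success (v, _, _) -> printfn "%s -> Success %A" input v
    | Failure _ -> printfn "%s -> Failure" input

describe "abc def"  // input is consumed, so the parser succeeds
describe "{abc}"    // nothing is consumed, so notEmpty reports a failure
```

Because the failure leaves the position unchanged, text <|> tag can still try tag at the same spot.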

Also, I tried to get your full example to parse. Tags can contain spaces, so you need to allow ident to include spaces to match:

let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = ' '

Another small tweak is to return just a Token list rather than a Token list * unit tuple (the unit being the result of eof):

let tokens = many token .>> eof
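
Putting the three changes together, a complete version might look like the sketch below. It is assembled from the question's code plus the tweaks above, assuming FParsec is referenced; stringReturn is used in place of the escape helper, which is a substitution on my part rather than something from the question.

```fsharp
open FParsec

type Token =
    | Text of string
    | Tag of string

let content : Parser<string, unit> =
    let contentNormal = many1Satisfy (fun c -> c <> '{' && c <> '}')
    let contentEscaped = stringReturn "{{" "{" <|> stringReturn "}}" "}"
    // Fail instead of succeeding with an empty string.
    stringsSepBy contentNormal contentEscaped |> notEmpty

let ident : Parser<string, unit> =
    let isIdentifierFirstChar c = isLetter c || c = '_'
    // Spaces are allowed after the first character so tags like {type of test} parse.
    let isIdentifierChar c = isLetter c || isDigit c || c = '_' || c = ' '
    spaces >>. many1Satisfy2L isIdentifierFirstChar isIdentifierChar "identifier" .>> spaces

let text = content |>> Text
let tag = between (skipString "{") (skipString "}") ident |>> Tag

// .>> keeps only the token list and drops the unit returned by eof.
let tokens = many (text <|> tag) .>> eof

match run tokens "This is a {type of test} test, and it {succeeds or fails}" with
| Success (result, _, _) -> printfn "%A" result
| Failure (message, _, _) -> printfn "%s" message
```

For the example sentence this should produce four tokens: Text "This is a ", Tag "type of test", Text " test, and it ", and Tag "succeeds or fails".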