Question

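Using fslex, I want to return multiple tokens for one pattern, but I don't see a way to accomplish that. Even using another rule function that returns multiple tokens would work for me.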

I'm trying to use something like this:

let identifier = [ 'a'-'z' 'A'-'Z' ]+

// ...

rule tokenize = parse
// ...
| '.' identifier '(' { let value = lexeme lexbuf
                       match operations.TryFind(value) with
                      // TODO: here is the problem:
                      // I would like to return like [DOT; op; LPAREN]
                      | Some op -> op
                      | None    -> ID(value) }

| identifier         { ID (lexeme lexbuf) }
// ...

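The problem I'm trying to solve here is to match the predefined tokens (see the operations map) only if the identifier is between "." and "(". Otherwise the match should be returned as an ID.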

I'm fairly new to fslex, so I'd be happy for any pointers in the right direction.

Answer 1

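Try to keep semantic analysis such as "...only if the identifier is between . and (" out of your lexer (fslex) and save it for your parser (fsyacc) instead. One option is to keep your lexer ignorant of operations: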

let identifier = [ 'a'-'z' 'A'-'Z' ]+    
// ...
rule tokenize = parse
// ...
| '.' { DOT }
| '(' { LPAREN }
| identifier { ID (lexeme lexbuf) }
// ...

Then solve the problem in fsyacc with a rule like this:

| DOT ID LPAREN { match operations.TryFind($2) with
                  | Some op -> Ast.Op(op)
                  | None    -> Ast.Id($2) }

Update in response to comments:

Perhaps the following in your lexer, then:

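{
// Header section: ordinary F# code goes between braces,
// before the regular-expression definitions.
let operations =
  [
    "op1", OP1
    "op2", OP2
    //...
  ] |> Map.ofList
}

let identifier = [ 'a'-'z' 'A'-'Z' ]+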

// ...
rule tokenize = parse
// ...
| '.' { DOT }
| '(' { LPAREN }
| identifier 
  { 
    let input = lexeme lexbuf
    match operations |> Map.tryFind input with
    | Some(token) -> token
    | None -> ID(input) 
  }
// ...

And in your parser:

| DOT ID LPAREN { ... }
| DOT OP1 LPAREN { ... }
| DOT OP2 LPAREN { ... }

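This way the rule that IDs and operations must come between a DOT and an LPAREN is enforced in your parser, while your lexer stays as simple as it should be (it provides a stream of tokens and does little to enforce the validity of tokens in relation to one another).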
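To make the effect of the updated lexer concrete, here is a minimal, self-contained sketch that does not use fslex at all. The hand-written tokenize function is only a stand-in for the generated lexer, the Token type is a stand-in for the one FsYacc would generate, and the operation names "add" and "sub" are hypothetical (chosen so that they match the letters-only identifier pattern). Only classify mirrors the identifier action above.

```fsharp
// Stand-in for the token type that FsYacc would normally generate.
type Token =
    | DOT
    | LPAREN
    | OP1
    | OP2
    | ID of string

// Hypothetical operation names, for illustration only.
let operations = Map.ofList [ "add", OP1; "sub", OP2 ]

// Mirrors the identifier action: known operation names get their own token.
let classify (word: string) : Token =
    match Map.tryFind word operations with
    | Some token -> token
    | None -> ID word

// Same character set as the identifier pattern [ 'a'-'z' 'A'-'Z' ].
let isLetter (c: char) = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

// Hand-rolled stand-in for the generated lexer; not what fslex emits.
let tokenize (text: string) : Token list =
    let rec loop i acc =
        if i >= text.Length then List.rev acc
        elif text.[i] = '.' then loop (i + 1) (DOT :: acc)
        elif text.[i] = '(' then loop (i + 1) (LPAREN :: acc)
        elif isLetter text.[i] then
            // Consume the longest run of letters, like identifier+ would.
            let mutable j = i
            while j < text.Length && isLetter text.[j] do
                j <- j + 1
            loop j (classify (text.Substring(i, j - i)) :: acc)
        else loop (i + 1) acc // skip anything else (whitespace, etc.)
    loop 0 []

let known = tokenize "x.add("   // [ID "x"; DOT; OP1; LPAREN]
let unknown = tokenize "y.foo(" // [ID "y"; DOT; ID "foo"; LPAREN]
```

Note that with this approach the lexer produces OP1 for "add" wherever it appears, not only between DOT and LPAREN. Deciding whether an operation token is legal at a given position is left to the parser rules.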

Answer 2

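Okay, here it is.

Each lexer rule (i.e. rule <name> = parse .. cases ..) defines a function <name> : LexBuffer<char> -> 'a, where 'a can be any type. Usually you return tokens (possibly defined for you by FsYacc), so you can parse text like this: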

let parse text =
    let lexbuf = LexBuffer<char>.FromString text
    Parser.start Lexer.tokenize lexbuf

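Here Parser.start is the parsing function (from your FsYacc file), of type (LexBuffer<char> -> Token) -> LexBuffer<char> -> AST (Token and AST are your own types; there is nothing special about them).

In your case you want <name> : LexBuffer<char> -> 'a list, so all you have to do is this: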

let parse' text =
    let lexbuf = LexBuffer<char>.FromString text
    let tokenize =
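        let stack = ref [] // tokens already lexed but not yet handed to the parser
        fun lexbuf ->
        // Lexer.tokenize returns a token list here; keep lexing until one is available
        while List.isEmpty !stack do
            stack := Lexer.tokenize lexbuf
        let (token :: stack') = !stack // can never get match failure,
                                        // else the while wouldn't have exited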
        stack := stack'
        token
    Parser.start tokenize lexbuf

This simply saves the tokens your lexer supplies and feeds them to the parser one by one (lexing more tokens as needed).
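The buffering idea can be tried out without fslex or fsyacc. The following is a minimal sketch under stated assumptions: the Token type, the flatten helper, and the queue-based fakeLexer are all hypothetical stand-ins (the queue plays the role of LexBuffer<char>, and fakeLexer plays the role of a rule that returns a token list). It uses .Value rather than ! and :=, which recent F# compilers flag with an informational message.

```fsharp
open System.Collections.Generic

// Stand-in for the token type that FsYacc would normally generate.
type Token =
    | DOT
    | LPAREN
    | OP of string
    | ID of string
    | EOF

/// Wraps a lexer that returns token lists so that it yields one token per call.
/// 'buf plays the role of LexBuffer<char>.
let flatten (lexMany: 'buf -> Token list) : 'buf -> Token =
    let pending : Token list ref = ref [] // lexed but not yet returned
    fun buf ->
        // Keep lexing until at least one token is available
        // (a rule may legitimately return an empty list).
        while List.isEmpty pending.Value do
            pending.Value <- lexMany buf
        match pending.Value with
        | token :: rest ->
            pending.Value <- rest
            token
        | [] -> failwith "unreachable: the loop exits only on a non-empty list"

// Fake "lexer": every call returns the next pre-built batch of tokens.
let fakeLexer (batches: Queue<Token list>) : Token list = batches.Dequeue()

let batches = Queue<Token list>([ [ DOT; OP "add"; LPAREN ]; []; [ ID "x" ]; [ EOF ] ])
let next = flatten fakeLexer

// Five calls drain the four batches, skipping the empty one.
let tokens = [ for _ in 1 .. 5 -> next batches ]
```

The parser only ever sees the single-token function returned by flatten, so its expected signature is unchanged.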

Answer 3

(This is a separate answer.)

For this specific case, this might solve your issue better:

...

rule tokenize = parse
...
| '.' { DOT }
| '(' { LPAREN }
| identifier { ID (lexeme lexbuf) }

...

Usage:

let parse'' text =
    let lexbuf = LexBuffer<char>.FromString text
    let rec tokenize =
        let stack = ref []
        fun lexbuf ->
        if List.isEmpty !stack then
            stack := [Lexer.tokenize lexbuf]
        let (token :: stack') = !stack // can never get match failure:
                                        // the `if` above guarantees a non-empty stack
        stack := stack'
        // This match turns the ID into an OP when necessary.
        // It is written as several nested matches (not one large unified match),
        // otherwise EOF may cause issues; this is quite important.
        match token with
        | DOT ->
          match tokenize lexbuf with
          | ID id ->
            match tokenize lexbuf with
            | LPAREN ->
              let op = findOp id
              stack := op :: LPAREN :: !stack
            | t -> stack := ID id :: t :: !stack
          | t -> stack := t :: !stack
        | _ -> ()
        token
    Parser.start tokenize lexbuf

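This turns IDs into operations if and only if they are surrounded by a DOT and an LPAREN.

PS: I have 3 separate matches because a unified match would either require Lazy<_> values (which would make it even less readable) or fail on a [DOT; EOF] sequence, since it would expect an additional third token.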
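The look-ahead rewriting can also be sketched in a self-contained form. This is not the code above but a simplified variant under stated assumptions: the Token type, the operations table, findOp, sourceOf, and drain are hypothetical helpers, and the token source is a plain unit -> Token function instead of a lexer applied to a LexBuffer<char>. It reads ahead one token at a time, which is why a [DOT; EOF] sequence is never read past its end.

```fsharp
type Token =
    | DOT
    | LPAREN
    | OP of string
    | ID of string
    | EOF

// Hypothetical operation table; unlisted names stay plain identifiers.
let operations = Map.ofList [ "add", OP "add"; "sub", OP "sub" ]

let findOp (name: string) : Token =
    match Map.tryFind name operations with
    | Some op -> op
    | None -> ID name

/// Wraps a token source and rewrites DOT ID LPAREN into DOT <op> LPAREN.
let rewriting (source: unit -> Token) : unit -> Token =
    let pending : Token list ref = ref [] // read ahead but not yet returned
    let read () =
        match pending.Value with
        | t :: rest ->
            pending.Value <- rest
            t
        | [] -> source ()
    fun () ->
        match read () with
        | DOT ->
            // Look ahead step by step; stop as soon as the shape does not fit.
            let pushBack =
                match read () with
                | ID name ->
                    match read () with
                    | LPAREN -> [ findOp name; LPAREN ]
                    | other -> [ ID name; other ]
                | other -> [ other ]
            pending.Value <- pushBack @ pending.Value
            DOT
        | token -> token

// Test helper: a source backed by a list that answers EOF once exhausted.
let sourceOf (tokens: Token list) : unit -> Token =
    let remaining = ref tokens
    fun () ->
        match remaining.Value with
        | t :: rest ->
            remaining.Value <- rest
            t
        | [] -> EOF

// Test helper: pull tokens up to and including the first EOF.
let drain (next: unit -> Token) : Token list =
    let rec loop acc =
        match next () with
        | EOF -> List.rev (EOF :: acc)
        | t -> loop (t :: acc)
    loop []

let rewritten = drain (rewriting (sourceOf [ ID "add"; DOT; ID "add"; LPAREN ]))
let untouched = drain (rewriting (sourceOf [ DOT; ID "add"; DOT ]))
let dotThenEof = drain (rewriting (sourceOf [ DOT ]))
```

In this sketch only the second "add" in the first input becomes an operation, because only that one sits between a DOT and an LPAREN.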

How to return multiple tokens for one fslex rule pattern?
