Question

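I am trying to parse a simple indentation-sensitive syntax using the Parslet library in Ruby.

The following is an example of the syntax I am trying to parse: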

level0child0
level0child1
  level1child0
  level1child1
    level2child0
  level1child2

The resulting tree should look like this:

[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]

The parser I have now can parse nodes at nesting levels 0 and 1, but nothing deeper than that:

require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  rule(:indent) { str('  ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }

  rule(:document) { node.repeat }

  root :document

end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree

rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
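  # Parslet 1.6+ exposes the error tree as parse_failure_cause; older
  # releases called it `cause`, which now clashes with Ruby's Exception#cause.
  puts failure.parse_failure_cause.ascii_tree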
end

__END__
user
  name
  age
recipe
  name
foo
bar

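Clearly I need a dynamic counter, one that expects 3 indentation units before an identifier at nesting level 3.

How can I implement an indentation-sensitive parser with Parslet in this way? Is it possible?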

Answer 1

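There are a few approaches.

1. Parse the document by recognizing each line as a collection of indents and an identifier, then apply a transformation afterwards to reconstruct the hierarchy based on the number of indents.
2. Use captures to store the current indent, and expect the next node to include that indent plus more in order to match as a child (I didn't dig into this approach much, as the next one occurred to me).
3. Rules are just methods. So you can define 'node' as a method, which means you can pass parameters! (as follows)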

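This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method does not match a string, it generates a parser. So the recursive call would never finish.

This is why dynamic exists. It returns a parser that is not resolved until the point at which it tries to match, which lets you recurse without trouble.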
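The difference between building the parser eagerly and building it at match time can be sketched without Parslet. The toy below is only an illustration under assumptions: "parsers" are plain lambdas, `node` and `parse_document` are hypothetical names, and one indent is assumed to be exactly two spaces. The point is that node(depth + 1) is constructed inside the lambda, so it is built only once a line at the current depth has actually matched, which is roughly the role dynamic plays.

```ruby
# Toy model, not Parslet: a "parser" is a lambda taking (lines, index) and
# returning [node, next_index] on success or nil on failure.
def node(depth)
  prefix = '  ' * depth
  lambda do |lines, i|
    line = lines[i]
    return nil unless line && line.start_with?(prefix)
    name = line[prefix.size..-1]
    return nil unless name.match?(/\A[A-Za-z0-9]+\z/)
    i += 1
    children = []
    # Deferred construction: the child parser is created here, at match time.
    # Calling node(depth + 1) outside this lambda would recurse forever,
    # because each call would immediately build the next depth.
    child_parser = node(depth + 1)
    while (result = child_parser.call(lines, i))
      child, i = result
      children << child
    end
    [{ identifier: name, children: children }, i]
  end
end

def parse_document(text)
  lines = text.lines.map(&:chomp)
  top = node(0)
  nodes = []
  i = 0
  while (result = top.call(lines, i))
    parsed, i = result
    nodes << parsed
  end
  raise ArgumentError, "cannot parse line #{i + 1}" if i < lines.size
  nodes
end

p parse_document("user\n  name\n  age\nrecipe\n")
```

The recursion stops on its own: when no deeper line follows, the child parser returns nil and the parser for the next depth is never built.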

See the following code:

require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  def indent(depth)
    str('  '*depth)
  end

  rule(:newline) { str("\n") }

  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

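  # Builds the parser for one node at the given depth. The children are wrapped
  # in `dynamic` so that node(depth+1) is only constructed while matching.
  def node(depth)
    indent(depth) >>
    identifier >>
    newline.maybe >>
    (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
  end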

  rule(:document) { node(0).repeat }

  root :document
end

This is my favorite solution.
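For comparison, the first approach listed in this answer (parse every line flat, then rebuild the hierarchy from the indent counts) can be sketched in plain Ruby. This is only an illustration under assumptions: `flat_lines` and `build_tree` are hypothetical helper names, the flat pass uses a regex where a real solution would use a Parslet grammar plus a transform, and one indent is assumed to be exactly two spaces.

```ruby
# Flat pass: one record per non-empty line, with its indent count.
def flat_lines(text)
  text.each_line.map(&:chomp).reject(&:empty?).map do |line|
    spaces = line[/\A */].size
    { indent: spaces / 2, identifier: line.strip }
  end
end

# Rebuild pass: keep a stack of [depth, node] pairs for the open ancestors.
def build_tree(lines)
  root = { identifier: nil, children: [] }
  stack = [[-1, root]]
  lines.each do |line|
    node = { identifier: line[:identifier], children: [] }
    # Drop every ancestor that is not strictly shallower than this line.
    stack.pop while stack.last[0] >= line[:indent]
    stack.last[1][:children] << node
    stack.push([line[:indent], node])
  end
  root[:children]
end

tree = build_tree(flat_lines("user\n  name\n  age\nrecipe\n  name\nfoo\nbar\n"))
p tree.map { |n| n[:identifier] }
```

The trade-off is that the grammar stays trivial, but indentation errors (for example a line that skips a level) are not rejected by the parser and would need to be checked in the rebuild step.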

Answer 2

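I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just have INDENT and DEDENT tokens produced, which other rules can use in much the same way as they would match "{" and "}" characters. So the following is my solution. It is a class, IndentParser, that any parser can extend to get nl, indent and dedent tokens generated.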

require 'parslet'

# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
  def try(source, context, consume_all)
    succ("")
  end
end
class NeverMatch < Parslet::Atoms::Base
  attr_accessor :msg
  def initialize(msg = "ignore")
    self.msg = msg
  end
  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end
class ErrorMatch < Parslet::Atoms::Base
  attr_accessor :msg
  def initialize(msg)
    self.msg = msg
  end
  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class IndentParser < Parslet::Parser

  ##
  # Indentation handling: when matching a newline we check the following indentation. If
  # that indicates an indent token or dedent tokens (1+) then we stick these in
  # instance variables and the high-priority indent/dedent rules will match as long as
  # these remain. The nl rule consumes the indentation itself.

  rule(:indent)  { dynamic {|s,c| 
    if @indent.nil?
      NeverMatch.new("Not an indent")
    else
      @indent = nil
      AlwaysMatch.new
    end
  }}
  rule(:dedent)  { dynamic {|s,c|
    if @dedents.nil? or @dedents.length == 0
      NeverMatch.new("Not a dedent")
    else
      @dedents.pop
      AlwaysMatch.new
    end
  }}

  def checkIndentation(source, ctx)
    # See if next line starts with indentation. If so, consume it and then process
    # whether it is an indent or some number of dedents.
    indent = ""
    while source.matches?(Regexp.new("[ \t]"))
      indent += source.consume(1).to_s #returns a Slice
    end

    if @indentStack.nil?
      @indentStack = [""]
    end

    currentInd = @indentStack[-1]
    return AlwaysMatch.new if currentInd == indent #no change, just match nl

    if indent.start_with?(currentInd)
      # Getting deeper
      @indentStack << indent
      @indent = indent #tells the indent rule to match one
      return AlwaysMatch.new
    else
      # Either some number of de-dents or an error

      # Find first match starting from back
      count = 0
      @indentStack.reverse.each do |level|
        break if indent == level #found it, 

        if level.start_with?(indent)
          # New indent is prefix, so we de-dented this level.
          count += 1
          next
        end

        # Not a match, not a valid prefix. So an error!
        return ErrorMatch.new("Mismatched indentation level")
      end

      @dedents = [] if @dedents.nil?
      count.times { @dedents << @indentStack.pop }
      return AlwaysMatch.new
    end
  end
  rule(:nl)         { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}

  rule(:unixnl)     { str("\n") }
  rule(:macnl)      { str("\r") }
  rule(:winnl)      { str("\r\n") }
  # winnl must be tried first, otherwise macnl consumes the "\r" of "\r\n".
  rule(:anynl)      { winnl | unixnl | macnl }

end

I'm sure a lot can be improved, but this is what I've come up with so far.

Example usage:

class MyParser < IndentParser
  rule(:colon)      { str(':') >> space? }

  rule(:space)      { match[' \t'].repeat(1) }  # match[...] builds a character class
  rule(:space?)     { space.maybe }

  rule(:number)     { match['0-9'].repeat(1).as(:num) >> space? }
  rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }

  rule(:block)      { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
  rule(:stmt)       { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
  rule(:testblock)  { identifier.as(:name) >> block }

  rule(:prgm)       { testblock >> nl.repeat }
  root :prgm
end
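The indent-stack bookkeeping that checkIndentation performs can also be looked at in isolation. The following plain-Ruby sketch is not part of IndentParser or Parslet. `indent_tokens` is a hypothetical helper that walks the lines of a string and emits :indent and :dedent markers using the same prefix comparison against a stack of indentation strings. Closing the open blocks at end of input and skipping blank lines are assumptions added here for the illustration.

```ruby
def indent_tokens(text)
  stack = ['']   # indentation strings of the enclosing blocks
  tokens = []
  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty?          # assumption: blank lines are ignored
    indent = line[/\A[ \t]*/]
    if indent != stack.last
      if indent.start_with?(stack.last)
        # Getting deeper: exactly one new block opens.
        stack.push(indent)
        tokens << :indent
      else
        # Getting shallower: close blocks until a known level is reached.
        until stack.last == indent
          unless stack.size > 1 && stack.last.start_with?(indent)
            raise ArgumentError, 'Mismatched indentation level'
          end
          stack.pop
          tokens << :dedent
        end
      end
    end
    tokens << [:line, line[indent.size..-1]]
  end
  # Assumption: blocks still open at end of input are closed explicitly.
  (stack.size - 1).times { tokens << :dedent }
  tokens
end

p indent_tokens("a:\n  b\n  c:\n    d\ne\n")
```

With a token stream like this, the rest of a grammar can treat :indent and :dedent much like opening and closing braces, which is the design goal stated at the start of this answer.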
