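I am trying to parse a simple indentation-sensitive syntax using the Parslet library in Ruby.
Here is an example of the syntax I am attempting to parse: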
level0child0
level0child1
 level1child0
 level1child1
  level2child0
 level1child2
The resulting tree would look like this:
[
  {
    :identifier => "level0child0",
    :children => []
  },
  {
    :identifier => "level0child1",
    :children => [
      {
        :identifier => "level1child0",
        :children => []
      },
      {
        :identifier => "level1child1",
        :children => [
          {
            :identifier => "level2child0",
            :children => []
          }
        ]
      },
      {
        :identifier => "level1child2",
        :children => []
      },
    ]
  }
]
The parser I have now can parse nodes at nesting levels 0 and 1, but cannot parse anything deeper:
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser
  rule(:indent) { str(' ') }
  rule(:newline) { str("\n") }
  rule(:identifier) { match['A-Za-z0-9'].repeat.as(:identifier) }

  # Only one fixed level of children is recognised here.
  rule(:node) { identifier >> newline >> (indent >> identifier >> newline.maybe).repeat.as(:children) }

  rule(:document) { node.repeat }

  root :document
end

require 'ap'
require 'pp'

begin
  input = DATA.read

  puts '', '----- input ----------------------------------------------------------------------', ''
  ap input

  tree = IndentationSensitiveParser.new.parse(input)

  puts '', '----- tree -----------------------------------------------------------------------', ''
  ap tree
rescue IndentationSensitiveParser::ParseFailed => failure
  puts '', '----- error ----------------------------------------------------------------------', ''
  # Parslet 1.6 and later; older versions expose this as failure.cause.
  puts failure.parse_failure_cause.ascii_tree
end

__END__
user
 name
 age
recipe
 name
foo
bar
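Clearly I need a dynamic counter that expects 3 indentation nodes before an identifier at nesting level 3.
How can I implement an indentation-sensitive parser with Parslet in this way? Is it possible?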
Answer 0 (score: 14)
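There are a few approaches.
1. Parse the document by recognising each line as a run of indents followed by an identifier, then apply a transformation afterwards to rebuild the hierarchy from the number of indents.
2. Use captures to store the current indent and expect the next node to contain that indent plus more in order to match as a child. (I didn't dig into this approach much, because the next one occurred to me.)
3. Rules are just methods, so you can define 'node' as a method, which means you can pass parameters! (see below)
This lets you define node(depth) in terms of node(depth+1). The problem with this approach, however, is that the node method doesn't match a string; it generates a parser, so the recursive call never finishes.
This is why Parslet's dynamic exists. It returns a parser that isn't resolved until the moment it tries to match, which lets you recurse without problems.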
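As a loose analogy in plain Ruby (this is not how Parslet is implemented, and the Lazy class and both node methods are made up for illustration), the sketch below shows why an eager recursive builder never returns, while one that wraps the recursive call in a block returns immediately:

```ruby
# Hypothetical stand-in for a deferred parser: it stores a block
# and only runs it when asked.
class Lazy
  def initialize(&block)
    @block = block
  end

  # Building the inner value is postponed until this is called.
  def force
    @block.call
  end
end

# Eager version: building level `depth` immediately builds depth + 1,
# depth + 2, and so on. Calling eager_node(0) raises SystemStackError.
def eager_node(depth)
  [depth, eager_node(depth + 1)]
end

# Deferred version: the recursive call sits inside a block, so each
# call returns at once and deeper levels are built only on demand.
def lazy_node(depth)
  [depth, Lazy.new { lazy_node(depth + 1) }]
end

level0 = lazy_node(0)
level1 = level0[1].force
level2 = level1[1].force
puts level2[0] # prints 2
```

In the parser, dynamic plays the role of the deferred block: node(depth+1) is only built when the parser actually reaches that point in the input.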
See the following code:
require 'parslet'

class IndentationSensitiveParser < Parslet::Parser

  def indent(depth)
    str(' '*depth)
  end

  rule(:newline) { str("\n") }

  rule(:identifier) { match['A-Za-z0-9'].repeat(1).as(:identifier) }

  def node(depth)
    indent(depth) >>
    identifier >>
    newline.maybe >>
    (dynamic{|s,c| node(depth+1).repeat(0)}).as(:children)
  end

  rule(:document) { node(0).repeat }

  root :document
end
This is my favoured solution.
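For comparison, the first approach in the list (read each line as indentation plus identifier, then rebuild the hierarchy afterwards) can be sketched without Parslet at all. This is only an illustration in plain Ruby: the helper names scan_lines and build_tree are made up, it assumes one space per indentation level, and a real solution would more likely use Parslet for the line scan and a Parslet::Transform for the rebuild.

```ruby
# Hypothetical helper: turn each non-empty line into [depth, identifier].
# Assumes one space per indentation level.
def scan_lines(text)
  text.each_line.map(&:chomp).reject(&:empty?).map do |line|
    depth = line[/\A */].length
    [depth, line.strip]
  end
end

# Hypothetical helper: rebuild the hierarchy from the flat list.
# stack[d] is always the current parent for a node at depth d.
def build_tree(lines)
  root = { identifier: nil, children: [] }
  stack = [root]
  lines.each do |depth, identifier|
    # A line may be at most one level deeper than the previous one.
    if depth > stack.length - 1
      raise ArgumentError, "indentation jumps at #{identifier}"
    end
    node = { identifier: identifier, children: [] }
    stack = stack[0..depth]        # close any deeper levels
    stack.last[:children] << node  # attach to the parent at this depth
    stack << node                  # this node may receive children next
  end
  root[:children]
end

input = "user\n name\n age\nrecipe\n name\nfoo\nbar\n"
tree = build_tree(scan_lines(input))
puts tree.map { |n| n[:identifier] }.inspect
```

The trade-off is that the grammar stays trivial, but nesting errors are only detected in the second pass rather than by the parser itself.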
Answer 1 (score: 0)
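I don't like the idea of weaving knowledge of the indentation process through the whole grammar. I would rather just have INDENT and DEDENT tokens produced, which other rules can use in much the same way as matching "{" and "}" characters. So the following is my solution. It is a class, IndentParser, that any parser can extend to get nl, indent and dedent tokens generated.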
require 'parslet'

# Atoms returned from a dynamic that aren't meant to match anything.
class AlwaysMatch < Parslet::Atoms::Base
  def try(source, context, consume_all)
    succ("")
  end
end

class NeverMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg = "ignore")
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end

class ErrorMatch < Parslet::Atoms::Base
  attr_accessor :msg

  def initialize(msg)
    self.msg = msg
  end

  def try(source, context, consume_all)
    context.err(self, source, msg)
  end
end
class IndentParser < Parslet::Parser
  ##
  # Indentation handling: when matching a newline we check the following indentation. If
  # that indicates an indent token or dedent tokens (1+) then we stick these in instance
  # variables and the high-priority indent/dedent rules will match as long as these
  # remain. The nl rule consumes the indentation itself.

  rule(:indent) { dynamic {|s,c|
    if @indent.nil?
      NeverMatch.new("Not an indent")
    else
      @indent = nil
      AlwaysMatch.new
    end
  }}

  rule(:dedent) { dynamic {|s,c|
    if @dedents.nil? || @dedents.length == 0
      NeverMatch.new("Not a dedent")
    else
      @dedents.pop
      AlwaysMatch.new
    end
  }}

  def checkIndentation(source, ctx)
    # See if next line starts with indentation. If so, consume it and then process
    # whether it is an indent or some number of dedents.
    indent = ""
    while source.matches?(Regexp.new("[ \t]"))
      indent += source.consume(1).to_s # returns a Slice
    end

    if @indentStack.nil?
      @indentStack = [""]
    end

    currentInd = @indentStack[-1]
    return AlwaysMatch.new if currentInd == indent # no change, just match nl

    if indent.start_with?(currentInd)
      # Getting deeper
      @indentStack << indent
      @indent = indent # tells the indent rule to match one
      return AlwaysMatch.new
    else
      # Either some number of de-dents or an error

      # Find first match starting from back
      count = 0
      @indentStack.reverse.each do |level|
        break if indent == level # found it

        if level.start_with?(indent)
          # New indent is prefix, so we de-dented this level.
          count += 1
          next
        end

        # Not a match, not a valid prefix. So an error!
        return ErrorMatch.new("Mismatched indentation level")
      end

      @dedents = [] if @dedents.nil?
      count.times { @dedents << @indentStack.pop }
      return AlwaysMatch.new
    end
  end

  rule(:nl) { anynl >> dynamic {|source, ctx| checkIndentation(source,ctx) }}

  rule(:unixnl) { str("\n") }
  rule(:macnl) { str("\r") }
  rule(:winnl) { str("\r\n") }
  # winnl must come before macnl: alternatives are tried in order, and "\r" alone
  # would otherwise match the first half of "\r\n".
  rule(:anynl) { winnl | unixnl | macnl }
end
I'm sure a lot can be improved, but this is what I've come up with so far.
Example usage:
class MyParser < IndentParser
  rule(:colon) { str(':') >> space? }

  # match[' \t'] builds the character class [ \t]; match(' \t') would not match a single character.
  rule(:space) { match[' \t'].repeat(1) }
  rule(:space?) { space.maybe }

  rule(:number) { match['0-9'].repeat(1).as(:num) >> space? }
  rule(:identifier) { match['a-zA-Z'] >> match["a-zA-Z0-9"].repeat(0) }

  rule(:block) { colon >> nl >> indent >> stmt.repeat.as(:stmts) >> dedent }
  rule(:stmt) { identifier.as(:id) >> nl | number.as(:num) >> nl | testblock }
  rule(:testblock) { identifier.as(:name) >> block }

  rule(:prgm) { testblock >> nl.repeat }
  root :prgm
end
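The indent-stack bookkeeping that checkIndentation performs can also be looked at in isolation. The following is a minimal standalone sketch in plain Ruby, not part of the answer's parser: the method name indentation_tokens and the token shapes (:indent, :dedent, [:line, text]) are assumptions chosen for illustration, and blank lines are simply skipped.

```ruby
# Hypothetical tokenizer: reports indentation changes as :indent / :dedent
# tokens around [:line, text] tokens, using a stack of indentation strings.
def indentation_tokens(text)
  stack = [""] # indentation strings of the currently open blocks
  tokens = []

  text.each_line do |raw|
    line = raw.chomp
    next if line.strip.empty? # blank lines do not change the level

    indent = line[/\A[ \t]*/]
    if indent != stack.last
      if indent.start_with?(stack.last)
        # Deeper than the current block: open a new one.
        stack << indent
        tokens << :indent
      else
        # Shallower: close blocks until we are back at a known level.
        while stack.last != indent
          unless stack.length > 1 && stack.last.start_with?(indent)
            raise ArgumentError, "Mismatched indentation level"
          end
          stack.pop
          tokens << :dedent
        end
      end
    end

    tokens << [:line, line.strip]
  end

  # Close any blocks still open at end of input.
  (stack.length - 1).times { tokens << :dedent }
  tokens
end

sample = "a:\n  b\n  c:\n    d\ne\n"
p indentation_tokens(sample)
```

Comparing whole indentation strings by prefix, rather than counting characters, is what lets both this sketch and checkIndentation reject a line whose indentation mixes tabs and spaces inconsistently with the enclosing block.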