Capturing escaped multi-line syntax with Parslet and Ruby

Time: 2018-01-05 04:25:09

Tags: ruby parslet

I want to write a Parslet parser in Ruby that understands a simple configuration syntax:

alpha = one
beta = two\
three
gamma = four

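From the parser's point of view, the backslash escapes the newline, so the parsed value of beta is twothree. Note that the backslash is literally present in the config file (the text above is the file content itself, not what you would type between Ruby string quotes). In Ruby, the same text can be written as "alpha = one\nbeta = two\\\nthree\ngamma = four"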
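To make the difference between the file content and a Ruby literal concrete, here is a small sketch. The variable names are only illustrative. It assumes nothing beyond core Ruby: a single-quoted heredoc keeps the backslash, while a %{...} literal follows double-quoted rules and drops the backslash-newline before any parser sees it.

```ruby
# The config text written as an escaped double-quoted string.
escaped = "alpha = one\nbeta = two\\\nthree\ngamma = four"

# A single-quoted heredoc is fully raw, so the backslash survives.
heredoc = <<'CONF'
alpha = one
beta = two\
three
gamma = four
CONF

# %{...} behaves like a double-quoted string: backslash-newline is a
# line continuation and both characters are removed by Ruby itself.
interpolated = %{alpha = one
beta = two\
three
gamma = four
}

puts escaped == heredoc.chomp       # true
puts interpolated.include?("\\")    # false
```

If the test input is built with %{...}, the parser never receives the escape at all, so a raw form is the safer choice when testing.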

My current attempt works for single-line settings but cannot handle the multi-line case:

require "parslet"

class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) }
  rule(:value) do
    (match("[^\n]").repeat(1) >> match("[^\\\n]") >> str("\\\n")).repeat(0) >>
      match("[^\n]").repeat(0)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:setting) do
    term.as(:key) >> space.maybe >> str("=") >> space.maybe >>
      value.as(:value)
  end

  rule(:input) { setting.repeat >> space.maybe }
  root(:input)
end

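I suspect the problem is related to how Parslet parses things. Does the first part of my value rule grab as many characters as it can, without regard for what the later parts need?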

2 answers:

Answer 0: (score: 0)

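You need to start the setting rule with optional whitespace.

The following snippet works for me. I added pp and a space? rule to make it easier to follow: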

require "parslet"
require 'pp'

class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) >> space? }
  rule(:value) do
    (match("[^\n]").repeat(1) >> match("[^\\\n]") >> str("\\\n")).repeat(0) >>
      match("[^\n]").repeat(0)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:space?)     { space.maybe }
  rule(:setting) do
    space? >> term.as(:key) >> space? >> str("=") >> space? >>
      value.as(:value)
  end

  rule(:input) { setting.repeat >> space.maybe }
  root(:input)
end

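# Note: %{...} follows double-quoted string rules, so Ruby itself removes the
# backslash-newline below and the parser receives "beta = twothree" on one line.
# Use %q{...} to keep the backslash in the input.
str = %{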
alpha = one
beta = two\
three
gamma = four
}

begin
  pp SettingParser.new.parse(str, reporter: Parslet::ErrorReporter::Deepest.new)
rescue Parslet::ParseFailed => error
  puts error.parse_failure_cause.ascii_tree
end

Output:

[{:key=>"alpha "@1, :value=>"one"@9},
 {:key=>"beta "@13, :value=>"twothree"@20},
 {:key=>"gamma "@29, :value=>"four"@37}]

Answer 1: (score: 0)

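Yes. Parslet rules consume eagerly and do not give characters back, so you need to try the escape sequence first and consume a single non-escape character only when that fails.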

require "parslet"
require "pp"


class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) }
  rule(:char) { str("\\\n") | match("[^\n]").as(:keep) }
  rule(:value) do
    char.repeat(1)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:setting) do
    term.as(:key) >> space.maybe >> str("=") >> space.maybe >>
      value.as(:value) >> str("\n")
  end

  rule(:input) { setting.repeat.as(:settings) >> space.maybe }
  root(:input)
end

s = SettingParser.new

tree =  s.parse("alpha = one\nbeta = two\\\nthree\ngamma = four\n")
pp tree

This produces the following:

{:settings=>
  [{:key=>"alpha"@0,
    :value=>[{:keep=>"o"@8}, {:keep=>"n"@9}, {:keep=>"e"@10}]},
   {:key=>"beta"@12,
    :value=>
     [{:keep=>"t"@19},
      {:keep=>"w"@20},
      {:keep=>"o"@21},
      {:keep=>"t"@24},
      {:keep=>"h"@25},
      {:keep=>"r"@26},
      {:keep=>"e"@27},
      {:keep=>"e"@28}]},
   {:key=>"gamma"@30,
    :value=>
     [{:keep=>"f"@38}, {:keep=>"o"@39}, {:keep=>"u"@40}, {:keep=>"r"@41}]}]}

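Here I tag the non-escaped characters with :keep so they can be picked out in a later transform. Alternatively, you could capture the whole string, escapes included, and remove them with a search/replace in post-processing.

Either way, you can now pull the data out of the tree with a transform.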

class SettingTransform < Parslet::Transform
  rule(:keep => simple(:c)) { c }
  rule({:key => simple(:k), :value => sequence(:v)}) { { k => v.join } }
  rule(:settings => subtree(:s)) do
    s.each_with_object({}) { |p, a| a[p.keys[0]] = p.values[0] }
  end
end

pp SettingTransform.new.apply(tree)
# => {"alpha"@0=>"one", "beta"@12=>"twothree", "gamma"@30=>"four"}

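You may need to add some "end of line" logic. At the moment I assume your config ends with "\n". You can detect EOF with any.absent? (or always append a "\n" to the input before parsing).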