I'm trying to parse key-value expressions in Ruby, in the following format:
foo-bar:bar foo:"bar 1" "bar 2":"foo 3" "bar 2 \"var\"":"foo 3"
This should yield:
Key: foo-bar Value: bar
Key: foo Value: bar 1
Key: bar 2 Value: foo 3
Key: bar 2 "var" Value: foo 3
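Is this possible with a regular expression? Keys and values can each be either an unquoted string without spaces or a quoted string that may contain spaces.
This is what I have so far: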
([a-zA-Z0-9\-]+|\"[a-zA-Z0-9\-\s]+)\"\s*\:\s*([a-zA-Z0-9\-]+|\"[a-zA-Z0-9\-\s]+\")
Answer 0 (score: 3)
This should solve most of your problems:
("(?:\\.|[^"])*"|[^\s]*):\s*("(?:\\.|[^"])*"|[^\s]*)
A more refined option is:
(?:"((?:\\.|[^"])*)"|([^\s]*)):\s*(?:"((?:\\.|[^"])*)"|([^\s]*))
This one captures the contents without the surrounding quotes. In Ruby it would look like this:
string = 'foo-bar:bar foo:"bar 1" "bar : 2":"foo \" 3" "bar 2 \"var\"":"foo 3"'
string.scan(/(?:"((?:\\.|[^"])*)"|([^\s]*)):\s*(?:"((?:\\.|[^"])*)"|([^\s]*))/).map(&:compact)
# => [["foo-bar", "bar"], ["foo", "bar 1"], ["bar : 2", "foo \\\" 3"], ["bar 2 \\\"var\\\"", "foo 3"]]
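Note that the captured strings still contain the backslash escapes, while the question asks for `bar 2 "var"`. One way to get there is to strip the backslashes after scanning. This is a minimal sketch, assuming every backslash simply escapes the character that follows it; the names `unescape`, `PAIR` and `parse_pairs` are made up for illustration:

```ruby
# Hypothetical helper: drop the backslash in front of every escaped character.
def unescape(str)
  str.gsub(/\\(.)/) { $1 }
end

# Same pattern as above, stored in a constant for reuse.
PAIR = /(?:"((?:\\.|[^"])*)"|([^\s]*)):\s*(?:"((?:\\.|[^"])*)"|([^\s]*))/

def parse_pairs(string)
  # compact removes the groups of the alternative that did not match.
  string.scan(PAIR).map { |groups| groups.compact.map { |part| unescape(part) } }
end

string = 'foo-bar:bar foo:"bar 1" "bar : 2":"foo \" 3" "bar 2 \"var\"":"foo 3"'
parse_pairs(string).each { |key, value| puts "Key: #{key} Value: #{value}" }
```

If other escape sequences (such as `\n`) should be interpreted rather than just unwrapped, the `unescape` step would need to be extended accordingly.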
Answer 1 (score: 0)
Answer 2 (score: 0)
s = 'foo-bar:bar foo:"bar 1" "bar 2":"foo 3" "bar 2 \"var\"":"foo 3"'
s.scan(/(?<!\\)"((?:[^"]|\\")*)(?<!\\)"|([^\s:]+)/).flatten.compact
  .each_slice(2).to_h
# =>
# {
#   "foo-bar" => "bar",
#   "foo" => "bar 1",
#   "bar 2" => "foo 3",
#   "bar 2 \\\"var\\\"" => "foo 3"
# }
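To see how this works, it can help to look at the intermediate token list. The sketch below only splits the one-liner into steps; the names `TOKEN` and `tokens` are hypothetical and not part of the answer:

```ruby
# Hypothetical constant holding the same pattern as the scan above.
TOKEN = /(?<!\\)"((?:[^"]|\\")*)(?<!\\)"|([^\s:]+)/

def tokens(s)
  # Each match fills exactly one of the two groups; compact drops the nil.
  s.scan(TOKEN).flatten.compact
end

s = 'foo-bar:bar foo:"bar 1" "bar 2":"foo 3" "bar 2 \"var\"":"foo 3"'
list = tokens(s)                  # flat list: key, value, key, value, ...
pairs = list.each_slice(2).to_h   # consecutive tokens become key => value
p list
p pairs

# Tokens are paired by position, so the colon is not actually required:
p tokens('a b').each_slice(2).to_h
```

Because the pairing is positional, an input with an odd number of tokens or a missing colon is not rejected, so this approach suits input that is already known to be well formed.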
Answer 3 (score: 0)
If a non-regex solution is acceptable, this very simple parser should do the job:
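def kv(s)
  state = :unquoted
  result = [""]
  s.each_char do |c|
    case state
    when :unquoted
      case c
      when ':', ' '
        # A separator ends the current token, if one has been started.
        result << "" unless result.last.empty?
      when '"'
        state = :quoted
      else
        result.last << c
      end
    when :quoted
      case c
      when '"'
        # The closing quote ends the current token.
        result << ""
        state = :unquoted
      when '\\'
        state = :escaped
      else
        result.last << c
      end
    when :escaped
      # Append the escaped character as-is, then go back to the quoted state.
      result.last << c
      state = :quoted
    end
  end
  # Drop the trailing empty token left after a closing quote or separator,
  # but keep a final unquoted value.
  result.pop if result.last.empty?
  Hash[*result]
end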
Test:
s = 'foo-bar:bar foo:"bar 1" "bar 2":"foo 3" "bar 2 \"var\"":"foo 3"'
kv s # => {"foo-bar"=>"bar", "foo"=>"bar 1", "bar 2"=>"foo 3", "bar 2 \"var\""=>"foo 3"}