I have a string like this:
group1
Members:
m/a
m/b
group2
Members:
m/c
m/d
m/e
group3
No Members
I want the scan result to be:
[["group1","a","b"],["group2","c","d","e"],["group3"]]
But what I get is:
[["group1","a"],["group2","c"],["group3", nil]]
with this regex:
text.scan(/([^\r\n]+)\r?\n[\s\t]*(?:No |)Members[\s:]*\r?\n(?:[\t\s]*m\/(\w+)+\r?\n)*/m)
Can I do what I want with a regexp?
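For the regex route: a capture group inside a repeated group holds only one value per match, no matter how many times the group repeats, so a single scan cannot return a variable number of members per group. A common workaround is two passes. One regex captures each group name plus its whole run of m/... lines as a single string, and a second scan pulls the member names out of that string. The sketch below assumes every group header is followed by either a "Members:" line or a "No Members" line; GROUP_RE is a hypothetical name.

```ruby
text = <<~EOT
  group1
  Members:
  m/a
  m/b
  group2
  Members:
  m/c
  m/d
  m/e
  group3
  No Members
EOT

# Pass 1: group name, then either "No Members" or "Members:" followed by a
# run of m/xxx lines. The whole run is captured as ONE string (capture 2),
# because a repeated capture group cannot hold more than one value.
GROUP_RE = /^[ \t]*(\S+)[ \t]*\r?\n[ \t]*(?:No Members|Members:[ \t]*\r?\n((?:[ \t]*m\/\w+[ \t]*(?:\r?\n|\z))*))/

# Pass 2: pull the individual member names out of each captured block.
# For "No Members" the block is nil, and nil.to_s.scan returns [].
result = text.scan(GROUP_RE).map do |name, block|
  [name, *block.to_s.scan(%r{m/(\w+)}).flatten]
end

p result
```

It works, but it is clearly harder to read and maintain than a line-oriented approach.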
Answer 0 (score: 0)
If you want to store hierarchical data like this, you're probably better off using YAML than trying to parse a string with a regex.
groups.yml:
group1:
  members:
    - m/a
    - m/b
group2:
  members:
    - m/c
    - m/d
    - m/e
group3:
  members: []
Parsing the data:
> require 'yaml'
> YAML.load_file('./groups.yml')
=> {"group1"=>{"members"=>["m/a", "m/b"]}, "group2"=>{"members"=>["m/c", "m/d", "m/e"]}, "group3"=>{"members"=>[]}}
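The parsed hash is keyed by group name rather than being in the array shape the question asks for. A minimal sketch of that conversion, assuming the groups.yml layout above (the YAML is inlined so the snippet runs without the file, and result is just an illustrative name):

```ruby
require 'yaml'

# Same content as groups.yml, inlined so the sketch is self-contained.
yaml_text = <<~YAML
  group1:
    members:
      - m/a
      - m/b
  group2:
    members:
      - m/c
      - m/d
      - m/e
  group3:
    members: []
YAML

groups = YAML.safe_load(yaml_text)

# Turn {"group1"=>{"members"=>["m/a", "m/b"]}, ...} into
# [["group1", "a", "b"], ...], dropping the "m/" prefix from each member.
result = groups.map do |name, info|
  [name, *info.fetch("members").map { |m| m.delete_prefix("m/") }]
end

p result
```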
Answer 1 (score: 0)
While it's possible to do this in a regex, it gets unwieldy, so I'd do it like this:
data = <<EOT
group1
Members:
m/a
m/b
group2
Members:
m/c
m/d
m/e
group3
No Members
EOT
pp data.lines.slice_before(/^group/).to_a
=> [["group1\n", " Members: \n", " m/a\n", " m/b\n"],
["group2\n", " Members: \n", " m/c\n", " m/d\n", " m/e\n"],
["group3\n", " No Members\n"]]
Cleaning up the rest to meet the question's requirements:
data.gsub(%r{\bm/}, '').split(/\n\s*/).reject{ |s| s[/\bMembers\b/] }.slice_before(/^group/).to_a
=> [["group1", "a", "b"], ["group2", "c", "d", "e"], ["group3"]]
The crux of the parsing is really slice_before. Everything else is just building the array and cleaning up.
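Breaking it down:
gsub(%r{\bm/}, '') removes the unwanted m/.
split(/\n\s*/) splits the string at line ends into an array, stripping leading whitespace at the same time.
reject{ |s| s[/\bMembers\b/] } rejects any line that contains "Members" as a separate word.
slice_before(/^group/) splits the array into chunks, each starting at a string that begins with "group".
to_a converts the resulting enumerator into an array.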
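To see what each step of the chain produces, here is the same pipeline split into named intermediate values. The variable names are illustrative only, and the input is the unindented data from the question.

```ruby
data = <<~EOT
  group1
  Members:
  m/a
  m/b
  group2
  Members:
  m/c
  m/d
  m/e
  group3
  No Members
EOT

# Step 1: drop every "m/" prefix that starts at a word boundary.
no_prefix = data.gsub(%r{\bm/}, '')

# Step 2: split at line ends; \s* also swallows any leading indentation.
# Trailing empty strings are dropped by String#split.
lines = no_prefix.split(/\n\s*/)

# Step 3: discard the "Members:" and "No Members" marker lines.
kept = lines.reject { |s| s[/\bMembers\b/] }

# Step 4: start a new chunk at every element beginning with "group".
chunks = kept.slice_before(/^group/)

# Step 5: slice_before returns an Enumerator; to_a materializes it.
result = chunks.to_a

p lines
p kept
p result
```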