I'm using Ruby 1.9, and I'm wondering whether there's a simple regex way to do this.
I have many strings that look like some variation of this:
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
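The idea is that I want to break this string into its functional components.
The "grammar" of the string is that there are "keys", each made of one or more "words or other characters" (e.g. Intervention Model), followed by a colon (:). Each key has a corresponding "value" (e.g. Parallel Assignment) that immediately follows the colon (:)... A "value" is made up of words, commas (whatever), but the end of a "value" is signaled by a comma.
The number of key/value pairs is variable. I'm also assuming that a colon (:) is not allowed to be part of a "value", and that a comma (,) is not allowed to be part of a "key".
One would think there's a "regexy" way to break this into its components, but my attempt at an appropriate matching regex only picks up the first key/value pair, and I don't know how to capture the others. Any ideas on how to capture the other matches?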
regex = /(([^,]+?): ([^:]+?,))+?/
=> /(([^,]+?): ([^:]+?,))+?/
irb(main):139:0> str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
=> "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"
irb(main):140:0> str.match regex
=> #<MatchData "Allocation: Random," 1:"Allocation: Random," 2:"Allocation" 3:" Random,">
irb(main):141:0> $1
=> "Allocation: Random,"
irb(main):142:0> $2
=> "Allocation"
irb(main):143:0> $3
=> " Random,"
irb(main):144:0> $4
=> nil
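A note on why the attempt above stops after one pair: String#match returns only the first match, and when a capture group sits inside a quantifier, Ruby keeps only the text from that group's last iteration, so $4 and beyond are never set. String#scan instead returns every non-overlapping match. The sketch below is an illustration, not taken from either answer; it assumes, as the question does, that keys contain no commas or colons, and additionally that keys contain no parentheses.

```ruby
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

# Key: a run of characters with no commas or colons, followed by a colon.
# Value: the shortest text that ends either right before ", <next key>:"
# or at the end of the string. The lookahead (?=...) does not consume the
# separator, so the next match attempt starts at the comma and scan moves past it.
pair_re = /\s*([^,:]+):\s*(.*?)(?=,\s*[^,:()]+:|\z)/

pairs = str.scan(pair_re)   # array of [key, value] arrays
hash  = Hash[pairs]         # Hash[] with an array of pairs works on Ruby 1.9

hash.each { |key, value| puts "#{key} => #{value}" }
```

The comma inside "(Subject, Caregiver, ...)" does not end the Masking value, because the text after it reaches another comma or a ")" before any colon, so the lookahead fails there.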
Answer 0 (score: 6)
irb(main):003:0> pp Hash[ *str.split(/\s*([^,]+:)\s+/)[1..-1] ]
{"Allocation:"=>"Random,",
"Control:"=>"Active Control,",
"Endpoint Classification:"=>"Safety Study,",
"Intervention Model:"=>"Parallel Assignment,",
"Masking:"=>
"Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor),",
"Primary Purpose:"=>"Treatment"}
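The whitespace parts of the regex aren't required, but they help clean up the output a bit. I'll leave the minor follow-up cleanup to you, such as removing the colon from the end of each key or the comma from each value.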
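That follow-up cleanup might look like this: a minimal sketch that reuses the answer's split expression, then drops the trailing colon from each key and the trailing comma from each value with String#chomp. It sticks to calls available in Ruby 1.9 (Hash[] rather than Array#to_h, which only arrived in Ruby 2.1).

```ruby
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

# Same split as in the answer: the capture group keeps each "Key:" in the
# result, so the array alternates keys and values; [1..-1] drops the empty
# string that precedes the first key.
raw = Hash[*str.split(/\s*([^,]+:)\s+/)[1..-1]]

# chomp(':') removes one trailing colon from the key,
# chomp(',') removes one trailing comma from the value.
clean = Hash[raw.map { |key, value| [key.chomp(':'), value.chomp(',')] }]

clean.each { |key, value| puts "#{key} => #{value}" }
```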
Answer 1 (score: 2)
After some trial and error, I managed to get the following to work with your example string and regex:
str.split(/((?:[^,]+?): (?:[^:]+?,(?![^\(]+?\))))+?/).delete_if(&:empty?).map{|s| s.strip.chomp(',')}
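I had to add a lookahead to make sure any commas inside parentheses are ignored, as well as make some of the groups non-capturing with (?:...). The final delete_if and map are purely cosmetic.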
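The same lookahead idea can also be folded into a shorter split. This is a sketch with a different regex, not the answer's exact one: it splits only on a comma whose following text, up to the next colon, contains no comma, colon or parenthesis, i.e. a comma that begins a new "Key:". It assumes keys never contain parentheses; each piece is then split once, at its first colon.

```ruby
str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment"

# Split on ", " only when it is followed by "<key>:"; the lookahead (?=...)
# checks for the key without consuming it. Commas inside "(Subject, ...)" are
# followed by text that hits another comma or ")" before any colon, so they
# are left alone.
pieces = str.split(/,\s*(?=[^,:()]+:)/)

# split(..., 2) cuts each "Key: value" at the first colon only.
pairs = pieces.map { |piece| piece.split(/:\s*/, 2) }
hash  = Hash[pairs]

hash.each { |key, value| puts "#{key} => #{value}" }
```

Because the keys and values come out already trimmed, no delete_if or strip pass is needed afterwards.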