I have a string (a chunk of CDATA from a SOAP response) that looks roughly like this:
"<![CDATA[XXX|^~\&
KEY|^~\&|xxxxx|xxxxx^xxxx xxxxx
INFO||xxx|xxxxxx||xxxxx|xxxxxxx|xxxxxxx
INFO|||xxxxx||||xxxxxxxxx||||||||||xxxxxxxx
KEY|^~\&|xxxxxx|xxxxxxxxxx|xxxxxxxx
INFO||xx|xxxxxxxx||xxxxxxx|xxxxxx
INFO|||xxxx|x|||xxxxxxxxx|||||||x|||xxxxx|||xxxx||||||||||||||||||||||||xxxx
KEY|^~\&|xxxxx|xxxxx^xxxx xxxxx
INFO||xxx|xxxxxx||xxxxx|xxxxxxx|xxxxxxx
INFO|||xxxxx||||xxxxxxxxx||||||||||xxxxxxxx ]]>"
I'd like to know how to safely parse out the string for each 'KEY' section using Ruby. Basically, I need a string that looks like:
"KEY|^~\&|xxxxx|xxxxx^xxxx xxxxx
INFO||xxx|xxxxxx||xxxxx|xxxxxxx|xxxxxxx
INFO|||xxxxx||||xxxxxxxxx||||||||||xxxxxxxx"
for every occurrence of 'KEY'. Any thoughts on the best way to do this? Thanks.
Answer 0 (score: 0)
Here is one way to do it (using a simplified example):
str =
"<![CDATA[XXX|^~\&
KEY|^~\&|x
INFO||x
INFO|||x
KEY|^~\&|x
INFO||xx|x
INFO|||x
KEY|^~\&|x
INFO||x
INFO|||x"
r = /
^KEY\b # match KEY at the beginning of a line, followed by a word boundary
.+? # match one or more of any character, newlines included, lazily
(?=\bKEY\b|\z) # positive lookahead: stop before the next KEY bracketed by
# word boundaries, or at the end of the string
/mx # m: the dot also matches newlines; x: extended mode
str.scan r
#=> ["KEY|^~&|x\nINFO||x\nINFO|||x\n",
# "KEY|^~&|x\nINFO||xx|x\nINFO|||x\n",
# "KEY|^~&|x\nINFO||x\nINFO|||x"]
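The simplified example leaves out the trailing `]]>` from the question. A minimal sketch of how the same idea might be applied to the full payload follows. The helper name `key_sections` and the wrapper-stripping step are assumptions for illustration, not part of the answer above. The lookahead is also anchored to the start of a line, so a `KEY` that happens to appear in the middle of a field would not end a section.

```ruby
# Hypothetical helper: the name and the CDATA-stripping step are assumptions.
def key_sections(raw)
  # Drop the CDATA wrapper if it is present.
  body = raw.sub(/\A\s*<!\[CDATA\[/, "").sub(/\]\]>\s*\z/, "")
  # A section starts at a line beginning with KEY and runs lazily up to the
  # next line beginning with KEY, or to the end of the string.
  body.scan(/^KEY\b.*?(?=^KEY\b|\z)/m).map(&:strip)
end

# Single-quoted heredoc, so the backslashes in the data are kept literally.
sample = <<~'TEXT'
  <![CDATA[XXX|^~\&
  KEY|^~\&|a
  INFO||1
  KEY|^~\&|b
  INFO||2 ]]>
TEXT

key_sections(sample).each { |section| puts section, "---" }
```

Note that in a double-quoted Ruby literal `\&` collapses to `&`, which is why the output above shows `^~&`. Reading the payload from the SOAP response, or using a single-quoted heredoc as here, keeps the backslash.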
Answer 1 (score: 0)
Not as relaxed as the regex above, but this might work for you:
KEY(.+\n)+(?=\s+KEY)
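Because the lookahead in this pattern requires another `KEY` to follow, it would not match the final section on its own. A line-based alternative that avoids the lookahead altogether could look roughly like the sketch below. The helper name `key_sections_by_line` is hypothetical, and the sketch assumes every section starts with a line beginning with `KEY|`.

```ruby
# Hypothetical helper: groups lines into sections without a multi-line regex.
def key_sections_by_line(text)
  text.sub(/\]\]>\s*\z/, "")          # drop a trailing CDATA terminator, if any
      .lines(chomp: true)             # split into lines without the newlines
      .map(&:rstrip)                  # trim trailing whitespace on each line
      .slice_before { |line| line.start_with?("KEY|") } # new chunk at each KEY line
      .select { |chunk| chunk.first.start_with?("KEY|") } # skip the preamble chunk
      .map { |chunk| chunk.join("\n") }
end

sample = <<~'TEXT'
  <![CDATA[XXX|^~\&
  KEY|^~\&|a
  INFO||1
  KEY|^~\&|b
  INFO||2 ]]>
TEXT

key_sections_by_line(sample).each { |section| puts section, "---" }
```

`Enumerable#slice_before` starts a new chunk whenever the block returns true, so the last section is picked up without needing a following `KEY`.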