Hey, I have a large text like this:
some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2
So each post's data is preceded by its post number in Roman numerals, followed by "POST".
I want to turn it into:
<post number>
I
</post number>
<post data>
postdata_1
</post data>
and so on for each post...
Can someone help me solve this with a regular expression? I'm using Ruby.
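To pin down the target layout, here is a minimal sketch that renders one (number, data) pair in the tag format described above. `to_tags` is a hypothetical helper name chosen for illustration, not part of Ruby or any library.

```ruby
# Hypothetical helper: renders one post in the tag layout from the question.
def to_tags(number, data)
  "<post number>\n#{number}\n</post number>\n" \
    "<post data>\n#{data}\n</post data>"
end

puts to_tags("I", "postdata_1")
```

Any of the parsing approaches below only needs to produce (number, data) pairs; the formatting can then be done in one place like this.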
Answer 0 (score: 1)
If I understand you correctly, this will do what you expect:
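str = "some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2"
# Roman numeral from 1 to 3999 (note: on its own it can also match "")
roman_number = /M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})/
# number, " POST ", then the data lazily up to the next "<numeral> POST" or the end
regex = /(#{roman_number})\sPOST\s(.+?)(?=\s#{roman_number}\sPOST|$)/
str.scan(regex) do |post_number, post_data|
  puts "<post number>\n#{post_number}\n</post number>"
  puts "<post data>\n#{post_data}\n</post data>"
end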
The Roman numeral regex comes from paxdiablo's answer here.
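A slightly stricter variant of the same regex, assuming post numbers always stand as whole words: `\b` anchors the numeral at word boundaries, and the lookahead `(?=[MDCLXVI])` stops the numeral pattern from matching the empty string. The named groups `number` and `data` are illustrative choices, not part of the original answer.

```ruby
str = "some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2"

# Same Roman numeral pattern as above; by itself it also matches "".
roman = /M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})/
# Assumption: a post number is a whole word. \b anchors it, and
# (?=[MDCLXVI]) requires at least one numeral letter.
numeral = /\b(?=[MDCLXVI])#{roman}\b/
regex = /(?<number>#{numeral})\sPOST\s(?<data>.+?)(?=\s#{numeral}\sPOST|\z)/

# With named groups, scan returns one [number, data] array per match.
posts = str.scan(regex)
posts.each do |number, data|
  puts "<post number>\n#{number}\n</post number>"
  puts "<post data>\n#{data}\n</post data>"
end
```

The extra anchors matter only if the surrounding text can contain uppercase letters that look like numerals; for the sample input both versions give the same pairs.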
Answer 1 (score: 0)
It's not a pretty regex, but if I understand your input and requirements correctly... uh, "some_data" should be ignored, right?
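strin = "some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2"
# While two more POSTs remain, grab "<number> POST <data> <next number> POST"
while curmatch = strin.match(/ (.*?) POST (.*?) POST/)
  postnum = curmatch[1]
  # Drop the last word (the next post's number) from the captured data
  postdata = curmatch[2].reverse.sub(/.*? /, '').reverse
  puts "<post number>#{postnum}</post number>"
  puts "<post data>#{postdata}</post data>"
  # Remove the post just printed so the next iteration sees the following one
  strin.sub!(" #{postnum} POST #{postdata}", '')
end
# The last post has no POST after it, so match it separately
curmatch = strin.match(/ (.*?) POST (.*)/)
puts "<post number>#{curmatch[1]}</post number>"
puts "<post data>#{curmatch[2]}</post data>"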
Output:
<post number>I</post number>
<post data>postdata_1</post data>
<post number>IV</post number>
<post data>postdata_4</post data>
<post number>III</post number>
<post data>postdata_3</post data>
<post number>II</post number>
<post data>postdata_2</post data>
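The same "strip the last word" idea can be written without mutating the string, by splitting on `" POST "` first. This is a sketch assuming `" POST "` only ever appears as the separator and each post number is a single word; every piece then ends with the next post's number, except the last one.

```ruby
strin = "some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2"

# => ["some_data I", "postdata_1 IV", "postdata_4 III", "postdata_3 II", "postdata_2"]
head, *rest = strin.split(" POST ")
number = head.split.last # "I"; the leading "some_data" is dropped

posts = rest.each_with_index.map do |piece, i|
  words = piece.split
  last_piece = (i == rest.size - 1)
  # All but the last piece end with the next post's number
  data = last_piece ? words : words[0...-1]
  pair = [number, data.join(" ")]
  number = words.last
  pair
end

posts.each do |n, d|
  puts "<post number>#{n}</post number>"
  puts "<post data>#{d}</post data>"
end
```

For the sample input this prints the same eight lines as the output above, and `posts` stays available for further processing.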
Answer 2 (score: 0)
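d = 'some_data I POST postdata_1 IV POST postdata_4' +
    ' III POST postdata_3 II POST postdata_2'

# Print the collected words, minus the last one (the next post's number)
def fd
  puts "<post data>\n #{@pd[0..-2].join ' '}\n</post data>\n" if @pd.length > 1
end

@pd = []
# Walk over every pair of adjacent words
d.split.each_cons(2) do |n, p|
  if p == 'POST'
    fd # flush the previous post's data
    puts "<post number>\n #{n}\n</post number>\n"
    @pd = []
  else
    @pd << p
  end
end
# Pad with a dummy word so fd keeps the real last word of the final post
@pd << ''
fd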
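An index-based sketch of the same token walk, assuming every post number is exactly one whitespace-separated word. It locates each `POST` token, takes the word before it as the number, and takes everything up to the next post's number as the data, which avoids the instance variable and the dummy `''` padding.

```ruby
d = 'some_data I POST postdata_1 IV POST postdata_4 III POST postdata_3 II POST postdata_2'

tokens = d.split
# Positions of every "POST" token; the post number is the token just before each.
post_at = tokens.each_index.select { |i| tokens[i] == 'POST' }

posts = post_at.each_with_index.map do |i, k|
  # Data runs up to (not including) the next post's number, or to the end.
  stop = k + 1 < post_at.size ? post_at[k + 1] - 1 : tokens.size
  [tokens[i - 1], tokens[(i + 1)...stop].join(' ')]
end

posts.each do |number, data|
  puts "<post number>\n #{number}\n</post number>"
  puts "<post data>\n #{data}\n</post data>"
end
```

Multi-word post data still works here, because everything between one `POST` and the next post number is joined back together.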