Here is my code:
str = "Early in his first term in office, Obama signed into law economic stimulus legislation in response"
arr = str.split(" ")
set_element = arr.each_cons(2).to_a
sub_str = set_element.map { |i| i.join(' ') }
With a very large string, this process takes 6.50 seconds. This is the kind of result I want:
sub_str = ["Early in", "in his", "his first", "first term", "term in", "in office,", "office, Obama", "Obama signed", "signed into", "into law", "law economic", "economic stimulus", "stimulus legislation", "legislation in", "in response"]
Is there a more efficient way to do this?
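Answer 0 (score: 7)
Using scan instead of split, you can get word pairs directly (here s is the input string). Note that scan matches do not overlap, so each word lands in exactly one pair.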
s.scan(/\S+(?:\s+\S+)?/)
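If every consecutive pair is needed, as in the question's expected output, one option is to let a lookahead capture the pair while the match itself consumes only one word. This is a minimal sketch rather than part of the original answer; the method name overlapping_pairs is hypothetical, and the variant has not been benchmarked here.

```ruby
# Hypothetical helper, not from the original answer.
# The lookahead (?=...) captures "word whitespace word" without consuming it,
# then \S+ consumes just the first word, so the next match starts at the
# following word and the pairs overlap.
def overlapping_pairs(text)
  # scan returns one array per match when the pattern has a capture group,
  # so flatten turns [["a b"], ["b c"]] into ["a b", "b c"].
  text.scan(/(?=(\S+\s+\S+))\S+/).flatten
end

p overlapping_pairs("Early in his first term")
```

Each captured pair keeps whatever whitespace separated the two words in the input, so text with runs of spaces or newlines may need normalising first.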
Edit: just to reassure myself that this is relatively efficient, I wrote a little micro-benchmark. Here are the results for the answers posted so far:
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]
10 times on string of size 2284879
               user     system      total        real
original   4.180000   0.070000   4.250000 (  4.272856)
sergio     2.090000   0.000000   2.090000 (  2.102469)
dbenhur    1.050000   0.000000   1.050000 (  1.042167)
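The benchmark script itself is not reproduced here. As a rough sketch of how such a comparison could be set up with Ruby's standard Benchmark module, the following times two of the approaches; the input text, its size, and the labels are assumptions and do not reflect the original setup.

```ruby
require 'benchmark'

# Assumed input: the sample sentence repeated to build a larger string.
sample = "Early in his first term in office, Obama signed into law economic stimulus legislation in response "
text = sample * 2_000

# The approaches under comparison, wrapped as lambdas so each can be timed.
split_each_cons = ->(s) { s.split(' ').each_cons(2).to_a.map { |pair| pair.join(' ') } }
scan_pairs      = ->(s) { s.scan(/\S+(?:\s+\S+)?/) }

# Benchmark.bm prints user, system, total and real time for each labelled block.
Benchmark.bm(10) do |bm|
  bm.report('original') { 10.times { split_each_cons.call(text) } }
  bm.report('scan')     { 10.times { scan_pairs.call(text) } }
end
```

Timings depend heavily on the Ruby version and input, so the numbers will not match the table above.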
Answer 1 (score: 1)
set_element = arr.each_cons(2).to_a
The line above creates a lot of temporary objects that you don't need. Try this instead; it should be faster:
str = "Early in his first term in office, Obama signed into law economic stimulus legislation in response"
arr = str.split(" ")
sub_str = arr.each_with_object([]).with_index do |(el, memo), idx|
  if idx % 2 == 0
    memo << el
  else
    memo.last << ' ' << el
  end
end
sub_str # => ["Early in", "his first", "term in", "office, Obama", "signed into", "law economic", "stimulus legislation", "in response"]
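Note that the code above groups the words into non-overlapping pairs, which differs from the output the question asks for. To keep the overlapping pairs while still dropping the to_a call, one possibility is to map over the each_cons enumerator directly. This is a sketch under that assumption, and consecutive_pairs is a hypothetical name.

```ruby
# Hypothetical helper, not from the original answer.
def consecutive_pairs(text)
  words = text.split(' ')
  # each_cons(2) without a block returns an Enumerator; mapping over it
  # directly skips the intermediate outer array that to_a would build.
  # Each yielded pair is still a small two-element array.
  words.each_cons(2).map { |first, second| "#{first} #{second}" }
end

p consecutive_pairs("Early in his first term")
```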
Answer 2 (score: 0)
You can try this. It takes one step fewer :)
arr = str.scan(/\S+/)
s = []
arr.each_with_index { |x, i| s << (x + " " + arr[i + 1]) if arr[i + 1] }