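I'm working on a string reconstruction algorithm in Ruby (a classic dynamic programming example: turning text with missing spaces back into normally spaced text). The code below is pure Ruby, so you can copy-paste it and start testing right away. It works about 80% of the time and tends to break more often as the dictionary grows. I tested it with a dictionary of more than 80k words, and it does worse there, working roughly 70% of the time.
If there is a way to make it work 100% of the time when all the words are in the dictionary, please let me know.
Here is the code (it is well spaced and should be very readable):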
# Partially working string reconstruction algo in pure Ruby
# the dictionary
def dict(someWord)
  myArray = [" ", "best", "domain", "my", "successes", "image", "resizer", "high", "tech", "crime", "unit", "name", "edge", "times", "find", "a", "bargain", "free", "spirited", "style", "i", "command", "go", "direct", "to", "harness", "the", "force"]
  return !!(myArray.index(someWord))
end
# inspired by http://cseweb.ucsd.edu/classes/wi12/cse202-a/lecture6-final.pdf
## Please uncomment the one you wanna use
#
# (all the words used are present in the dictionary above)
#
# working sentences
x = ' ' + "harnesstheforce"
# x = ' ' + "hightechcrimeunit"
#
# non working sentences
# x = ' ' + "findabargain"
# x = ' ' + "icommand"
puts "Trying to reconstruct #{x}"
# useful variables we're going to use in our algo
n = x.length
k = Array.new(n)
s = Array.new(n)
breakpoints = Hash.new
validBreakpoints = Hash.new
begin
  # let's fill k
  for i in 0..n-1
    k[i] = i
  end
  # the core algo starts here
  s[0] = true
  for k in 1..n-1
    s[k] = false
    for j in 1..k
      if s[j-1] && dict(x[j..k])
        s[k] = true
        # using a hash is just a trick to not have duplicates
        breakpoints.store(k, true)
      end
    end
  end
  # debug
  puts "breakpoints: #{breakpoints.inspect} for #{x}"
  # let's create a valid break point vector
  i=1
  while i <= n-1 do
    # we choose the longest valid word
    breakpoints.keys.sort.each do |k|
      if i >= k
        next
      end
      # debug: when the algo breaks, it does so here and goes into an infinite loop
      #puts "x[#{i}..#{k}]: #{x[i..k]}"
      if dict(x[i..k])
        validBreakpoints[i] = k
      end
    end
    if validBreakpoints[i]
      i = validBreakpoints[i] + 1
    end
  end
  # debug
  puts "validBreakpoints: #{validBreakpoints.inspect} for #{x}"
  # we insert the spaces at the places defined by the valid breakpoints
  x = x.strip
  i = 0
  validBreakpoints.each_key do |key|
    validBreakpoints[key] = validBreakpoints[key] + i
    i += 1
  end
  validBreakpoints.each_value do |value|
    x.insert(value, ' ')
  end
  puts "Debug: x: #{x}"
# we capture ctrl-c
rescue SignalException
  abort
# end of rescue
end
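Answer 0 (score: 1)
Note that your algorithm fails for strings that contain single-character words. This is an off-by-one error: you ignore the breakpoint right after such a word, so you end up testing a word ("abargain") that is not in your dictionary.
Change
if i >= k
  next
end
to
if i > k
  next
end
or, more Ruby-like,
next if i > k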
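To see why the comparison matters, here is a small sketch using the "findabargain" input from the question. The two helper methods are hypothetical names introduced only to isolate the old and new guard; they are not part of the original code.

```ruby
# Hypothetical helpers that isolate the two versions of the guard.
def skipped_by_original_guard?(i, k)
  i >= k
end

def skipped_by_fixed_guard?(i, k)
  i > k
end

x = ' ' + "findabargain"   # same leading-space layout as in the question
i = 5                      # position right after "find"
k = 5                      # breakpoint recorded after the one-letter word "a"

# x[i..k] is an inclusive range, so i == k still selects one character.
puts x[i..k].inspect
puts skipped_by_original_guard?(i, k)   # the original test discards this candidate
puts skipped_by_fixed_guard?(i, k)      # the corrected test keeps it
```

With the original guard the one-letter candidate is never looked up, so the next candidate tried from that position is the longer, non-existent "abargain".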
Also note that you will run into an infinite loop whenever the string contains something that is not a word:
if validBreakpoints[i] # will be false
  i = validBreakpoints[i] + 1 # i is not incremented, so the loop starts over at the same position
end
You had better treat this as an error:
return '<no parse>' unless validBreakpoints[i] # or raise if you are not in a function
i = validBreakpoints[i] + 1
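As a rough sketch, the second phase of the question's code could be wrapped in a method that raises instead of looping forever. The method name `valid_breakpoints`, its parameters, and the choice of `ArgumentError` are assumptions made for illustration; the logic otherwise follows the question's loop with the `i > k` fix applied.

```ruby
# Hypothetical wrapper around the greedy second phase from the question.
# x           - the input string with its leading space
# breakpoints - array of end indices found by the first phase
# words       - array of dictionary words
def valid_breakpoints(x, breakpoints, words)
  valid = {}
  i = 1
  while i <= x.length - 1
    breakpoints.sort.each do |k|
      next if i > k                          # corrected guard
      valid[i] = k if words.include?(x[i..k]) # later (longer) matches overwrite earlier ones
    end
    # No word starts at i: report it instead of retrying the same position.
    raise ArgumentError, "no parse at index #{i}" unless valid[i]
    i = valid[i] + 1
  end
  valid
end

p valid_breakpoints(' ' + "findabargain", [4, 5, 12], %w[find a bargain])
```

This only removes the infinite loop; the method still picks the longest match at each position, so it inherits the greedy behaviour of the original.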
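The problem with "inotifier" is that your algorithm itself is flawed: always choosing the longest word is not good. In this case, the first "valid" breakpoint detected is after "in", which leaves you with the non-word "otifier".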
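One way to avoid the greedy choice is to record, during the dynamic programming pass, where the last word of each segmentable prefix starts, and then walk those back-pointers from the end. The following is a minimal sketch under stated assumptions, not the original poster's code: the method name `reconstruct`, the `prev` array, and the three-word dictionary in the usage line are all made up for illustration, and the input is taken without the leading space.

```ruby
require 'set'

# Hypothetical helper: returns the words of `text` joined by spaces,
# or nil if the text cannot be split into dictionary words.
def reconstruct(text, words)
  dict = words.to_set
  n = text.length
  # prev[k] is the start index of a dictionary word ending just before k
  # whose preceding prefix text[0...start] is itself segmentable; nil if none.
  prev = Array.new(n + 1)
  prev[0] = 0                      # the empty prefix is trivially segmentable
  (1..n).each do |k|
    (0...k).each do |j|
      if prev[j] && dict.include?(text[j...k])
        prev[k] = j
        break                      # one valid split per prefix is enough
      end
    end
  end
  return nil unless prev[n]

  # Follow the back-pointers from the end of the string to the start.
  result = []
  k = n
  while k > 0
    j = prev[k]
    result.unshift(text[j...k])
    k = j
  end
  result.join(' ')
end

puts reconstruct("inotifier", %w[i in notifier])
```

Because every back-pointer is only set when the prefix before it is already known to be segmentable, the walk cannot get stuck on a leftover such as "otifier". The cost is quadratic in the length of the input, and a `Set` keeps each dictionary lookup cheap even with a large word list.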