Searching for words in a string with Ruby

Date: 2014-09-18 10:36:51

Tags: ruby string

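Given input = "helloworld"

the output should be output = ["hello", "world"]

Assume I have a method called is_in_dict? that returns true if the given string is a word in the dictionary.

So far I have tried: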

ar = []
input.split("").each do |f|
  ar << f if is_in_dict? f
  # need to check the given char here
end

How can I achieve this in Ruby?

2 answers:

Answer 0: (score: 1)

You have to check all combinations instead of splitting the input into characters, i.e. "h", "he", "hel", ... "helloworld", "e", "el", "ell", ... "elloworld" and so on.

Something like this should work:

(0..input.size).to_a.combination(2).each do |a, b|
  word = input[a...b]
  ar << word if is_in_dict?(word)
end
#=> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ar
#=> ["hello", "world"]

Or, using each_with_object, which returns the array:

(0..input.size).to_a.combination(2).each_with_object([]) do |(a, b), array|
  word = input[a...b]
  array << word if is_in_dict?(word)
end
#=> ["hello", "world"]

Another approach is to build a custom Enumerator:

class String
  def each_combination
    return to_enum(:each_combination) unless block_given?
    (0..size).to_a.combination(2).each do |a, b|
      yield self[a...b]
    end
  end
end

String#each_combination yields all combinations (instead of just the indices):

input.each_combination.to_a
#=> ["h", "he", "hel", "hell", "hello", "hellow", "hellowo", "hellowor", "helloworl", "helloworld", "e", "el", "ell", "ello", "ellow", "ellowo", "ellowor", "elloworl", "elloworld", "l", "ll", "llo", "llow", "llowo", "llowor", "lloworl", "lloworld", "l", "lo", "low", "lowo", "lowor", "loworl", "loworld", "o", "ow", "owo", "owor", "oworl", "oworld", "w", "wo", "wor", "worl", "world", "o", "or", "orl", "orld", "r", "rl", "rld", "l", "ld", "d"]

It can be used with select to easily filter specific words, for example input.each_combination.select { |word| is_in_dict?(word) }.
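For anyone who wants to try this without a real dictionary, here is a minimal self-contained sketch of the same idea. The WORDS set, this is_in_dict? implementation, and the substrings helper are assumptions made up for illustration; the question does not show how the dictionary is actually stored.

```ruby
require 'set'

# Hypothetical two-word dictionary, for illustration only.
WORDS = Set.new(%w[hello world]).freeze

# Stand-in for the asker's method; the real one is not shown in the question.
def is_in_dict?(word)
  WORDS.include?(word)
end

# Every contiguous substring of str, built from all index pairs a < b.
def substrings(str)
  (0..str.size).to_a.combination(2).map { |a, b| str[a...b] }
end

found = substrings('helloworld').select { |word| is_in_dict?(word) }
p found
```

Note that this checks every substring independently, so with a larger dictionary it would also report words such as "hell" or "low" that overlap with others.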

Answer 1: (score: 1)

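This looks like a task for recursion. In short, you take letters one at a time until you get a word that is in the dictionary. That alone does not guarantee a correct result, however, because the remaining letters might not form a word ('hell' + 'oworld'?). This is what I would do: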

def split_words(string)
  return [[]] if string == ''
  chars = string.chars
  word = ''
  (1..string.length).map do 
    word += chars.shift
    next unless is_in_dict?(word)
    other_splits = split_words(chars.join)
    next if other_splits.empty?
    other_splits.map {|split| [word] + split }
  end.compact.inject([], :+)
end

split_words('helloworld')    #=> [['hello', 'world']]   No hell!

It also gives you all possible splits, so pages with URLs like penisland can be avoided:

split_words('penisland')  #=> [['pen', 'island'], [<the_other_solution>]]
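To see the "all possible splits" behaviour end to end, here is a rough standalone sketch of the same recursive idea on a different ambiguous input. The DICTIONARY set, this is_in_dict? stub, and the all_splits name are hypothetical and chosen only for the demo; it uses flat_map in place of map, compact and inject, which should be equivalent here.

```ruby
require 'set'

# Hypothetical dictionary, for illustration only.
DICTIONARY = Set.new(%w[no where now here hello world hell]).freeze

# Stand-in for the asker's dictionary lookup.
def is_in_dict?(word)
  DICTIONARY.include?(word)
end

# Returns every way to split string into dictionary words.
# An empty string has exactly one split: the empty list of words.
def all_splits(string)
  return [[]] if string.empty?

  (1..string.length).flat_map do |len|
    head = string[0, len]
    next [] unless is_in_dict?(head)

    # Recurse on the remainder; a dead end yields [] and drops this head.
    all_splits(string[len..-1]).map { |rest| [head] + rest }
  end
end

p all_splits('nowhere')
p all_splits('helloworld')
```

With this dictionary, 'nowhere' should come back as both no + where and now + here, while 'hell' is rejected for 'helloworld' because 'oworld' cannot be split. The number of splits can grow quickly on long inputs, so caching results per suffix may be worth considering.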