Ruby: check whether a sentence contains a bigram/trigram entry from an array of words

Asked: 2017-05-24 22:00:04

Tags: arrays ruby string

I want to check whether a sentence contains an item from the words array that is a bigram/trigram, with the words of that bigram/trigram appearing together in the sentence.

words = ["foo", "bar", "spooky", "rick james"]
sentence = "say hello to rick james but not rick and james"

Since rick james is an array item and its two words appear together, the expected output should be

false #say
false #hello
false #to
true #rick <---
true #james <---
false #but
false #not
false #rick <---
false #and
false #james <---

I tried this

# BASIC EXAMPLE
words = ["foo", "bar", "spooky", "rick james"]

sentence = "something spooky rick this way comes, rick james"

sentence.split.each {|s| puts words.include?(s) }

# OUTPUT                      #EXPECTED OUTPUT
false #something              false
true #spooky <---             true  #spooky
false #rick                   false  #rick
false #this                   false
false #way                    false
false #comes                  false
false #rick                   true  #rick <---
false #james                  true  #james <---

What needs to change to produce the expected output?

2 answers:

Answer 0: (score: 2)

If your domain is bigrams/trigrams, you should split the sentence into bigrams/trigrams.

Enumerable#each_cons(n) may help you here (I'll use n = 2 for bigrams).

sentence = "say hello to rick james but not rick and james"
sentence.split.each_cons(2) { |e| puts e.join(" ") }

# say hello
# hello to
# to rick
# rick james
# james but
# but not
# not rick
# rick and
# and james
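
The same call covers trigrams by passing 3 instead of 2. A minimal sketch, assuming a helper named ngrams (a hypothetical name, not part of the original answer) that returns the groups as strings instead of printing them:

```ruby
# Hypothetical helper: every run of n consecutive words, joined by spaces.
def ngrams(sentence, n)
  sentence.split.each_cons(n).map { |group| group.join(" ") }
end

sentence = "say hello to rick james"
p ngrams(sentence, 2) # bigrams
p ngrams(sentence, 3) # trigrams
```

If the sentence has fewer than n words, each_cons yields nothing and the result is an empty array.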

If a bigram/trigram is included as a whole, that means its words are included as well.

words = ["foo", "bar", "spooky", "rick james"]   
sentence.split.each_cons(2) do |e| 
  puts "#{e} => #{words.include?(e)||words.include?(e.join(" "))}"
end

# ["say", "hello"] => false
# ["hello", "to"] => false
# ["to", "rick"] => false
# ["rick", "james"] => true
# ["james", "but"] => false
# ["but", "not"] => false
# ["not", "rick"] => false
# ["rick", "and"] => false
# ["and", "james"] => false

You can then take these array elements and return true/false for each word.
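
One way that last step might look, as a sketch rather than the answerer's own code: for every entry in words, slide a window of that entry's length over the sentence and flag each word the matching window covers. The method name phrase_flags is an assumption made up for this illustration.

```ruby
# Hypothetical helper (not from the original answer): returns one boolean per
# word of the sentence, true when that word belongs to an entry of `words`
# that occurs contiguously in the sentence.
def phrase_flags(sentence, words)
  tokens = sentence.split
  flags = Array.new(tokens.length, false)

  words.each do |phrase|
    parts = phrase.split
    next if parts.empty?

    # Slide a window as long as the phrase over the sentence's words.
    tokens.each_cons(parts.length).with_index do |window, start|
      next unless window == parts

      # Mark every word covered by the matching window.
      parts.length.times { |offset| flags[start + offset] = true }
    end
  end

  flags
end

words = ["foo", "bar", "spooky", "rick james"]
sentence = "say hello to rick james but not rick and james"

sentence.split.zip(phrase_flags(sentence, words)).each do |token, flag|
  puts "#{flag} ##{token}"
end
```

Because the window length comes from each entry, single words, bigrams and trigrams are handled by the same loop. Tokens are compared exactly, so a word with attached punctuation such as "comes," would not match an entry "comes".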

Answer 1: (score: 0)

Maybe you could search the sentence for each word in the array with (.include?) or (.include_in?). That way you can search with .include?("rick james")
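
A rough sketch of that idea using String#include?, which does exist in Ruby core. The helper names substring_hits and whole_word_hits are hypothetical, and the word-boundary variant is an addition for illustration, not something the answer proposes. Note that this reports which array entries occur in the sentence, not a true/false value per word.

```ruby
# Hypothetical helper: entries of `words` that occur anywhere in the sentence.
# Plain substring matching, so "bar" also matches inside "barn".
def substring_hits(sentence, words)
  words.select { |w| sentence.include?(w) }
end

# Assumed variant: require word boundaries around the entry, so partial-word
# matches are rejected. Regexp.escape guards against special characters.
def whole_word_hits(sentence, words)
  words.select { |w| sentence.match?(/\b#{Regexp.escape(w)}\b/) }
end

words = ["foo", "bar", "spooky", "rick james"]
sentence = "say hello to rick james but not rick and james"

p substring_hits(sentence, words)
p whole_word_hits(sentence, words)
```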