I want to check whether a sentence contains an item from the words array that is a bigram/trigram,
with the words of that bigram/trigram appearing together in the sentence.
words = ["foo", "bar", "spooky", "rick james"]
sentence = "say hello to rick james but not rick and james"
Since "rick james" is an array item and its two words appear together, the expected output should be:
false #say
false #hello
false #to
true #rick <---
true #james <---
false #but
false #not
false #rick <---
false #and
false #james <---
I tried this:
# BASIC EXAMPLE
words = ["foo", "bar", "spooky", "rick james"]
sentence = "something spooky rick this way comes, rick james"
sentence.split.each {|s| puts words.include?(s) }
# OUTPUT                # EXPECTED OUTPUT
false #something        false
true #spooky <---       true #spooky
false #rick             false #rick
false #this             false
false #way              false
false #comes            false
false #rick             true #rick <---
false #james            true #james <---
What do I need to change to get the expected output?
Answer 0 (score: 2)
If your domain is bigrams/trigrams, you should split the sentence into bigrams/trigrams.
Enumerable#each_cons(n) may help you here (I'll use n = 2 for bigrams):
sentence = "say hello to rick james but not rick and james"
sentence.split.each_cons(2) { |e| puts e.join(" ") }
# say hello
# hello to
# to rick
# rick james
# james but
# but not
# not rick
# rick and
# and james
If a bi/trigram is included as a whole, that implies its individual words are included too.
words = ["foo", "bar", "spooky", "rick james"]
sentence.split.each_cons(2) do |e|
  # e is an array such as ["rick", "james"]; join it before looking it up in words
  puts "#{e} => #{words.include?(e.join(" "))}"
end
# ["say", "hello"] => false
# ["hello", "to"] => false
# ["to", "rick"] => false
# ["rick", "james"] => true
# ["james", "but"] => false
# ["but", "not"] => false
# ["not", "rick"] => false
# ["rick", "and"] => false
# ["and", "james"] => false
You can then take these array elements and return true/false for each one.
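That last step could be sketched roughly as follows. The helper name flag_words is hypothetical (it is not part of Ruby or of the code above), and the sketch assumes tokens are separated by whitespace only, so punctuation stays attached to its word ("comes," is not the same token as "comes"). It treats every entry of words as a sequence of one or more words and marks each position in the sentence covered by a matching run.

```ruby
# Hypothetical helper: returns [word, flag] pairs, where flag is true when the
# word is part of a run of consecutive tokens equal to some entry of `words`.
# Assumes whitespace tokenization; punctuation is not stripped.
def flag_words(sentence, words)
  tokens = sentence.split
  flags = Array.new(tokens.size, false)

  words.each do |entry|
    parts = entry.split          # "rick james" => ["rick", "james"]
    next if parts.empty?         # each_cons(0) would raise ArgumentError

    # Slide a window the size of the entry over the sentence tokens.
    tokens.each_cons(parts.size).with_index do |gram, i|
      next unless gram == parts
      parts.size.times { |k| flags[i + k] = true }
    end
  end

  tokens.zip(flags)
end

words = ["foo", "bar", "spooky", "rick james"]
sentence = "say hello to rick james but not rick and james"
flag_words(sentence, words).each { |word, flag| puts "#{flag} ##{word}" }
```

For the sentence in the question this marks only the adjacent "rick" and "james" as true. Because the window size follows each entry, single words such as "spooky" and trigrams are handled by the same loop.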
Answer 1 (score: 0)
Maybe you could search the sentence for each word in the array with (.include?) or (.include_in?). That way you can search with .include?("rick james").
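A minimal sketch of that idea, using only String#include? (the helper name phrases_in is made up for illustration). Note that this is plain substring matching, so it reports which entries occur somewhere in the sentence rather than a true/false per word, and an entry like "bar" would also match inside "barn".

```ruby
# Hypothetical helper: returns the entries of `words` that occur in `sentence`
# as a substring. This is substring matching, not whole-word matching.
def phrases_in(sentence, words)
  words.select { |w| sentence.include?(w) }
end

words = ["foo", "bar", "spooky", "rick james"]
sentence = "say hello to rick james but not rick and james"
p phrases_in(sentence, words)
```

If whole-word matching matters, comparing token runs as in the each_cons approach is likely the safer route.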