Question

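I have a messy list of titles (say 1000). I want to analyze these titles for "keywords" that match a handful of genres I have created (the titles are not models, but the genres are).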

For example, suppose the first title string is "awesome playlist of house, EDM and ambient".

Now say I also have 15 Genres, each with an attribute name.

My end goal is to assign genres to this title string. That is easy to do with some string normalization followed by .include?

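But that does not help when there are synonyms. For example, I have a @genre.name called chill, and it should apply to ambient in the string above. Likewise, my @genre.name for dance music is called dance, and it should match EDM in the string above (EDM = electronic dance music).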

So what I would like to do is add 10 or so synonyms per genre, so that those get checked as well.

The problem is that I am not sure how to do this in a loop. A loop within a loop, I guess?

Here is my "single-level" code, without synonyms:

  def determine_genres(title)
    relevant_genres = []
    @genres.each do |genre|
      if normalize_string(title).include? normalize_string(genre.name)
        relevant_genres << genre.id
      end
    end
    relevant_genres
  end

Answer 1

You are definitely on the right track when you say an array of arrays of strings. I would structure it more like this:

genres = {
    'chill' => ['ambient','mood','chill'],
    'dance' => ['edm','trance','house']
}

And so on. Each key in the hash is a @genre.name, and the corresponding array is the list of all possible synonyms/subgenres for that @genre.

Ruby has a nifty array method, &, that lets you "intersect" two arrays and find their common values. Like this:

[1,2,3,4,5] & [0,3,5,6,8]  # => [3, 5]

See more here: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-26

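If you split the normalized sentence into an array of words and intersect it with the array of key terms for a genre, then a resulting array with length > 0 means the title contains a key term for that genre, so the genre is relevant.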

So you would edit your loop like this (using a genre hash shaped like the one above):

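def determine_genres(title)
  # String has no & method, so split the normalized title into words first
  # (this assumes normalize_string returns a String).
  title_words = normalize_string(title).scan(/\w+/)
  relevant_genres = []
  # genres is the hash above; it must be reachable from here (e.g. as a method or constant)
  genres.each do |genre, terms|
    intersecting_terms = title_words & terms
    if intersecting_terms.length > 0
      relevant_genres << Genre.find_by_name(genre).id
    end
  end
  relevant_genres
end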
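
To try the intersection idea outside a Rails app, here is a minimal self-contained sketch. GENRE_TERMS, this simple normalize_string, and returning genre names instead of ids are all assumptions made for illustration; the real normalize_string and Genre model are not shown in the question.

```ruby
# Hypothetical stand-ins: GENRE_TERMS and this normalize_string are not from the question.
GENRE_TERMS = {
  'chill' => ['ambient', 'mood', 'chill'],
  'dance' => ['edm', 'trance', 'house']
}.freeze

# Assumed normalization: lowercase, then turn anything that is not a letter, digit or space into a space.
def normalize_string(str)
  str.downcase.gsub(/[^a-z0-9\s]/, ' ')
end

# Returns the names of all genres that share at least one term with the title.
def determine_genre_names(title)
  title_words = normalize_string(title).split
  GENRE_TERMS.select { |_name, terms| (title_words & terms).any? }.keys
end

p determine_genre_names('awesome playlist of house, EDM and ambient')
# => ["chill", "dance"]
```

One limitation of word-level intersection: a multi-word synonym such as 'drum and bass' would never match, so those terms would need a substring check instead.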

You could also add a field to the Genre model in your database that stores the hash/array of synonyms.
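
As a rough sketch of that idea without a database, the following uses a plain Struct in place of the ActiveRecord model. The synonyms field, the sample ids, GENRES, and this normalize_string are hypothetical; how the field would be stored in Rails is not shown here.

```ruby
# Plain-Ruby stand-in for the Genre model; the synonyms field is an assumption.
Genre = Struct.new(:id, :name, :synonyms)

GENRES = [
  Genre.new(1, 'chill', ['ambient', 'mood']),
  Genre.new(2, 'dance', ['edm', 'trance', 'house'])
].freeze

# Assumed normalization: lowercase and replace punctuation with spaces.
def normalize_string(str)
  str.downcase.gsub(/[^a-z0-9\s]/, ' ')
end

# The "loop within a loop": for each genre, check its name and each of its synonyms.
def determine_genres(title)
  normalized = normalize_string(title)
  GENRES.select do |genre|
    ([genre.name] + genre.synonyms).any? { |term| normalized.include?(normalize_string(term)) }
  end.map(&:id)
end

p determine_genres('awesome playlist of house, EDM and ambient')
# => [1, 2]
```

Because this keeps the substring check from the question, a short synonym can also match inside a longer word, which may or may not be what you want.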

Answer 2

mmm ok

What do you think of this approach: for each genre you pick one canonical name (such as ambient), and you map every synonym to it in a hash. That is:

hsh = {"chill" => "ambient",
 "chillout" => "ambient",
 "chilloff" => "ambient",
 "ambient" => "ambient",
 "trance"  => "electronic"
}

#then you just need to check the Hash like this:

puts hsh['chill']  #=> ambient
puts hsh['chillout'] #=> ambient
puts hsh['trance'] #=> electronic

The downside is that you need to write out all of those synonyms.
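
To tag a whole title with such a lookup hash, one option is to look up each word of the title and keep the distinct results. This is only a sketch: SYNONYM_TO_GENRE, genres_for, and the added 'edm' entry are assumptions for illustration.

```ruby
# Synonym => canonical genre name, in the same shape as the hash above.
# The 'edm' entry is added here only so the sample title has two matches.
SYNONYM_TO_GENRE = {
  'chill'    => 'ambient',
  'chillout' => 'ambient',
  'ambient'  => 'ambient',
  'trance'   => 'electronic',
  'edm'      => 'electronic'
}.freeze

# Looks up every word of the title and keeps the distinct genres found.
def genres_for(title)
  words = title.downcase.scan(/[a-z0-9]+/)
  words.map { |word| SYNONYM_TO_GENRE[word] }.compact.uniq
end

p genres_for('awesome playlist of house, EDM and ambient')
# => ["electronic", "ambient"]
```

Each word costs a single hash lookup, so no inner loop over synonyms is needed.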

Answer 3

For each synonym, create an instance of Genre whose name is the synonym and whose id is the same as that of the instance it stands for.

I am not sure your structure is the most efficient, but even keeping it, you can still refactor it:

def determine_genres(title)
  title = normalize_string(title)
  @genres.select{|genre| title.include? normalize_string(genre.name)}.map(&:id)
end
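
A minimal sketch of how the synonym instances could work with that refactored method, using a Struct instead of a real model. GENRES, the sample ids, and this normalize_string are hypothetical, and a uniq is added because several synonyms of one genre can match the same title.

```ruby
# Plain-Ruby stand-in for the Genre model (an assumption, not the real class).
Genre = Struct.new(:id, :name)

# One entry per name or synonym; synonyms reuse the id of the genre they stand for.
GENRES = [
  Genre.new(1, 'chill'), Genre.new(1, 'ambient'),
  Genre.new(2, 'dance'), Genre.new(2, 'edm'), Genre.new(2, 'house')
].freeze

# Assumed normalization: lowercase and replace punctuation with spaces.
def normalize_string(str)
  str.downcase.gsub(/[^a-z0-9\s]/, ' ')
end

def determine_genres(title)
  title = normalize_string(title)
  # uniq collapses repeated ids when more than one synonym of a genre matches
  GENRES.select { |genre| title.include?(normalize_string(genre.name)) }.map(&:id).uniq
end

p determine_genres('awesome playlist of house, EDM and ambient')
# => [1, 2]
```

In a real database the id is normally a unique primary key, so the shared value would probably have to live in a separate column.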

How do I use an array of arrays of strings to match against another string?
