Question

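I have two arrays of strings. Strings in one array may be substrings of strings in the other array. I need to find all strings in one array that are substrings of a string in the other array.

Example: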

arr1 = ["firestorm", "peanut", "earthworm"]
arr2 = ["fire", "tree", "worm", "rest"]

Result:

res = ["fire","worm", "rest"]

My solution is shown below, but it takes a lot of time. I have to process thousands of words.

Solution:

res = []
arr1.each do |word1|
  arr2.each do |word2|
    if word1.include? word2
      res << word2
    end
  end
end

Please suggest a faster approach.

Answer 1

Unfortunately, we don't know your solution.

However, an Array takes up more memory than a String, so you can convert arr1 into a single string:

arr1 = ["firestorm", "peanut", "earthworm"]
arr2 = ["fire", "tree", "worm", "rest"]

arr1 = arr1.join(',')

Then

res = arr2.select { |word| arr1.include?(word) } #=> ["fire", "worm", "rest"]

or

res = arr2.select { |word| arr1.match?(word) } #=> ["fire", "worm", "rest"]

or

res = arr2.select { |word| arr1.match(word) } #=> ["fire", "worm", "rest"]
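One thing to watch with the joined string: a search word that contains the separator (for example "m,p") would match across the boundary between two words, and match? / match treat the word as a regular expression, so "p.a" would match "pea". A minimal sketch of a literal-only variant follows. The helper name literal_matches is made up for illustration, and it assumes the NUL character never appears in your words.

```ruby
# Hypothetical helper: literal substring lookup against one joined string.
def literal_matches(words, candidates)
  # Assumption: "\0" never occurs inside any word, so a candidate cannot
  # match across the boundary between two joined words.
  haystack = words.join("\0")

  # include? compares literally, so regex metacharacters in a candidate
  # (".", "*", ...) have no special meaning here.
  candidates.select { |candidate| haystack.include?(candidate) }
end

arr1 = ["firestorm", "peanut", "earthworm"]
arr2 = ["fire", "tree", "worm", "rest", "m,p", "p.a"]

p literal_matches(arr1, arr2) #=> ["fire", "worm", "rest"]
```

With join(',') the extra word "m,p" would have been reported, and with match? the word "p.a" would have been reported as well.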

Answer 2

Since the terms overlap, as far as I can tell you'll need to brute-force this.

def matched(find, list)
  list.flat_map { |e| find.flat_map { |f| e.scan(f) } }.uniq
end

In practice:

matched(%w[ fire tree worm rest ], %w[ firestorm peanut earthworm ])
# => ["fire", "rest", "worm"]

Here %w is used as a quick way of writing a list of words.
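The brute-force version runs one scan per (term, word) pair. A possible alternative, sketched here and not benchmarked, is to index the substrings of each word once and then test every term with a single Set lookup. The name matched_by_index is hypothetical, and the sketch assumes the words are short enough that storing their substrings is affordable.

```ruby
require "set"

# Hypothetical helper: index every substring whose length matches some
# search term, then test each term with one set lookup.
def matched_by_index(find, list)
  # Only substring lengths that actually occur among the terms are needed.
  lengths = find.map(&:length).uniq.reject(&:zero?)
  seen = Set.new

  list.each do |word|
    lengths.each do |len|
      next if len > word.length
      # Every window of `len` consecutive characters, overlaps included.
      (0..word.length - len).each { |i| seen << word[i, len] }
    end
  end

  # Results come back in the order of `find`; empty terms are never reported.
  find.select { |term| seen.include?(term) }
end

p matched_by_index(%w[ fire tree worm rest ], %w[ firestorm peanut earthworm ])
# => ["fire", "worm", "rest"]
```

The same terms are found as with matched, only ordered by the search list instead of by where they occur.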

Here's an approximation using scan and flat_map with a single combined regular expression:

def matched(find, list)
  rx = Regexp.union(find)

  list.flat_map { |e| e.scan(rx) }.uniq
end

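By using Regexp.union, you can build a single regular expression that runs faster than testing each term individually.

Where it's not quite accurate: scan only returns non-overlapping matches, so "rest", which overlaps "fire" in "firestorm", is missed: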

matched(%w[ fire tree worm rest ], %w[ firestorm peanut earthworm ])
# => ["fire", "worm"]
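If the overlapping matches matter, one possible tweak is to wrap the union in a lookahead, so that scan tries every starting position without consuming any characters. This is only a sketch: matched_overlapping is a made-up name, and it is still approximate, because when two terms can match at the same position (say "fire" and "fires") only the one listed first is captured.

```ruby
# Hypothetical variant of matched: a zero-width lookahead lets scan report
# matches that overlap each other.
def matched_overlapping(find, list)
  # (?=(...)) matches the empty string at a position where some term
  # starts, and the inner group captures that term.
  rx = /(?=(#{Regexp.union(find)}))/

  # With a capture group, scan returns one single-element array per match,
  # hence the flatten.
  list.flat_map { |e| e.scan(rx).flatten }.uniq
end

p matched_overlapping(%w[ fire tree worm rest ], %w[ firestorm peanut earthworm ])
# => ["fire", "rest", "worm"]
```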