Question

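Using Ruby, I need to output a list of dictionary words that can be formed by deleting letters from a source text.

For example, if I input the source text "crazed", I want to get not only words like "craze" and "razed", whose letters are in the same order and adjacent to each other in the source text, but also words like "rad" and "red", because those words exist and can be found by removing select letters from "crazed" while the output word retains the letter order. However, words like "dare" or "race" should not be in the output list, because the order of the letters in "dare" or "race" is not the same as in "crazed". (If "raed" or "crae" were words in the dictionary, they would be part of the output.)

My idea was to walk through the source text in a binary fashion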

(for "crazed", we'd get: 
000001 = "d"; 
000010 = "e"; 
000011 = "ed"; 
000100 = "z"; 
000101 = "zd"; 
000110 = "ze"; 
000111 = "zed"; 
001000 = "a"; 
001001 = "ad"; etc.)

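and compare each result with the words in the dictionary, but I don't know how to code that, or whether it is the most efficient approach. This is where I would benefit from your help.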
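The binary walk described above can be sketched roughly as follows. This is only an illustration of the idea, not a recommended solution: the method name `subsequences_by_mask` is hypothetical, and the number of masks is 2**n - 1, so it is only practical for short inputs.

```ruby
# Hypothetical helper: list every non-empty subsequence of `text`
# by counting a bitmask from 1 up to 2**n - 1.
# The lowest bit selects the last character, matching the table above
# (000001 = "d" for "crazed").
def subsequences_by_mask(text)
  n = text.size
  (1...(1 << n)).map do |mask|
    text.each_char.with_index
        .select { |_, i| mask[n - 1 - i] == 1 } # Integer#[] reads a single bit
        .map(&:first)
        .join
  end
end

p subsequences_by_mask("crazed").first(4)
```

Each resulting string would then still have to be looked up in the dictionary.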

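Also, the length of the source text is variable; it won't necessarily be six letters long (like "crazed"). Inputs could be much larger (20-30 characters, possibly more).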

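I've searched here and found questions about anagrams and about words whose letters can be in any order, but nothing that covers specifically what I'm looking for. Is this even possible in Ruby? Thank you.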

Answer 1

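First, let's read the words of a dictionary into an array, after chomping, downcasing and removing duplicates (duplicates arise, for example, if the dictionary contains both "A" and "a", as does the dictionary on my Mac, which I've used below).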

DICTIONARY = File.readlines("/usr/share/dict/words").map { |w| w.chomp.downcase }.uniq
  #=> ["a", "aa", "aal", "aalii",..., "zyzomys", "zyzzogeton"]  
DICTIONARY.size
  #=> 234371

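The following method generates all combinations of one or more characters of the given word, respecting their order. For each combination it joins the characters to form a string, checks whether the string is in the dictionary and, if it is, saves the string to an array.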

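To check whether a string matches a word in the dictionary, I perform a binary search using the method Array#bsearch. This relies on the fact that the dictionary is already sorted in alphabetical order, which a binary search requires.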
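As a small illustration of that lookup, here is a sketch using a tiny, made-up word list in place of the real dictionary. `SAMPLE_WORDS` and `in_dictionary?` are hypothetical names. In this "find-any" mode the block must return a negative, zero or positive number, and `bsearch` returns the matching element or `nil`.

```ruby
# Hypothetical miniature dictionary; it must already be sorted by String#<=>.
SAMPLE_WORDS = %w[ad cad rad red zed].freeze

# True if `word` is present in the sorted array `sorted_words`.
def in_dictionary?(word, sorted_words)
  # The block compares the word we want against each probed element.
  !sorted_words.bsearch { |dw| word <=> dw }.nil?
end

p in_dictionary?("rad", SAMPLE_WORDS)
p in_dictionary?("dare", SAMPLE_WORDS)
```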

def subwords(word)
  arr = word.chars
  (1..word.size).each.with_object([]) do |n,a|
    arr.combination(n).each do |comb|
      w = comb.join
      a << w if DICTIONARY.bsearch { |dw| w <=> dw }
    end
  end
end

subwords "crazed"
  # => ["c", "r", "a", "z", "e", "d",
  #     "ca", "ce", "ra", "re", "ae", "ad", "ed",
  #     "cad", "rad", "red", "zed",
  #     "raze", "craze", "crazed"]

Yes, that particular dictionary contains a lot of strings that don't appear to be English words (e.g., "z").

Another example.

subwords "importance"
  #=> ["i", "m", "p", "o", "r", "t", "a", "n", "c", "e",
  #    "io", "it", "in", "ie", "mo", "mr", "ma", "me", "po", "pa", "or",
  #      "on", "oe", "ra", "re", "ta", "te", "an", "ae", "ne", "ce",
  #    "imp", "ima", "ion", "ira", "ire", "ita", "ian", "ice", "mor", "mot",
  #      "mon", "moe", "man", "mac", "mae", "pot", "poa", "pon", "poe", "pan", 
  #      "pac", "ort", "ora", "orc", "ore", "one", "ran", "tan", "tae", "ace",
  #    "iota", "ione", "iran", "mort", "mora", "morn", "more", "mote",
  #      "moan", "mone", "mane", "mace", "port", "pore", "pote", "pone",
  #      "pane", "pace", "once", "rane", "race", "tane",
  #    "impot", "moran", "morne", "porta", "ponce", "rance",
  #    "import", "impone", "impane", "prance",
  #    "portance",
  #    "importance"]
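For the longer inputs mentioned in the question (20-30 characters or more), the number of combinations grows as 2**n, so generating all of them becomes impractical. One possible alternative, which is not the method used above, is to walk the dictionary once and keep every word that is a subsequence of the source. The names `subsequence?` and `subwords_by_scan` below are hypothetical, and the word list in the example is made up.

```ruby
# True if the characters of `candidate` appear in `source` in the same order
# (not necessarily adjacent).
def subsequence?(candidate, source)
  pos = 0
  candidate.each_char do |ch|
    idx = source.index(ch, pos) # search from the current position onward
    return false if idx.nil?
    pos = idx + 1
  end
  true
end

# Keep the dictionary words that can be formed by deleting letters from `source`.
def subwords_by_scan(source, words)
  words.select { |w| !w.empty? && subsequence?(w, source) }
end

p subwords_by_scan("crazed", %w[craze razed rad red dare race zed])
```

With this approach the cost depends on the size of the dictionary and the length of each word rather than on 2**n.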

Answer 2

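Below is a broader solution set, which includes words that can be obtained by using the letters in any order. The difficulty with using combinations alone to find possible subwords is that the permutations of each combination are missed. For example, working from "importance", the combination "mpa" will come up at some point. Since it is not a dictionary word, it is skipped, and so we lose its permutation "map", which is a dictionary subword of "importance". The solution below therefore finds more possible dictionary words. I agree that my method could be optimized for speed.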
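The "mpa" versus "map" point can be checked directly with a short sketch. The variable names here are only for illustration.

```ruby
letters = "importance".chars

# Array#combination keeps the original letter order, so "mpa" shows up
# but "map" never does.
ordered = letters.combination(3).map(&:join)
has_mpa = ordered.include?("mpa")
has_map = ordered.include?("map")

# Permuting the letters of that one combination recovers "map".
rearranged = %w[m p a].permutation.map(&:join)

p has_mpa
p has_map
p rearranged.include?("map")
```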

#steps
#split string at ''
#find combinations for n=2 all the way to n=word.size
#for each combination
#find the permutations of all the arrangements
#then
#join the array
#check to see if word is in dictionary
#and it's not already collected
#if it is, add to collecting array

require 'set'
Dictionary=File.readlines('dictionary.txt').map(&:chomp).to_set
Dictionary.size #39501

def subwords(word)
  # split string at ''
  arr = word.split('')

  # excluding single letter words
  # you can change 2 to 1 in line below to select for single letter words too
  (2..word.size).each_with_object([]) do |n, a|
    # find combinations for n=2 all the way to n=word.size
    arr.combination(n).each do |comb|
      # for each combination
      # find the permutations of all the arrangements
      comb.permutation(n).each do |perm|
        # join the array
        w = perm.join

        # check to see if word is in dictionary and it's not already collected
        if Dictionary.include?(w) && !a.include?(w)
          # if it is, add to collecting array
          a << w
        end
      end
    end
  end
end

p subwords('crazed')
#["car", "arc", "rec", "ace", "cad", "are", "era", "ear", "rad", "red", "adz", "zed", "czar", "care", "race", "acre", "card", "dace", "raze", "read", "dare", "dear", "adze", "daze", "craze", "cadre", "cedar", "crazed"]
p subwords('battle')
#["bat", "tab", "alb", "lab", "bet", "tat", "ate", "tea", "eat", "eta", "ale", "lea", "let", "bate", "beat", "beta", "abet", "bale", "able", "belt", "teat", "tale", "teal", "late", "bleat", "table", "latte", "battle", "tablet"]
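Since the method above could be optimized for speed, one possible direction (an assumption on my part, not something the answer itself uses) is to skip the permutations entirely and compare letter counts: a word can be built from the source in any order when it needs no letter more often than the source provides. The names `letter_counts` and `any_order_subwords` are hypothetical, and the word list in the example is made up.

```ruby
# Count how many times each character occurs in `str`.
def letter_counts(str)
  str.each_char.with_object(Hash.new(0)) { |ch, counts| counts[ch] += 1 }
end

# Keep the words that can be spelled with the letters of `source`, in any order.
def any_order_subwords(source, words, min_length: 2)
  available = letter_counts(source)
  words.select do |w|
    w.size >= min_length &&
      letter_counts(w).all? { |ch, n| n <= available[ch] }
  end
end

p any_order_subwords("crazed", %w[car race dare deed a zed crazed])
```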
