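Transform a source word into a target word

Time: 2017-08-20 18:32:57

Tags: python

I need some help with my code. I need to transform one input word into another input word, changing one letter at a time. My program currently does this, but very inefficiently, and it does not find the shortest path. Any help would be appreciated.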

import re
def same(item, target):
  return len([c for (c, t) in zip(item, target) if c == t])

def build(pattern, words, seen, list):
  return [word for word in words
                 if re.search(pattern, word) and word not in seen.keys() and
                    word not in list]

def find(word, words, seen, target, path):
  list = []
  for i in range(len(word)):
    list += build(word[:i] + "." + word[i + 1:], words, seen, list)
  if len(list) == 0:
    return False
  list = sorted([(same(w, target), w) for w in list])
  for (match, item) in list:
    if match >= len(target) - 1:
      if match == len(target) - 1:
        path.append(item)
      return True
    seen[item] = True
  for (match, item) in list:
    path.append(item)
    if find(item, words, seen, target, path):
      return True
    path.pop()

fname = 'dictionary.txt'
file = open(fname)
lines = file.readlines()
while True:
  start = input("Enter start word:")
  words = []
  for line in lines:
    word = line.rstrip()
    if len(word) == len(start):
      words.append(word)
  target = input("Enter target word:")
  break

count = 0
path = [start]
seen = {start : True}
if find(start, words, seen, target, path):
  path.append(target)
  print(len(path) - 1, path)
else:
  print("No path found")

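Edit: Below is another failed attempt, in which I tried to solve the problem with a different approach. This time it does not seem to loop correctly.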

def find(start, words, target): # find function. Word = start word, words =

  start=list(start)
  target=list(target)

  print("Start word is ", start)
  print("Target word is ", target)


  letter = 0
  while start != target:
      if letter == len(target):
          letter = 0
          continue

      elif start[letter] == target[letter]:
          letter = letter + 1
          continue

      else:
          testword = list(start)
          testword[letter] = target[letter]
          testword = ''.join(testword)
          if testword in words:
              start[letter] = target[letter]
              letter = letter + 1
              print(start)
              continue
          else:
              letter = letter + 1
              continue

      letter = letter + 1
      continue




fname = "dictionary.txt"
file = open(fname) # Open the dictionary
lines = file.readlines() # Read each line from the dictionary and store it in lines
while True: # Until ended
  start = input("Enter start word:") # Take a word from the user
  words = [] # Initialise the list 'words'
  for line in lines: # For each line in the dictionary
    word = line.rstrip() #strip all white space and characters from the end of a string
    if len(word) == len(start):
      words.append(word)

  if start not in words:
      print("Your start word is not valid")
      continue

  target = input("Enter target word:")
  if len(start) != len(target):
      print("Please choose two words of equal length")
      continue

  if target not in words:
      print("Your target word is not valid")
      continue


  break

Edit: Here is the basic algorithm of the code. (Both variants suit my purpose.)

- input start word
- input target word
- if len(start) == len(target)
       continue
       - check dictionary to see if target and start words are present
       - find which letters differ between the start and target words
       - change one differing letter in the start word at a time until start word == target word
         # after each letter change, the result must be a valid word in the dictionary

The goal is to do this in the fewest steps, which neither attempt achieves. I think the first section of code does find a path, but in a huge number of steps; I know it could be far more efficient.

2 answers:

Answer 0: (score: 2)

With some preprocessing to group words of equal length, you can use the networkx third-party library to build a graph, then use its shortest_path algorithm to retrieve the route. Note that I've used the default dictionary available on most *nix systems and limited it to words of 5 characters or fewer.

The code below builds the graph. Once it exists, calling shortest_path on it with the start and target words returns the shortest route.

from collections import defaultdict
import networkx as nx

# Group the words into equal length so we only compare within words of equal length
with open('/usr/share/dict/words') as fin:
    words = defaultdict(set)
    for word in (line.strip() for line in fin if line.islower() and len(line) <= 6):
        words[len(word)].add(word)

graph = nx.Graph()
for k, g in words.items():
    while g:
        word = g.pop()
        matches = {w for w in g if sum(c1 != c2 for c1, c2 in zip(word, w)) == 1}
        graph.add_edges_from((word, match) for match in matches)
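
The retrieval step is not shown in the answer, so here is a minimal sketch of what it could look like. It assumes a networkx version that provides `nx.NodeNotFound` (2.0 or later). The toy word list, the `route` helper, and the explicit `add_nodes_from` call are illustrative assumptions, not part of the original answer, which builds `graph` from a dictionary file.

```python
import networkx as nx

# Toy word list for illustration only (an assumption, not the answer's data).
demo_words = ["hide", "hire", "here", "herd", "heed", "seed", "seek",
              "side", "hive", "zoom"]

graph = nx.Graph()
# Add every word as a node so that isolated words are also present in the graph.
graph.add_nodes_from(demo_words)
for i, first in enumerate(demo_words):
    for second in demo_words[i + 1:]:
        # Connect words that differ in exactly one position.
        if sum(c1 != c2 for c1, c2 in zip(first, second)) == 1:
            graph.add_edge(first, second)


def route(graph, start, target):
    """Hypothetical helper: shortest route as a list of words, or None."""
    try:
        return nx.shortest_path(graph, start, target)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        # No chain of one-letter changes exists, or a word is not in the graph.
        return None


print(route(graph, "hide", "seek"))
print(route(graph, "hide", "zoom"))
```

Because every edge has the same cost, the returned list is a shortest word chain; when several chains of equal length exist, which one is returned is up to networkx.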

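Answer 1: (score: 2)

Here is a breadth-first search that does not use any third-party modules. I don't guarantee that it finds the shortest solution, but it appears to work. ;) It stops when it finds a solution, but because sets are iterated in arbitrary order, each run of the program may find a different solution for a given start & target pair.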

import re

# The file containing the dictionary
fname = '/usr/share/dict/words'

start, target = 'hide', 'seek'

wlen = len(start)
wrange = range(wlen)

words = set()
with open(fname) as f:
    for word in f:
        w = word.rstrip()
        # Grab words of the correct length that aren't proper nouns
        # and that don't contain non-alpha chars like apostrophe or hyphen
        if len(w) == wlen and w.islower() and w.isalpha():
            words.add(w)
print('word set size:', len(words))

# Build a regex to find words that differ from `word` by one char
def make_pattern(word):
    pat = '|'.join(['{}.{}'.format(word[:i], word[i+1:]) for i in wrange])
    return re.compile(pat)

# Find words that extend each chain of words in `seq`
def find(seq):
    result = []
    seen = set()
    for current in seq:
        pat = make_pattern(current[-1])
        matches = {w for w in words if pat.match(w)} - seen
        if target in matches:
            return current + (target,)
        result.extend(current + (w,) for w in matches)
        seen.update(matches)
        words.difference_update(matches)
    seq[:] = result

# Search for a solution
seq = [(start,)]
words.discard(start)
while True:
    solution = find(seq)
    if solution:
        print(solution)
        break
    size = len(seq)
    print(size)
    if size == 0:
        print('No solutions found')
        break

Typical output

word set size: 2360
9
55
199
479
691
('hide', 'hire', 'here', 'herd', 'heed', 'seed', 'seek')

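I should mention that all those word chains use up a bit of RAM; I'll try to think of a more compact approach. But it shouldn't really be a problem on modern machines, unless you are working with really large words.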
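
One possible way to reduce the memory use, sketched here under assumptions rather than taken from the answer: instead of keeping a full chain for every word on the frontier, store only the word each newly discovered word was reached from, and rebuild the chain once the target is found. The function name `shortest_ladder`, the toy word set, and the restriction to lowercase ASCII letters are all assumptions made for illustration.

```python
from collections import deque
from string import ascii_lowercase


def shortest_ladder(start, target, words):
    """Return a shortest chain of words from start to target, or None."""
    if start == target:
        return [start]
    # Map each discovered word to the word it was reached from.
    parent = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Try every one-letter variation of the current word.
        for i in range(len(current)):
            for letter in ascii_lowercase:
                candidate = current[:i] + letter + current[i + 1:]
                if candidate not in words or candidate in parent:
                    continue
                parent[candidate] = current
                if candidate == target:
                    # Walk the parent links back to the start word.
                    path = [candidate]
                    while parent[path[-1]] is not None:
                        path.append(parent[path[-1]])
                    return path[::-1]
                queue.append(candidate)
    return None


# Toy word set for illustration; a real run would load a dictionary file.
demo_words = {"hide", "hire", "here", "herd", "heed", "seed", "seek", "side"}
print(shortest_ladder("hide", "seek", demo_words))
```

Since words are explored in order of their distance from the start, the first time the target is reached the rebuilt chain has the minimum number of steps. Generating the candidate spellings and checking them against a set also avoids scanning the whole word list with a regex for every word.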