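I have a small text file of 30 lines, with two similar words on each line. I need to compute the Levenshtein distance between the two words on each line. I also need to use a memoize function when computing the distance. I am quite new to Python and to algorithms in general, so this is proving very difficult for me. I have the file open and am reading it, but I can't figure out how to assign each of the two words to the variables 'a' and 'b' to compute the distance.
Here is my current script, which right now only prints the document: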
txt_file = open('wordfile.txt', 'r')

def memoize(f):
    cache = {}
    def wrapper(*args, **kwargs):
        try:
            return cache[args]
        except KeyError:
            result = f(*args, **kwargs)
            cache[args] = result
            return result
    return wrapper

@memoize
def lev(a,b):
    if len(a) > len(b):
        a,b = b,a
        b,a = a,b
    current = range(a+1)
    for i in range(1,b+1):
        previous, current = current, [i]+[0]*n
        for j in range(1,a+1):
            add, delete = previous[j]+1, current[j-1]+1
            change = previous[j-1]
            if a[j-1] != b[i-1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[b]

if __name__=="__main__":
    with txt_file as f:
        for line in f:
            print line
Here are a few of the words from the text file, so you get the idea:
archtypes,archetypes
propietary,proprietary
认识,识别
exlude,exclude
龙卷风,龙卷风
发生了,发生了
vacinity,vicinity
Here is an updated version of the script. It still doesn't work, but it's better:
class memoize:
    def __init__(self, function):
        self.function = function
        self.memoized = {}
    def __call__(self, *args):
        try:
            return self.memoized[args]
        except KeyError:
            self.memoized[args] = self.function(*args)
            return self.memoized[args]

@memoize
def lev(a,b):
    n, m = len(a), len(b)
    if n > m:
        a, b = b, a
        n, m = m, n
    current = range(n + 1)
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[n]

if __name__=="__main__":
    for pair in open("wordfile.txt", "r"):
        a,b = pair.split()
        lev(a, b)
Answer 0 (score: 2)
Assuming the problem is passing the words to lev, and supposing your word file looks like this:
bat, man
cat, goat
foo, bar
you could do this:
if __name__ == '__main__':
    with open("wordfile", "r") as f:
        for pair in f:
            # first, strip the trailing newline and remove all spaces, then break around the comma
            a, b = pair.strip().replace(' ', '').split(',')
            # pass these words to lev
            lev(a, b)
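One detail that is easy to miss: every line read from a file still carries its trailing newline, so it is worth stripping each word before comparing. Below is a minimal sketch of the splitting step on its own, assuming one comma per line as in the sample word file above. `parse_pair` is a hypothetical helper name used only for illustration, and the sample lines are held in memory rather than read from disk.

```python
def parse_pair(line):
    """Split a 'word1, word2' line into two clean words.

    Hypothetical helper for illustration; assumes exactly one comma per line.
    """
    # partition() splits on the first comma only
    left, _, right = line.partition(',')
    # strip() removes surrounding spaces and the trailing newline
    return left.strip(), right.strip()


# In-memory lines standing in for the contents of the word file
sample = ["bat, man\n", "cat, goat\n", "foo, bar\n"]
for line in sample:
    a, b = parse_pair(line)
    print(a, b)
```

The two returned words can then be passed straight to lev(a, b).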
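Answer 1 (score: 0)
With the help of Abhishek's answer and the comments, I found the answer to this problem. Here is the final script that runs, in case anyone else needs it: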
def memoize(f):
    # cache maps the positional arguments to the result already computed for them
    cache = {}
    def wrapper(*args, **kwargs):
        try:
            return cache[args]
        except KeyError:
            result = f(*args, **kwargs)
            cache[args] = result
            return result
    return wrapper

@memoize
def lev(a,b):
    n, m = len(a), len(b)
    # make sure a is the shorter word, so each row has n + 1 entries
    if n > m:
        a, b = b, a
        n, m = m, n
    current = range(n + 1)
    for i in range(1, m + 1):
        previous, current = current, [i] + [0] * n
        for j in range(1, n + 1):
            add, delete = previous[j] + 1, current[j - 1] + 1
            change = previous[j - 1]
            if a[j - 1] != b[i - 1]:
                change = change + 1
            current[j] = min(add, delete, change)
    return current[n]

if __name__=="__main__":
    with open('wordfile.txt', 'r') as word_file:
        for line in word_file:
            # split around the comma, then strip spaces and the trailing newline
            a, b = [word.strip() for word in line.split(',')]
            print(a, b, lev(a, b))
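Note that the iterative lev above only gains from memoize when the exact same pair of words is passed in twice, because the function never calls itself. For comparison, here is a sketch of the recursive formulation, where caching matters much more because the same pairs of prefixes are reached again and again. It swaps in the standard library's `functools.lru_cache` (Python 3) for the hand-written memoize, and `lev_recursive` is a hypothetical name, so treat it as an illustration rather than a drop-in replacement.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def lev_recursive(a, b):
    """Levenshtein distance between strings a and b (recursive sketch)."""
    # If one word is empty, the distance is the length of the other
    if not a:
        return len(b)
    if not b:
        return len(a)
    # Cost 0 if the last characters match, 1 for a substitution
    cost = 0 if a[-1] == b[-1] else 1
    return min(
        lev_recursive(a[:-1], b) + 1,          # delete the last character of a
        lev_recursive(a, b[:-1]) + 1,          # insert the last character of b
        lev_recursive(a[:-1], b[:-1]) + cost,  # match or substitute
    )


print(lev_recursive("archtypes", "archetypes"))    # 1
print(lev_recursive("propietary", "proprietary"))  # 1
```

Without the cache, this version recomputes the same prefix pairs an exponential number of times. With it, each pair is computed once.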