I have the following list:
words = ['credit', 'debit', 'american', 'pay', 'cards', 'loan']
and the following expression:
flattened = [val for sublist in eject for val in sublist]
where eject is a list of lists like this:
[['lons', 'aplication'],
 ['seem', 'appear', 'credts'],
 ['cardts', 'debitts', 'lons'],
 ['targeta', 'débit'],
 ['recall', 'crheditts'],
 ['hi', 'need', 'pai', 'lons', 'number', 'call', 'cardss', 'dhvit', 'devit', 'loans']]
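For reference, here is a minimal sketch of what the flattening expression produces, using a shortened copy of eject. The name unique_candidates is only illustrative and does not appear in my real code; removing duplicates is an optional step I am assuming is useful because 'lons' occurs more than once.

```python
from itertools import chain

# Shortened copy of the nested list, for illustration only
eject = [
    ['lons', 'aplication'],
    ['seem', 'appear', 'credts'],
    ['cardts', 'debitts', 'lons'],
]

# The comprehension from the question: one flat list, duplicates kept
flattened = [val for sublist in eject for val in sublist]

# Equivalent flattening with the standard library
assert flattened == list(chain.from_iterable(eject))

# Optional: drop duplicates while keeping first-seen order
unique_candidates = list(dict.fromkeys(flattened))
print(unique_candidates)
```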
Then I have a function that finds the most similar words within a set of words:
import pandas as pd
from nltk import edit_distance

def more_similars(word, lista):
    # Pair every candidate with its edit distance to `word`
    lista = [(pal, edit_distance(word, pal)) for pal in lista]
    df = pd.DataFrame(lista, columns=['palabra', 'distancia'])
    # Closest candidates first
    return df.sort_values(by='distancia')

df = more_similars('prestamo', flattened)
list(df[df.distancia == 1].palabra)
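Once I have this, I want to take the list of words (called words at the beginning) and build a dictionary of the misspelled words whose distance to one of them equals 1.
I tried the following, but it does not work: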
def similar_words(palabras):
    for i in enumerate(palabras):
        df = mas_parecidas(palabras,list(set(flattened)).palabras)
        wrong_written_words = list(df[df.distancia == 1])
        list_words = ((palabras + ' ') *
                      len(palabras_mal_escritas)).split('')
    return lista_palabras

{k:v for k, v in zip(wrong_written_words, list_words)}
similar_words(palabras)
I would like to get the following result:
list_words = {'credts': 'credit', 'crheditts': 'credit', 'debitts': 'debit', 'débit': 'debit', 'dhvit': 'debit', 'devit': 'debit', 'pai': 'pay', 'cardts': 'cards', 'cardss': 'cards', 'lons': 'loan', 'loans': 'loan'}
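One possible direction, as a rough sketch rather than a definitive solution: map each candidate to its closest entry in words and keep it only if the distance is within a threshold. The helper levenshtein below is a plain-Python stand-in I am assuming behaves like nltk's edit_distance with its default settings, and the max_distance parameter is hypothetical. Note that several pairs in the desired result are more than one edit apart (for example 'lons' and 'loan' differ by 2, 'crheditts' and 'credit' by 3), so a strict distance of 1 cannot produce the full dictionary.

```python
def levenshtein(a, b):
    """Edit distance where insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similar_words(words, candidates, max_distance=1):
    """Map each misspelled candidate to its closest entry in `words`."""
    mapping = {}
    for candidate in dict.fromkeys(candidates):  # skip repeated candidates
        best = min(words, key=lambda w: levenshtein(w, candidate))
        distance = levenshtein(best, candidate)
        # distance 0 means the candidate is already spelled correctly
        if 0 < distance <= max_distance:
            mapping[candidate] = best
    return mapping


words = ['credit', 'debit', 'american', 'pay', 'cards', 'loan']
eject = [
    ['lons', 'aplication'],
    ['seem', 'appear', 'credts'],
    ['cardts', 'debitts', 'lons'],
    ['targeta', 'débit'],
    ['recall', 'crheditts'],
    ['hi', 'need', 'pai', 'lons', 'number', 'call', 'cardss',
     'dhvit', 'devit', 'loans'],
]
flattened = [val for sublist in eject for val in sublist]

result = similar_words(words, flattened)
print(result)
```

Raising max_distance picks up more of the desired pairs, but it may also start matching unrelated short words such as 'hi', so the threshold is a trade-off that would need checking against the real data.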