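Hello, I have a specific string and I am trying to compute its edit distance against a set of other strings. I also want to count how many times each string occurs and then sort the results.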
str= "Hello"
The text file I am comparing it against, named xfile, contains:
"hola"
"how are you"
"what is up"
"everything good?"
"hola"
"everything good?"
"what is up?"
"okay"
"not cool"
"not cool"
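I want to build a dictionary that compares the string against every line in xfile and stores each line's edit distance and count. Right now I can get the key and the distance, but not the count. Can anyone suggest how to do this?
My code is: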
import editdistance

data = "Hello"
Utterances = {}
# readFile is assumed to be the already opened xfile text file
for lines in readFile:
    dist = editdistance.eval(data, lines)
    Utterances[lines] = dist
Answer 0 (score: 4)
For each utterance, you can store a dictionary that holds both the distance and the count:
import editdistance
data = 'Hello'
utterances = {}
xlist = [
    'hola',
    'how are you',
    'what is up',
    'everything good?',
    'hola',
    'everything good?',
    'what is up?',
    'okay',
    'not cool',
    'not cool',
]
for line in xlist:
    if line not in utterances:
        utterances[line] = {
            'distance': editdistance.eval(data, line),
            'count': 1
        }
    else:
        utterances[line]['count'] += 1
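As an alternative sketch of the same idea, the counting can be handed to collections.Counter so that each distinct line is measured only once. To keep the example runnable without third-party packages, it uses a hypothetical pure-Python helper, levenshtein, as a stand-in for editdistance.eval. The helper and the summarize function are illustrative names and are not part of the editdistance package.

```python
from collections import Counter


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute).

    Hypothetical stand-in for editdistance.eval, kept dependency-free.
    """
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def summarize(data, lines):
    """Map each distinct line to its distance from data and its count."""
    counts = Counter(lines)
    return {
        line: {'distance': levenshtein(data, line), 'count': count}
        for line, count in counts.items()
    }


xlist = ['hola', 'how are you', 'what is up', 'everything good?', 'hola',
         'everything good?', 'what is up?', 'okay', 'not cool', 'not cool']
utterances = summarize('Hello', xlist)
```

Note that the comparison is case-sensitive and exact, so 'what is up' and 'what is up?' are counted as two different utterances. If the lines come from a file, stripping the trailing newline (and any surrounding quotes) before counting is probably what you want.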
Then, if you need the utterances sorted by distance or by count, you can use an OrderedDict:
from collections import OrderedDict
sorted_by_distance = OrderedDict(sorted(utterances.items(), key=lambda t: t[1]['distance']))
sorted_by_count = OrderedDict(sorted(utterances.items(), key=lambda t: t[1]['count']))
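Both calls sort in ascending order, so the least frequent utterances come first in sorted_by_count. If you would rather see the most frequent ones first, one possible variation is sketched below. It assumes Python 3.7 or later, where a plain dict preserves insertion order, and it uses a small hand-written sample in the same shape as the utterances dictionary; the name by_count_desc is only illustrative.

```python
# Small sample in the same shape as the utterances dictionary.
utterances = {
    'hola': {'distance': 4, 'count': 2},
    'okay': {'distance': 5, 'count': 1},
    'not cool': {'distance': 7, 'count': 2},
}

# Most frequent first; ties on count are broken by the smaller distance.
by_count_desc = dict(sorted(
    utterances.items(),
    key=lambda item: (-item[1]['count'], item[1]['distance']),
))

for line, info in by_count_desc.items():
    print(line, info['distance'], info['count'])
```

Negating the count in the key keeps everything in a single sort while still ordering the tie-breaker ascending; passing reverse=True instead would flip both fields.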