Background
I am using the fuzzywuzzy package and have the following sample lists:
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
token_name_list = [['John', 'D', 'Doe'], ['Jane', 'L' , 'More']]
token_text_list = [['Today', 'we', 'found', 'John', 'Doe', 'here', 'and', 'Jon', 'Does', 'car'],
['We', 'also', 'found', 'Johns','sister', 'Jan', 'who', 'is', 'known', 'Jane', 'L', 'More' ]]
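Goal
I want to use fuzzywuzzy's process.extract function to loop over both lists above. process.extract compares a query string against a list of choices and returns the best matches with their scores, e.g. ('John', 100). Every token of the i-th name in token_name_list should be compared against the i-th list in token_text_list. Without a loop, it would look like this: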
#'John' from token_name_list is compared to the 1st list in token_text_list
extract1 = process.extract(token_name_list[0][0],token_text_list[0], limit = 3, scorer = fuzz.ratio)
[('John', 100), ('Jon', 86), ('found', 44)]
#'D' from token_name_list is compared to the 1st list in token_text_list
extract2 = process.extract(token_name_list[0][1],token_text_list[0], limit = 3, scorer = fuzz.ratio)
[('Doe', 50), ('and', 50), ('Does', 40)]
#'Doe' from token_name_list is compared to the 1st list in token_text_list
extract3 = process.extract(token_name_list[0][2],token_text_list[0], limit = 3, scorer = fuzz.ratio)
[('Doe', 100), ('Does', 86), ('we', 40)]
#'Jane' from token_name_list is compared to the 2nd list in token_text_list
extract4 = process.extract(token_name_list[1][0],token_text_list[1], limit = 3, scorer = fuzz.ratio)
[('Jane', 100), ('Jan', 86), ('Johns', 44)]
#'L' from token_name_list is compared to the 2nd list in token_text_list
extract5 = process.extract(token_name_list[1][1],token_text_list[1], limit = 3, scorer = fuzz.ratio)
[('L', 100), ('also', 40), ('We', 0)]
#'More' from token_name_list is compared to the 2nd list in token_text_list
extract6 = process.extract(token_name_list[1][2],token_text_list[1], limit = 3, scorer = fuzz.ratio)
[('More', 100), ('We', 33), ('who', 29)]
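For reference, the scores above can be approximated with the standard library alone. This sketch is an assumption about fuzzywuzzy's internals, not its actual code: it mimics `process.extract(..., scorer=fuzz.ratio)` on the pure-Python path, where the default processor replaces non-alphanumeric characters with spaces and lowercases each string, `fuzz.ratio` is `difflib.SequenceMatcher(...).ratio()` scaled to 0-100 and rounded, and only the `limit` best matches are kept. With python-Levenshtein installed, fuzzywuzzy uses a different matcher, so individual scores may differ slightly. `process_sketch`, `ratio_sketch` and `extract_sketch` are hypothetical names.

```python
import re
from difflib import SequenceMatcher
from typing import List, Sequence, Tuple


def process_sketch(s: str) -> str:
    """Rough stand-in for fuzzywuzzy's default processor (assumption):
    non-alphanumeric characters become spaces, then lowercase and strip."""
    return re.sub(r"\W", " ", s).lower().strip()


def ratio_sketch(s1: str, s2: str) -> int:
    """Rough stand-in for fuzz.ratio on the difflib path.
    SequenceMatcher.ratio() is 2*M/T (M = matched chars, T = total chars of both strings)."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def extract_sketch(query: str, choices: Sequence[str], limit: int = 3) -> List[Tuple[str, int]]:
    """Score every choice against the query and keep the `limit` best (choice, score) pairs."""
    processed_query = process_sketch(query)
    scored = [(choice, ratio_sketch(processed_query, process_sketch(choice))) for choice in choices]
    # sorted() stays stable with reverse=True, so tied scores keep their order in `choices`.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


if __name__ == "__main__":
    text0 = ['Today', 'we', 'found', 'John', 'Doe', 'here', 'and', 'Jon', 'Does', 'car']
    print(extract_sketch('John', text0))  # [('John', 100), ('Jon', 86), ('found', 44)]
    print(extract_sketch('D', text0))     # [('Doe', 50), ('and', 50), ('Does', 40)]
```

On these sample lists the sketch reproduces all six results listed above, including the tie order of ('Doe', 50) before ('and', 50).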
Attempt
I tried the following, but it does not give me what I want:
extract_list = []
for token_name in token_name_list:
    for name, text in zip(token_name, token_text_list):
        extract = process.extract(name, text, limit=3, scorer=fuzz.ratio)
        extract_list.append(extract)
extract_list
[[('John', 100), ('Jon', 86), ('found', 44)],
[('found', 33), ('We', 0), ('also', 0)],
[('and', 57), ('Jon', 57), ('John', 50)],
[('L', 100), ('also', 40), ('We', 0)]]
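Tracing which (token, text list) pairs the attempt's inner loop visits suggests why only four results come back: `zip(token_name, token_text_list)` walks the tokens of one name alongside the list of all texts, so the i-th token is matched with the i-th text list, and the loop stops after two tokens because that is the length of the shorter input. The helper `trace_attempt_pairs` is hypothetical and only records indices instead of calling `process.extract`.

```python
from typing import List, Sequence, Tuple


def trace_attempt_pairs(token_name_list: Sequence[Sequence[str]],
                        token_text_list: Sequence[Sequence[str]]) -> List[Tuple[str, int]]:
    """Replay the attempt's loops, recording (token, index of the text list it is compared with)."""
    pairs = []
    for token_name in token_name_list:
        # Same inner loop as the attempt: token i of ONE name is paired with text list i,
        # and zip() stops at the shorter of the two sequences.
        for text_index, (name, _text) in enumerate(zip(token_name, token_text_list)):
            pairs.append((name, text_index))
    return pairs


token_name_list = [['John', 'D', 'Doe'], ['Jane', 'L', 'More']]
token_text_list = [['Today', 'we', 'found', 'John', 'Doe', 'here', 'and', 'Jon', 'Does', 'car'],
                   ['We', 'also', 'found', 'Johns', 'sister', 'Jan', 'who', 'is', 'known', 'Jane', 'L', 'More']]
print(trace_attempt_pairs(token_name_list, token_text_list))
# [('John', 0), ('D', 1), ('Jane', 0), ('L', 1)]
```

These four pairs line up with the four lists in the attempt's output: 'D' is scored against the second text and 'Jane' against the first, while 'Doe' and 'More' are never compared at all.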
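Desired output
1) A nested list: one sublist per name, each holding the extract results for that name's three tokens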
extract_list=[ [ [('John', 100), ('Jon', 86), ('found', 44)],
[('Doe', 50), ('and', 50), ('Does', 40)],
[('Doe', 100), ('Does', 86), ('we', 40)] ],
[ [('Jane', 100), ('Jan', 86), ('Johns', 44)],
[('L', 100), ('also', 40), ('We', 0)],
[('More', 100), ('We', 33), ('who', 29)] ] ]
Question
How can I achieve the desired output?
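A minimal sketch of one fix, assuming the intent is exactly the six comparisons in the non-loop version above: zip the two outer lists so each name's tokens travel with their own text list, loop over the tokens inside, and append one sublist per name. `build_extract_list` and its optional `extract` parameter are hypothetical; when `extract` is omitted, it calls fuzzywuzzy's `process.extract` with `limit=3` and `scorer=fuzz.ratio`, as in the question.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Match = Tuple[str, int]
ExtractFn = Callable[[str, Sequence[str]], List[Match]]


def build_extract_list(token_name_list: Sequence[Sequence[str]],
                       token_text_list: Sequence[Sequence[str]],
                       extract: Optional[ExtractFn] = None) -> List[List[List[Match]]]:
    """Return one sublist per name, holding one extract result per token of that name."""
    if extract is None:
        # Imported lazily so the loop can also be exercised with a stand-in `extract`.
        from fuzzywuzzy import fuzz, process

        def fuzzy_extract(query: str, choices: Sequence[str]) -> List[Match]:
            return process.extract(query, choices, limit=3, scorer=fuzz.ratio)

        extract = fuzzy_extract

    extract_list = []
    # Outer zip: keep each name's token list together with ITS OWN text list.
    for token_name, text in zip(token_name_list, token_text_list):
        # Inner loop: compare every token of that name against that single text list.
        name_results = [extract(name, text) for name in token_name]
        extract_list.append(name_results)
    return extract_list
```

With fuzzywuzzy installed, `extract_list = build_extract_list(token_name_list, token_text_list)` makes the same six `process.extract` calls as `extract1` through `extract6`, in the same order, and groups them as `[[extract1, extract2, extract3], [extract4, extract5, extract6]]`. The same loop also fits in one comprehension: `[[process.extract(name, text, limit=3, scorer=fuzz.ratio) for name in names] for names, text in zip(token_name_list, token_text_list)]`.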