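I've hit a complete roadblock with my script. I have an HTML document containing several pairs of words. I have to extract the words from the HTML document and then check how similar the two words in each pair are. If the words are exactly one edit apart, the pair is acceptable; if they differ by more than one edit, or do not differ at all, the pair fails.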
(ex: abc – ab; abc – bc; abc – ac = pass,
abc – Abc; abc – acc; abc – abD = pass,
abc – acb = fail,
abc – abc = fail)
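I have extracted the words into tuples nested inside a tuple inside a list. My problem is accessing that list and actually checking how similar the words are.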
[(('Bild', 'mild'), ('bitte', 'Bitte'), ('bitte', 'bitten'), ('Bitte',
'Mitte'), ('Fahne', 'ahne'), ('Schlange', 'Schlangen'), ('windet',
'wendet'), ('sprich', 'sprach'), ('ob', 'Bob'), ('weiße', 'weise'),
('Heidi', 'Hilde'), ('aktiv', 'aktiv'), ('wild', 'Wind'), ('schlagen',
'Schlangen'), ('Küche', 'Mücke'), ('Rücken', 'Küken'), ('Eleonore',
'Elefant'))]
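The list holds a single outer tuple, so the pairs themselves sit at index 0. Here is a minimal sketch of walking that structure, assuming the list is bound to the name `new_pairs` and using a shortened sample of the data:

```python
# Shortened sample with the same shape as the extracted data:
# a list containing one tuple, which in turn contains the word pairs.
new_pairs = [(('Bild', 'mild'), ('bitte', 'Bitte'), ('aktiv', 'aktiv'))]

pairs = new_pairs[0]  # the single inner tuple that holds all pairs

for first, second in pairs:  # unpack each (word, word) tuple
    # The length difference is a cheap first hint: a gap above 1
    # can never be bridged by a single edit.
    print(first, second, abs(len(first) - len(second)))
```

Unpacking into two names in the `for` statement avoids indexing each pair with `[0]` and `[1]`.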
Thanks to Rakesh, this has been solved:
import difflib

pass_score = 0
fail_score = 0
for i in new_pairs[0]:  # new_pairs[0] is the tuple holding all word pairs
    # Compare the two words character by character
    diff = difflib.ndiff(i[0], i[1])
    a, s = 0, 0  # a: characters added, s: characters removed
    for j in diff:
        if j.startswith('-'):
            s += 1
        if j.startswith('+'):
            a += 1
    if a > 1 or s > 1:
        print("FAIL, more than one edit.", i)
        fail_score += 1
    elif a == 0 and s == 0:
        print("FAIL, these are the same word", i)
        fail_score += 1
    else:
        print("PASS, only one edit required.", i)
        pass_score += 1
print("Number of PASSING word-pairs:", pass_score)
print("Number of FAILING word-pairs:", fail_score)
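One caveat: counting `+` and `-` lines accepts any pair with at most one of each, which, as far as I can tell, also lets a swap such as abc/acb through, even though the examples in the question list it as a fail. As an alternative sketch (not Rakesh's method), the check can be written directly without `difflib`. The helper name `is_one_edit` is made up for illustration, and the comparison is case-sensitive, so `b` versus `B` counts as one substitution:

```python
def is_one_edit(a: str, b: str) -> bool:
    """Return True if exactly one insertion, deletion, or
    substitution of a single character turns a into b."""
    if a == b:
        return False  # identical words need zero edits
    if abs(len(a) - len(b)) > 1:
        return False  # a length gap of 2+ needs at least two edits
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equally long) word

    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1

    if len(a) == len(b):
        # Substitution: everything after the mismatch must match.
        return a[i + 1:] == b[i + 1:]
    # Insertion into a: skipping one character of b must align the rest.
    return a[i:] == b[i + 1:]


for pair in [('abc', 'ab'), ('abc', 'Abc'), ('abc', 'acb'), ('abc', 'abc')]:
    print(pair, "PASS" if is_one_edit(*pair) else "FAIL")
```

With the pairs from the question, `('Bild', 'mild')` and `('ob', 'Bob')` pass, while `('wild', 'Wind')` and `('aktiv', 'aktiv')` fail.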