Try/except and if statement combination: missing results

Time: 2019-01-31 15:17:37

Tags: python python-3.x csv exception-handling fuzzywuzzy

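I'm comparing one list of universities against 12 other lists, finding fuzzy string matches, and then writing all the results to a CSV. I'm not running the fuzzy matching against one big combined list because I need to know which list each match came from. Example lists: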

data = [["1-00000", "MIT"], ["1-00001", "Stanford"], ...]

data1 = ['MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)', 'STANFORD UNIVERSITY', ...]

With help from Stack Overflow, I came up with the following:

import csv
from fuzzywuzzy import fuzz, process

for uni in data:
    hit = process.extractOne(str(uni[1]), data10, scorer=fuzz.token_set_ratio, score_cutoff=90)
    try:
        if float(hit[1]) >= 94:
            with open(filename, mode='a', newline="") as csv_file:
                fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit), 'points': 10})

    except:
        hit1 = process.extractOne(str(uni[1]), data11, scorer=fuzz.token_set_ratio, score_cutoff=90)
        try:
            if float(hit1[1]) >= 94:
                with open(filename, mode='a', newline="") as csv_file:
                    fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                    writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit1), 'points': 5})

This continues through all 12 lists down to the last one, where I also include matches scoring below 94 and finish with a "not found" case:

    except:
        hit12 = process.extractOne(str(uni[1]), data9, scorer=fuzz.token_set_ratio)
        try:
            if float(hit12[1]) < 94:
                with open(filename, mode='a', newline="") as csv_file:
                    fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                    writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit12), 'points': 3})
        except:
            with open(filename, mode='a', newline="") as csv_file:
                fieldnames = ['bwbnr', 'uni_name', 'match', 'points']
                writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
                writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': str(hit12), 'points': 3})
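A likely reason for the gap: `process.extractOne` returns `None` only when no choice reaches `score_cutoff`. With `score_cutoff=90` and a threshold of 94, a best match scoring 90 to 93 comes back as an ordinary `(match, score)` tuple, so indexing it raises nothing, the `if` is false, no row is written, and the `except` fallback to the next list never runs. The last level has the mirror-image hole: it has no cutoff, so a score of 94 or higher fails `< 94` and is skipped too. The snippet below reproduces the first case. `fake_extract_one` is a hypothetical stub that only mimics extractOne's return contract so the example runs without fuzzywuzzy, and it catches `TypeError` (what indexing `None` raises) instead of using a bare `except`.

```python
# Hypothetical stub mimicking extractOne's contract: (choice, score) if the
# best score reaches score_cutoff, otherwise None.
def fake_extract_one(query, choices, score_cutoff=0):
    best = (choices[0], 92)  # pretend the best match always scores 92
    return best if best[1] >= score_cutoff else None


rows = []
hit = fake_extract_one("MIT", ["MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)"], score_cutoff=90)
try:
    if float(hit[1]) >= 94:  # 92 >= 94 is False, so nothing is written...
        rows.append(("MIT", hit, 10))
except TypeError:            # ...and hit is not None, so the fallback never runs
    rows.append(("MIT", "try next list", 5))

print(rows)  # []
```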

However, I only get 2854 results back instead of the 3175 entries in the original list (every entry needs to be checked and written to the new CSV).

When I put all the lists together and run extractOne on the combined list, I get 3175 results:

scored_testdata = []
for uni in data:
    hit = process.extractOne(str(uni[1]), big_list, scorer=fuzz.token_set_ratio, score_cutoff=90)
    scored_testdata.append(hit)
print(len(scored_testdata))
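This count does not show that every university found a match: `append` runs whether `extractOne` returned a tuple or `None`, so 3175 is simply the number of entries visited. A quick way to separate the two, shown on made-up sample values rather than real output:

```python
# Made-up sample of what the loop collects: a (match, score) tuple per hit,
# None when nothing reached score_cutoff. append() runs in both cases.
scored_testdata = [("MASSACHUSETTS INSTITUTE OF TECHNOLOGY (MIT)", 100), None,
                   ("STANFORD UNIVERSITY", 100), None]

matched = [hit for hit in scored_testdata if hit is not None]
unmatched = len(scored_testdata) - len(matched)

print(len(scored_testdata), len(matched), unmatched)  # 4 2 2
```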

What am I missing here? I suspect that the cases where process.extractOne returns None are being dropped for some reason. Any help would be greatly appreciated.

The full code is available here.

1 Answer

Answer 0 (score: 0)

The final try/except should search all the lists combined and call extractOne without a score_cutoff:

except:
    hit12 = process.extractOne(str(uni[1]), big_list, scorer=fuzz.token_set_ratio)
    with open(filename, mode='a', newline="") as csv_file:
        fieldnames = ['bwbnr', 'uni_name', 'match', 'confidence', 'points']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=';')
        writer.writerow({'bwbnr': str(uni[0]), 'uni_name': str(uni[1]), 'match': "CHECK AGAIN " + str(hit12[0]), 'confidence': str(hit12[1]), 'points': 3})
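A more general way to avoid losing rows is to replace the twelve nested try/except levels with a loop over the lists in priority order: `break` on the first acceptable match, and use a `for`/`else` fallback so every university produces exactly one row. This is a hypothetical restructuring, not the answerer's code. `match_all`, `stand_in_extract_one`, and the `(choices, points)` pairs are illustrative names, and the stand-in scores with `difflib.SequenceMatcher`, which is a different metric from `fuzz.token_set_ratio`, only so the sketch runs without fuzzywuzzy. With fuzzywuzzy installed, something like `functools.partial(process.extractOne, scorer=fuzz.token_set_ratio)` should fit the same `extract_one(query, choices, score_cutoff=...)` call shape. Passing 94 directly as `score_cutoff` also means a score between 90 and 93 can no longer slip past both the `if` and the `except`.

```python
import csv
import difflib
import io


def stand_in_extract_one(query, choices, score_cutoff=0):
    """Hypothetical stand-in for fuzzywuzzy's process.extractOne.

    Returns (best_choice, score) on a 0-100 scale, or None if the best
    score is below score_cutoff (or choices is empty). difflib's ratio is
    NOT fuzz.token_set_ratio; it only keeps this sketch self-contained.
    """
    best, best_score = None, -1
    for choice in choices:
        ratio = difflib.SequenceMatcher(None, query.upper(), choice.upper()).ratio()
        score = round(100 * ratio)
        if score > best_score:
            best, best_score = choice, score
    if best is None or best_score < score_cutoff:
        return None
    return best, best_score


def match_all(data, sources, big_list, out, extract_one, threshold=94):
    """Write exactly one CSV row per (bwbnr, name) pair in data.

    sources is a list of (choices, points) pairs checked in priority order.
    """
    fieldnames = ['bwbnr', 'uni_name', 'match', 'confidence', 'points']
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter=';')
    writer.writeheader()
    for bwbnr, name in data:
        for choices, points in sources:
            hit = extract_one(str(name), choices, score_cutoff=threshold)
            if hit is not None:  # first list with a good enough match wins
                writer.writerow({'bwbnr': bwbnr, 'uni_name': name, 'match': hit[0],
                                 'confidence': hit[1], 'points': points})
                break
        else:  # no list reached the threshold: flag the best overall guess
            hit = extract_one(str(name), big_list)
            writer.writerow({'bwbnr': bwbnr, 'uni_name': name,
                             'match': "CHECK AGAIN " + (str(hit[0]) if hit else ""),
                             'confidence': hit[1] if hit else 0, 'points': 3})


# Usage with made-up sample data; in real code `out` would be a file
# opened once with open(filename, 'w', newline='').
data = [["1-00000", "Stanford University"], ["1-00001", "Some Unknown College"]]
list_a = ["STANFORD UNIVERSITY"]
list_b = ["HARVARD UNIVERSITY"]
out = io.StringIO()
match_all(data, [(list_a, 10), (list_b, 5)], list_a + list_b, out, stand_in_extract_one)
print(out.getvalue())
```

Opening the output file once and writing a header up front also avoids reopening the CSV in append mode for every single row.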