Python 3: spell-check a .txt file, replacing values while preserving formatting

Date: 2019-05-07 10:09:45

Tags: python python-3.x

Consider the following .txt file, myfile.txt:

Box-No.: DK10-95794
Total Discounts          USD 1,360.80
Totat:                   usp 529.20

As you can see, the text file above contains two errors: "Totat" and "usp" (they should be "Total" and "USD").

I am currently using a Python package built on SymSpell, called SymSpellPy. It can check a word and determine whether it is spelled correctly.

Here is my Python script:


    import os
    import re

    from symspellpy.symspellpy import SymSpell, Verbosity

    # maximum edit distance per dictionary precalculation
    max_edit_distance_dictionary = 2
    prefix_length = 7

    # create object
    sym_spell = SymSpell(max_dictionary_edit_distance=max_edit_distance_dictionary,
                         prefix_length=prefix_length)

    # load dictionary
    dictionary_path = os.path.join(
        os.path.dirname(__file__), "Dictionaries/eng.dictionary.txt")

    term_index = 0  # column of the term in the dictionary text file
    count_index = 1  # column of the term frequency in the dictionary text file
    if not sym_spell.load_dictionary(dictionary_path, term_index, count_index):
        raise FileNotFoundError(dictionary_path)

    # max edit distance per lookup
    max_edit_distance_lookup = 2
    suggestion_verbosity = Verbosity.CLOSEST  # TOP, CLOSEST, ALL

    with open("myfile.txt", "r") as file:
        for line in file:
            # word by word
            for input_term in re.findall(r'\w+', line):
                suggestions = sym_spell.lookup(input_term, suggestion_verbosity,
                                               max_edit_distance_lookup)

                # display the original term next to each suggested term
                for suggestion in suggestions:
                    print("{}, {}".format(input_term, suggestion.term))

Running the above script on my text file gives the following output:

Total, Total
USD, USD
Totat, Total

As you can see, it correctly caught the last word: Totat => Total
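For context on the "max edit distance" settings in the script: a lookup only returns dictionary terms that are within that many edits of the input. The sketch below uses plain Levenshtein distance as a simplified stand-in (SymSpell itself defaults to a Damerau-Levenshtein variant that also counts transpositions); the function name `levenshtein` is my own and is not part of SymSpellPy.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("Totat", "Total"))  # 1
print(levenshtein("usp", "usd"))      # 1
```

Both misspellings in myfile.txt are a single edit away from the intended word, so they fall inside a lookup distance of 2, provided the intended word is present in the dictionary with matching case.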

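My question is: how can I find the misspelled words and correct them in the .txt file itself, while keeping the file's formatting?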
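One possible direction, as a minimal sketch rather than a definitive solution: `re.findall` throws away the spaces and punctuation between words, whereas `re.sub` with a callback rewrites only the matched words and copies everything else through unchanged. The names `correct_text`, `correct_file`, `match_case`, `toy_corrector` and `KNOWN_FIXES` are hypothetical, and the small lookup table stands in for a real spell checker so that the example runs without any dictionary file.

```python
import re

# Only purely alphabetic tokens are checked; codes such as "DK10" and
# numbers such as "1,360.80" are skipped because of the word boundaries.
WORD_PATTERN = re.compile(r"\b[A-Za-z]+\b")


def correct_text(text, correct_word):
    """Replace each word with correct_word(word); spacing, punctuation
    and line breaks are left exactly as they were."""
    return WORD_PATTERN.sub(lambda match: correct_word(match.group(0)), text)


def correct_file(path, correct_word):
    """Read a file, correct its words and write it back in place."""
    with open(path, "r") as file:
        text = file.read()
    with open(path, "w") as file:
        file.write(correct_text(text, correct_word))


def match_case(original, replacement):
    """Copy the simple casing pattern of original onto replacement."""
    if original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement.capitalize()
    return replacement.lower()


# Hypothetical stand-in for a spell checker (assumption, not SymSpellPy).
KNOWN_FIXES = {"totat": "total", "usp": "usd"}


def toy_corrector(word):
    fixed = KNOWN_FIXES.get(word.lower())
    return match_case(word, fixed) if fixed else word


sample = (
    "Box-No.: DK10-95794\n"
    "Total Discounts          USD 1,360.80\n"
    "Totat:                   usp 529.20\n"
)
corrected = correct_text(sample, toy_corrector)
print(corrected)
```

To use SymSpellPy instead of the toy table, the corrector passed in could call `sym_spell.lookup(word, Verbosity.TOP, max_edit_distance_lookup)` and return the first suggestion's `term` when the result list is non-empty, falling back to the original word otherwise. Column alignment is only preserved when the replacement has the same length as the original word, which happens to hold for both errors here.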

0 answers:

No answers