Find matches in one file for the entries listed in another file

Time: 2018-12-26 00:20:54

Tags: python file match

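I have one file containing words and another "dictionary" file containing definitions. I want to find the definition of each word in the dictionary and write it out to a file.

I have seen answers here that use Unix/Linux commands, but I am on Windows, so I decided to solve it in Python instead. I came up with a working solution, but I would like to know whether there is a better way.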

import re

# Read both input files up front; the with blocks close them afterwards.
with open('D:/Oxford_English_Dictionary-orig.txt', 'r') as dict_file:
    definitions = dict_file.readlines()
with open('D:/Words.txt', 'r') as word_file:
    words = word_file.readlines()

with open('D:/words_and_definitions.txt', 'w') as fo:
    for word in words:
        findStatus = 'not_found'
        # The trailing space keeps "Ambit" from matching "Ambition ...".
        word = word.strip() + ' '
        for definition in definitions:
            # re.escape guards against regex metacharacters in the word.
            if re.match(re.escape(word), definition):
                fo.write(definition)
                findStatus = 'found'
                break
        if findStatus == 'not_found':
            fo.write(word + ' ****************no definition' + '\n')
print("all done")

word_file is not sorted alphabetically; dict_file is.

Sample from word_file

Inane
Relevant
Impetuous
Ambivalent
Dejected
Postmortem
Incriminate

Sample from dict_file

Ambiguity -n. the condition of admitting of two or more meanings, of being understood in more than one way, or of referring to two or more things at the same time 
Ambiguous  adj. 1 having an obscure or double meaning. 2 difficult to classify.  ambiguity n. (pl. -ies). [latin ambi- both ways, ago drive]
Ambit  n. Scope, extent, or bounds. [latin: related to *ambience]
Ambition  n. 1 determination to succeed. 2 object of this. [latin, = canvassing: related to *ambience]
Ambitious  adj. 1 full of ambition or high aims. 2 (foll. By of, or to + infin.) Strongly determined.
Ambivalence  n. Coexistence of opposing feelings.  ambivalent adj. [latin ambo both, *equivalent]
Ambivalent adj. having opposing feelings, undecided
Amble  —v. (-ling) move at an easy pace. —n. Such a pace. [latin ambulo walk]

1 answer:

Answer 0: (score: 1)

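Have you tried using a Python dict to look up the definitions? If the definitions file were very large you could run into memory problems, but a dict should be sufficient in your case. It gives a simple solution: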

import re

definition_finder = re.compile(r'^(\w+)\s+(.*)$')

with open('Oxford_English_Dictionary-orig.txt') as dict_file:
    definitions = {}
    for line in dict_file:
        definition_found = definition_finder.match(line)
        if definition_found:
            definitions[definition_found.group(1)] = definition_found.group(2)

with open('Words.txt') as word_file:
    with open('words_and_definitions.txt', 'w') as fo:
        input_lines = (line.strip("\n") for line in word_file)
        for line in input_lines:
            fo.write(f"{line} {definitions.get(line, '****************no definition')}\n")
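
To see what the pattern captures, here is a small check that runs the same regular expression against two of the sample lines from the question, using in-memory strings instead of files. The names `sample_dict_lines` and `lookup` are hypothetical and exist only for this illustration.

```python
import re

definition_finder = re.compile(r'^(\w+)\s+(.*)$')

# Two lines copied from the dict_file sample in the question.
sample_dict_lines = [
    "Ambit  n. Scope, extent, or bounds. [latin: related to *ambience]\n",
    "Ambivalent adj. having opposing feelings, undecided\n",
]

# group(1) is the headword, group(2) is everything after the whitespace.
definitions = {}
for line in sample_dict_lines:
    found = definition_finder.match(line)
    if found:
        definitions[found.group(1)] = found.group(2)


def lookup(word):
    """Hypothetical helper: return the definition or the placeholder text."""
    return definitions.get(word.strip(), '****************no definition')


print(lookup("Ambivalent"))
print(lookup("Inane"))
```

Because the headword is used as an exact dict key, "Ambit" cannot accidentally match "Ambition", and each lookup no longer scans the whole definitions list. If a headword appears on more than one line, the last line wins.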

You can build the definitions dict more compactly. That gives:

import re

definition_finder = re.compile(r'^(\w+)\s+(.*)$')

with open('Oxford_English_Dictionary-orig.txt') as dict_file:
    definitions_found = (definition_finder.match(line) for line in dict_file) 
    definitions = dict(definition_found.groups() for definition_found
                       in definitions_found if definition_found)

with open('Words.txt') as word_file:
    with open('words_and_definitions.txt', 'w') as fo:
        input_lines = (line.strip("\n") for line in word_file)
        for line in input_lines:
            fo.write(f"{line} {definitions.get(line, '****************no definition')}\n")

If the definitions file really is too large, you could consider a database, for example via the sqlite3 module.
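
A minimal sketch of the sqlite3 idea, assuming the same "headword, whitespace, definition" line format as above. The table name `definitions` and the helpers `build_index` and `find_definition` are hypothetical names chosen for this example. It uses an in-memory database by default; passing a file path as `db_path` would keep the index on disk instead.

```python
import re
import sqlite3

definition_finder = re.compile(r'^(\w+)\s+(.*)$')


def build_index(dict_lines, db_path=':memory:'):
    """Load 'headword definition' lines into an SQLite table keyed by word."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS definitions '
        '(word TEXT PRIMARY KEY, definition TEXT)'
    )
    matches = (definition_finder.match(line) for line in dict_lines)
    rows = (m.groups() for m in matches if m)
    # INSERT OR REPLACE keeps the last definition seen for a repeated
    # headword, which mirrors what the in-memory dict does.
    conn.executemany('INSERT OR REPLACE INTO definitions VALUES (?, ?)', rows)
    conn.commit()
    return conn


def find_definition(conn, word):
    """Return the stored definition, or the placeholder if there is none."""
    row = conn.execute(
        'SELECT definition FROM definitions WHERE word = ?', (word,)
    ).fetchone()
    return row[0] if row else '****************no definition'


# Usage with a sample line from the question instead of the real file.
conn = build_index(["Ambivalent adj. having opposing feelings, undecided\n"])
print(find_definition(conn, "Ambivalent"))
print(find_definition(conn, "Inane"))
```

With the real data you would pass the open dictionary file object as `dict_lines`, so lines are streamed into the database rather than held in memory all at once.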