Question

I have a file source.txt consisting of words. Each word is on its own line.

apple
tree
bee
go
apple
see

I also have a file target_words.txt, in which each word is also on its own line.

apple
bee
house
garden
eat

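Now I have to search the source file for each target word. If a target word is found, e.g. apple, a dictionary entry should be created for the target word together with each of the 3 words before it and the 3 words after it. In the example, that would be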

words_dict = {'apple':'tree', 'apple':'bee', 'apple':'go'}

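How can I create and fill this dictionary in Python so that the 3 words before and after each entry in the source file are taken into account? My idea was to use lists, but ideally the code should be very efficient and fast, since the files contain millions of words. I suspect that computing this with lists would be very slow.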

from collections import defaultdict

# counts of key, value pairs, to be filled in later
words_occ = defaultdict(int)
with open('source.txt') as s_file, open('target_words.txt') as t_file:
    # one target word per line; a set gives fast membership tests
    keys = {line.strip() for line in t_file if line.strip()}
    s_words = [line.strip() for line in s_file]
    for i, word in enumerate(s_words):
        if word in keys:
            # look at the 1st, 2nd, 3rd word before and after
            # create a key, value entry for each of them
            pass

Later, I have to count the occurrences of each key, value pair and add the numbers to a separate dictionary, which is why I started using defaultdict.

I would be glad of any suggestions for the code above.

Answer 1

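The first problem you will face is your lack of understanding of dicts. Each key can appear only once, so if you ask the interpreter for the value of the dict you gave, you may be surprised: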

>>> {'apple':'tree', 'apple':'bee', 'apple':'go'}
{'apple': 'go'}

The problem is that there can be only one value associated with the key 'apple'.
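
One way to keep several values under the same key, shown here only as a minimal sketch, is to map each key to a list using collections.defaultdict. The name words_dict and the sample pairs are taken from the question; whether a list is the right container depends on what is done with the values afterwards.

```python
from collections import defaultdict

# Each key maps to a list, so repeated keys accumulate their values
# instead of overwriting one another.
words_dict = defaultdict(list)
for key, value in [('apple', 'tree'), ('apple', 'bee'), ('apple', 'go')]:
    words_dict[key].append(value)

print(dict(words_dict))  # {'apple': ['tree', 'bee', 'go']}
```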

You seem to be searching for a suitable data structure, but StackOverflow is for improving or fixing problematic code.
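
That said, for the counting described in the question, one possible structure is a dict that maps each target word to a collections.Counter of its neighbours. The sketch below assumes that reading of the task; the function name count_neighbours and the parameter window are hypothetical, and the sample words are the ones from the question. It reads the words one at a time, so the whole source file never has to be held in a list.

```python
from collections import Counter, defaultdict, deque

def count_neighbours(words, targets, window=3):
    """Count, for each target word, the words seen within `window`
    positions before or after it. `words` is any iterable of strings."""
    targets = set(targets)            # set membership tests are fast
    counts = defaultdict(Counter)     # target -> Counter of neighbours
    before = deque(maxlen=window)     # the last `window` words seen
    pending = deque()                 # [target, following words still wanted]
    for word in words:
        # The current word follows every target that is still collecting.
        for entry in pending:
            counts[entry[0]][word] += 1
            entry[1] -= 1
        # Entries were added in order, so finished ones are at the left.
        while pending and pending[0][1] == 0:
            pending.popleft()
        if word in targets:
            counts[word].update(before)   # the preceding words
            pending.append([word, window])
        before.append(word)
    return counts

source = ['apple', 'tree', 'bee', 'go', 'apple', 'see']
targets = ['apple', 'bee', 'house', 'garden', 'eat']
counts = count_neighbours(source, targets)
print(dict(counts['apple']))  # {'tree': 2, 'bee': 2, 'go': 2, 'see': 1}
```

With real files, the same function could be fed the stripped lines of source.txt, for example (line.strip() for line in s_file), since it only needs an iterable of words.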

Creating a multidimensional dictionary to count word occurrences

1 answer: