I have the following code:
```python
def sent_dictionary(text):
    token_dictionary = {}
    both_keys = in_both.keys()
    sentences = sent_tokenize(text)
    tokens = word_tokenize(str(sentences))
    for token in tokens:
        if token in both_keys:
            token_dictionary[token] = in_both[token]
        print(token_dictionary)
```
I have a dictionary named in_both, which contains the words that appear both in my lexicon and in my text, like this:

```python
{'think': -0.125, 'seem': 0.0, 'able': 0.25, 'make': 0.0, 'correct': 0.0, 'understand': 0.125, 'words': -0.125, 'appropriate': 0.0, 'confuse': -0.375, 'underactive': -0.625, ... }
```
I want my function to take a text, split it into sentences, and then split those into words. Then, if a word is also in the in_both dictionary, it should be put into token_dictionary along with its value from in_both.
When I try to use the function, this is what happens:

```python
sent_dictionary("His father was a successful local businessman and his mother was the daughter of a landowner. Shakespeare is widely regarded as the greatest writer in the English language and the world's pre-eminent dramatist. He is often called England's national poet and nicknamed the Bard of Avon")
```
The output is:

```
{}
{}
{}
{}
{}
{}
{}
{}
{}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
```
I would like the output to look like:

```
{'language': 0.0, 'often': 0.25}
```
Now that I think of it, the text I am using contains more than one of them. So I don't want to exclude those; I need to take all tokens into account.
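For reference, here is a minimal sketch of the intended logic. It uses a naive regex tokenizer as a stand-in for NLTK's sent_tokenize/word_tokenize (an assumption, so the example is self-contained), and it prints the dictionary once, after the loop, rather than on every iteration:

```python
import re

# Sample subset of the real in_both lexicon from the question.
in_both = {'language': 0.0, 'often': 0.25}

def sent_dictionary(text, lexicon):
    token_dictionary = {}
    # Naive sentence split on end punctuation: stand-in for sent_tokenize.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    for sentence in sentences:
        # Naive word split: stand-in for word_tokenize.
        for token in re.findall(r"[A-Za-z']+", sentence):
            if token in lexicon:
                token_dictionary[token] = lexicon[token]
    return token_dictionary

result = sent_dictionary(
    "He is widely regarded as the greatest writer in the English language. "
    "He is often called England's national poet.",
    in_both,
)
print(result)  # {'language': 0.0, 'often': 0.25}
```

Note that a dict keeps only one entry per word, so repeated occurrences of the same word collapse into a single key; if every occurrence needs to be counted rather than excluded, collecting the matching tokens in a list or a collections.Counter would preserve them.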