I have the following code:
```python
def sent_dictionary(text):
    token_dictionary = {}
    both_keys = in_both.keys()
    sentences = sent_tokenize(text)
    tokens = word_tokenize(str(sentences))
    for token in tokens:
        if token in both_keys:
            token_dictionary[token] = in_both[token]
        print(token_dictionary)
```
I have a dictionary named in_both, which contains the words that appear both in my lexicon and in my text, like this:

```python
{'think': -0.125, 'seem': 0.0, 'able': 0.25, 'make': 0.0, 'correct': 0.0, 'understand': 0.125, 'words': -0.125, 'appropriate': 0.0, 'confuse': -0.375, 'underactive': -0.625, ... }
```
I want my function to take a text, split it into sentences, and then split those into words. Then, if a word is also in the in_both dictionary, it should be put into token_dictionary along with its value from in_both.
When I try to use the function, this is what happens:

```python
sent_dictionary("His father was a successful local businessman and his mother was the daughter of a landowner. Shakespeare is widely regarded as the greatest writer in the English language and the world's pre-eminent dramatist. He is often called England's national poet and nicknamed the Bard of Avon")
```
The output is:

```
{}
{}
{}
{}
{}
{}
{}
{}
{}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
{'language': 0.0, 'often': 0.25}
```
I would like the output to look like:

```
{'language': 0.0, 'often': 0.25}
```
Now that I think of it, the text I am using contains more than one of them. So I don't want to exclude those; I need to take all tokens into account.
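For reference, here is a minimal sketch of the intended logic. It uses a naive regex tokenizer as a stand-in for NLTK's sent_tokenize/word_tokenize (an assumption, so the example is self-contained), and it prints the dictionary once, after the loop, rather than on every iteration:

```python
import re

# Sample subset of the real in_both lexicon from the question.
in_both = {'language': 0.0, 'often': 0.25}

def sent_dictionary(text, lexicon):
    token_dictionary = {}
    # Naive sentence split on end punctuation: stand-in for sent_tokenize.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    for sentence in sentences:
        # Naive word split: stand-in for word_tokenize.
        for token in re.findall(r"[A-Za-z']+", sentence):
            if token in lexicon:
                token_dictionary[token] = lexicon[token]
    return token_dictionary

result = sent_dictionary(
    "He is widely regarded as the greatest writer in the English language. "
    "He is often called England's national poet.",
    in_both,
)
print(result)  # {'language': 0.0, 'often': 0.25}
```

Note that a dict keeps only one entry per word, so repeated occurrences of the same word collapse into a single key; if every occurrence needs to be counted rather than excluded, collecting the matching tokens in a list or a collections.Counter would preserve them.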