Question

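I am trying to solve the following problem:

Read in the texts of the State of the Union addresses, using the state_union corpus reader. Count occurrences of men, women, and people in each document. What has happened to the usage of these words over time?

My problem: every function I know of for counting word occurrences produces an error message.

Here is an example: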

from nltk.corpus import state_union

len(state_union)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-cb4ef2bb9247> in <module>()
----> 1 len(state_union)

TypeError: object of type 'LazyCorpusLoader' has no len()

state = state_union

len(state)

Answer 1

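As the error states, state_union has no len(): it is a corpus reader object, not a list of words. You can use state_union.raw() for the raw text, state_union.words() for the words, and state_union.sents() for the sentences.

len(state_union.words()) will give you the number of words.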
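For the exercise itself, one possible approach is to count the target words file by file using the reader's fileids() and words(fileid) methods. The sketch below is only an illustration: the helper names count_targets and counts_per_document are made up for this answer, and lowercasing tokens before counting (so that "People" and "people" are counted together) is an assumption about what the exercise intends.

```python
from collections import Counter

# Words the exercise asks about.
TARGETS = ("men", "women", "people")


def count_targets(words, targets=TARGETS):
    """Count occurrences of each target word in a sequence of tokens.

    Tokens are lowercased first (an assumption), so "Men" matches "men".
    """
    counts = Counter(w.lower() for w in words)
    return {t: counts[t] for t in targets}


def counts_per_document(corpus, targets=TARGETS):
    """Return {fileid: {target: count}} for every file in a corpus reader.

    `corpus` is any object with fileids() and words(fileid), such as
    nltk.corpus.state_union.
    """
    return {
        fileid: count_targets(corpus.words(fileid), targets)
        for fileid in corpus.fileids()
    }


# Usage, assuming the corpus has been fetched with nltk.download("state_union"):
#   from nltk.corpus import state_union
#   for fileid, counts in counts_per_document(state_union).items():
#       print(fileid, counts)
```

The file ids in state_union begin with the year of the address, so printing the results in fileids() order should give a rough view of how usage changes over time.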

NLTK book ch. 2.8 #4, LazyCorpusLoader
