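I am trying to count how many times each noun (each word tagged 'NN') appears across the whole corpus, but my code raises an error: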
import os
from collections import defaultdict

import nltk

corpus_root = os.path.abspath('../nlp_urdu/out1_data')
mycorpus = nltk.corpus.reader.TaggedCorpusReader(corpus_root, '.*')
noun = []
count_freq = defaultdict(int)
for infile in mycorpus.fileids():
    print(infile)
    for i in mycorpus.tagged_sents():
        texts = [word for word, pos in i if pos == 'NN']
        noun.append(texts)
        count_freq[noun] += 1
print(count_freq)
The error is:
    count_freq[noun] += 1
TypeError: unhashable type: 'list'
Answer:
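texts is a list of nouns, but count_freq is a dictionary in which each key must be a single noun (a string). In your code, noun is a list of those lists, and a list cannot be used as a dictionary key because it is mutable and therefore unhashable; that is exactly what the TypeError says. Instead, loop over texts and increment the count for each noun separately.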
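# Corrected version: count each noun individually.
import os
from collections import defaultdict

import nltk

corpus_root = os.path.abspath('../nlp_urdu/out1_data')
mycorpus = nltk.corpus.reader.TaggedCorpusReader(corpus_root, '.*')
count_freq = defaultdict(int)
for infile in mycorpus.fileids():
    print(infile)
    # Read only this file's sentences; tagged_sents() with no argument returns
    # the sentences of every file, which would count each noun once per file.
    for i in mycorpus.tagged_sents(infile):
        texts = [word for word, pos in i if pos == 'NN']
        # Each noun is a string, so it is a valid dictionary key.
        for noun in texts:
            count_freq[noun] += 1
print(count_freq)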
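As an alternative to defaultdict(int), collections.Counter can do the counting in a single expression. The sketch below uses a few hypothetical tagged sentences in the (word, tag) form that tagged_sents() yields, so it runs without the corpus; the words and every tag other than 'NN' are made up for illustration.

```python
from collections import Counter

# Hypothetical stand-ins for mycorpus.tagged_sents(): lists of (word, tag) pairs.
tagged_sents = [
    [("kitab", "NN"), ("achhi", "JJ"), ("hai", "VB")],
    [("qalam", "NN"), ("aur", "CC"), ("kitab", "NN")],
]

# Count every word tagged 'NN' across all sentences in one pass.
noun_freq = Counter(
    word
    for sent in tagged_sents
    for word, pos in sent
    if pos == "NN"
)

print(noun_freq.most_common())  # [('kitab', 2), ('qalam', 1)]
```

With the real reader, the hypothetical list would be replaced by mycorpus.tagged_sents(); called with no argument it already reads every file, so no outer loop over fileids() is needed for a corpus-wide count. most_common() then lists the nouns from most to least frequent.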