Question

I am trying to count how many times each word appears in the whole corpus, but I get an error:

    import os
    from collections import defaultdict

    import nltk

    corpus_root = os.path.abspath('../nlp_urdu/out1_data')
    mycorpus = nltk.corpus.reader.TaggedCorpusReader(corpus_root, '.*')
    noun = []
    count_freq = defaultdict(int)
    for infile in mycorpus.fileids():
        print(infile)
    for i in mycorpus.tagged_sents():
        texts = [word for word, pos in i if pos == 'NN']
        noun.append(texts)
        count_freq[noun] += 1  # fails here: noun is a list
        print(count_freq)

The error I get is on this line:

    count_freq[noun] += 1

TypeError: unhashable type: 'list'
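A minimal sketch that reproduces the error without NLTK (the sample words are hypothetical). A defaultdict(int) still has to hash every key, and a list is mutable and therefore unhashable, so the lookup fails before anything is counted. Strings, and tuples of strings, are hashable and work as keys.

```python
from collections import defaultdict

count_freq = defaultdict(int)
noun = [["book", "house"]]  # hypothetical: one sentence's nouns appended as a list

error_message = ""
try:
    count_freq[noun] += 1  # same statement as in the question
except TypeError as err:
    error_message = str(err)

print(error_message)  # the message mentions: unhashable type: 'list'

# Strings (and tuples of strings) are hashable, so they work as keys
count_freq["book"] += 1
count_freq[("book", "house")] += 1
print(dict(count_freq))  # {'book': 1, ('book', 'house'): 1}
```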

Answer 1

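texts is a list of nouns, and in your code noun is a list of those lists.
count_freq is a dictionary, and each key must be a single noun (a string), because a list is unhashable and cannot be a dictionary key. Loop over texts and count each noun instead: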

    import os
    from collections import defaultdict

    import nltk

    corpus_root = os.path.abspath('../nlp_urdu/out1_data')
    mycorpus = nltk.corpus.reader.TaggedCorpusReader(corpus_root, '.*')
    count_freq = defaultdict(int)
    for infile in mycorpus.fileids():
        print(infile)
    for i in mycorpus.tagged_sents():
        texts = [word for word, pos in i if pos == 'NN']
        for noun in texts:
            count_freq[noun] += 1  # each key is one noun (a string)

    print(count_freq)
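The same counting can also be written with collections.Counter, a dict subclass designed for tallying. This is a minimal sketch, assuming the input has the shape that tagged_sents() returns (a list of sentences, each a list of (word, tag) tuples); the function name count_nouns and the sample sentences are hypothetical, so it runs without the corpus files.

```python
from collections import Counter


def count_nouns(tagged_sents, tag="NN"):
    """Count every word tagged `tag` across all sentences (hypothetical helper)."""
    return Counter(
        word
        for sent in tagged_sents
        for word, pos in sent
        if pos == tag
    )


# Hypothetical stand-in for mycorpus.tagged_sents()
sample = [
    [("book", "NN"), ("red", "JJ"), ("house", "NN")],
    [("book", "NN"), ("reads", "VBZ")],
]

freq = count_nouns(sample)
print(freq["book"], freq["house"])  # 2 1
print(freq.most_common(1))          # [('book', 2)]
```

With the real corpus, count_nouns(mycorpus.tagged_sents()) should produce the same counts as the loop above. NLTK's nltk.FreqDist is itself a Counter subclass and could be used in the same way.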
