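I'm using a Python script to write results (computed with NLTK) to an ARFF file. The information that should end up in the ARFF file is letters and words (no numbers). However, whenever I run my script, I get an ARFF file that contains only zeros, like this: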
0,0.0,0.0,0
This is the snippet of my code that writes to the ARFF file:
for fileid in corpus.fileids():
    cat = str(fileid.split('/')[0])  # category = top-level folder of the file
    text = corpus.words(fileid)      # word tokens
    text2 = corpus.raw(fileid)       # raw text as one string
    text3 = ngrams(text2, 3)         # character trigrams
    text4 = ngrams(text2, 4)         # character tetragrams
    lijst = [frequencycount(text, freq)] + [frequencycount(text3, chartrigramfreq)] + [frequencycount(text4, chartetragramfreq)]
    merged = list(itertools.chain.from_iterable(lijst))
    merged2 = ','.join(merged)
    filet.write("%s\n" % merged2)
    counter += 1
    print counter, fileid, time()-tijd
filet.close()
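
One thing that may be worth checking, although it is only a guess because `frequencycount` and the frequency lists are not shown: NLTK's `ngrams` yields tuples, so calling it on a raw string produces items like `('t', 'h', 'e')` rather than `'the'`. If `chartrigramfreq` holds plain strings, no lookup would ever match and every count would be 0. The sketch below reproduces that effect without NLTK. `char_ngrams`, the body of `frequencycount`, and the `vocabulary` list are hypothetical stand-ins, not the code from the question.

```python
from collections import Counter


def char_ngrams(text, n):
    """Character n-grams as tuples, the same item shape nltk's ngrams yields."""
    return list(zip(*(text[i:] for i in range(n))))


def frequencycount(items, vocabulary):
    """Hypothetical stand-in: one count (as a string) per vocabulary entry."""
    counts = Counter(items)
    return [str(counts[entry]) for entry in vocabulary]


raw = "the theme"
vocabulary = ["the", "hem"]  # assumed: vocabulary stored as plain strings

trigrams = char_ngrams(raw, 3)

# A tuple never compares equal to a string, so every lookup misses.
mismatched = frequencycount(trigrams, vocabulary)

# Joining each tuple back into a string makes the keys comparable.
matched = frequencycount(["".join(gram) for gram in trigrams], vocabulary)

print(",".join(mismatched))  # 0,0
print(",".join(matched))     # 2,1
```

If this turns out to be the cause, the corresponding change in the loop would be to build the n-grams as strings, for example `[''.join(g) for g in ngrams(text2, 3)]`, before passing them to `frequencycount`.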