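I want to print the results of this tf-idf process to a text file as (word,2.333). Currently it prints all of the words first and then all of the scores. How can I do this? I would also like the file sorted by idf value so that I can get the most important words.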
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

results = []
with open("/Users/xyz/Documents/wholedata/X_tr.txt") as f:
    for line in f:
        results.extend(line.strip().split('\n'))

blob = list(results)
vectorizer = TfidfVectorizer(min_df=1)
X_train_tf = vectorizer.fit_transform(blob)
print(X_train_tf.shape)

idf = vectorizer._tfidf.idf_
p = (vectorizer.get_feature_names(), idf)
with open("tfidf.txt", "w") as t:
    for x in p:
        print>>t, x
Answer 0 (score: 1)
You can zip the two lists into (word, idf) pairs:
p = list(zip(vectorizer.get_feature_names(), idf))
Then sort the zipped list by idf value:
p.sort(key=lambda t: t[1])
Finally, loop over p to print each (word, idf) pair to the console and write it to the file.
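The steps above can be put together roughly as follows. This is only a minimal Python 3 sketch, not a tested drop-in for the question's script: the `words` and `idf` lists are made-up sample data standing in for the vectorizer's output, `write_word_scores` is a hypothetical helper name, and sorting highest idf first is an assumption about what "most important" means here (pass `descending=False` for the ascending order used in the answer).

```python
# Made-up sample data. In the question's script these would come from the
# fitted vectorizer: words = vectorizer.get_feature_names() (newer
# scikit-learn releases call it get_feature_names_out()) and
# idf = vectorizer.idf_.
words = ["apple", "banana", "cherry"]
idf = [1.0, 2.333, 1.693]


def write_word_scores(words, scores, path, descending=True):
    """Write one "(word,score)" line per word, sorted by score.

    Hypothetical helper for illustration; returns the sorted pairs.
    """
    # Pair each word with its score, then sort on the score (index 1).
    pairs = sorted(zip(words, scores), key=lambda pair: pair[1],
                   reverse=descending)
    with open(path, "w", encoding="utf-8") as out:
        for word, score in pairs:
            # Three decimals, matching the (word,2.333) format asked for.
            out.write("({},{:.3f})\n".format(word, score))
    return pairs


pairs = write_word_scores(words, idf, "tfidf.txt")
for word, score in pairs:
    print(word, score)
```

With the sample data, tfidf.txt ends up with the lines (banana,2.333), (cherry,1.693) and (apple,1.000), in that order. Writing each pair inside a single loop is what keeps a word and its score on the same line, instead of all words followed by all scores.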