I have a set of Wikipedia texts that I want to cluster.
Here is the code:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Parameters
max_features = 1000000
max_iterations = 300

# Load the text file
wiki = pd.read_csv('people_wiki.csv')

# TF-IDF vectorization (L2-normalized rows, English stop words removed)
vectorizer = TfidfVectorizer(max_features=max_features, norm='l2', stop_words='english')
tfidf = vectorizer.fit_transform(wiki['text'])

# K-means clustering into 3 clusters
kmeans = KMeans(n_clusters=3, random_state=0, init='k-means++', max_iter=max_iterations).fit(tfidf)
I would like to know the weight of each feature, as shown here (she: 0.025, her: 0.017, .....):
In summary: I want the weight of each feature (word), and to get the 5 most relevant ones.
The 'people_wiki.csv' file is here:
Answer 0 (score: 1)
Try this solution:
# The learned IDF weight of each feature; note that idf_ lives on the
# vectorizer object, not on the tfidf matrix returned by fit_transform
print(vectorizer.idf_)
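Note that vectorizer.idf_ gives only the inverse-document-frequency part of the weight, in the same order as the vocabulary. Below is a minimal sketch of how to map those values to words, and how to get the 5 most relevant words per cluster by ranking the components of each k-means centroid. It reuses the vectorizer and kmeans objects from the question, and assumes scikit-learn >= 1.0 for get_feature_names_out (older versions use get_feature_names instead):

import numpy as np

# Map each word to its learned IDF weight
words = vectorizer.get_feature_names_out()
idf_per_word = dict(zip(words, vectorizer.idf_))

# Show a few (word, IDF) pairs
for word in words[:5]:
    print(word, idf_per_word[word])

# The 5 most relevant words per cluster: the largest components
# of each cluster centroid in TF-IDF space
for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
    top5 = np.argsort(centroid)[::-1][:5]
    print(f"cluster {cluster_id}:",
          ", ".join(f"{words[i]}: {centroid[i]:.3f}" for i in top5))

The centroid components work as relevance scores because each document row is L2-normalized TF-IDF, so a centroid is the average TF-IDF profile of the documents in that cluster.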