Converting numeric column labels back to feature names after SelectKBest

Time: 2020-03-29 09:39:12

Tags: python pandas scikit-learn chi-squared tfidfvectorizer

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

data = pd.read_csv('Book1.csv', usecols=['tokenize'])
# change to 'pre_pro/data1.csv'

# TF-IDF vectorizer
data_sample = data[2:4]
tfidf = TfidfVectorizer(smooth_idf=True, norm=None)
X = tfidf.fit_transform(data_sample['tokenize'])

# On scikit-learn >= 1.0, use tfidf.get_feature_names_out() instead
data_sample = pd.DataFrame(X.toarray(), columns=tfidf.get_feature_names())
y = data['tokenize'][2:4]

# Keep the 5 features with the highest chi-squared scores
X_new = SelectKBest(chi2, k=5).fit_transform(X, y)
X_new = pd.DataFrame(X_new.toarray(), y)

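With this code, X_new only gives me numeric column headers, so I cannot see which features were selected. I would like the numbers in the header row (0, 1, 2, 3, 4) to be replaced by the feature names.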

|            tokenize           | 0    | 1    | 2    | 3    | 4    |
|:-----------------------------:|------|------|------|------|------|
| it was so bad the car cant on | 1.40 | 1.40 | 1.40 |  0   | 1.40 |
| car is very good indeed       | 0    |    0 |    0 | 1.40 | 0    | 
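
One possible way to get named columns, sketched here rather than taken from an answer: keep a reference to the fitted `SelectKBest` object and use its `get_support()` mask to pick the matching entries from the vectorizer's vocabulary. The two sentences in `docs` are an assumed stand-in for rows 2 to 3 of `Book1.csv`, and the names `docs`, `selector`, `names`, `selected`, and `result` are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Assumed stand-in for the two rows of Book1.csv shown in the table above.
docs = pd.Series(
    ["it was so bad the car cant on", "car is very good indeed"],
    name="tokenize",
)
# As in the question, the text column itself is used as the target.
y = docs

tfidf = TfidfVectorizer(smooth_idf=True, norm=None)
X = tfidf.fit_transform(docs)

# Keep the fitted selector so its support mask can be queried afterwards.
selector = SelectKBest(chi2, k=5)
X_new = selector.fit_transform(X, y)

# get_feature_names_out() exists from scikit-learn 1.0 onwards;
# fall back to get_feature_names() on older releases.
if hasattr(tfidf, "get_feature_names_out"):
    names = np.asarray(tfidf.get_feature_names_out())
else:
    names = np.asarray(tfidf.get_feature_names())

# Boolean mask over the original columns: True where a feature was kept.
selected = names[selector.get_support()]

# Same table as before, but with the selected words as column headers.
result = pd.DataFrame(X_new.toarray(), index=y, columns=selected)
print(result)
```

If integer positions are more convenient than a boolean mask, `selector.get_support(indices=True)` returns the indices of the kept columns instead.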

0 answers:

No answers yet