import pandas as pd
data = pd.read_csv('Book1.csv', usecols=['tokenize'])
# change to 'pre_pro/data1.csv'
# TF-IDF vectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
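data_sample = data[2:4]  # rows 2 and 3 only
tfidf = TfidfVectorizer(smooth_idf=True, norm=None)
X = tfidf.fit_transform(data_sample['tokenize'])
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
data_sample = pd.DataFrame(X.toarray(), columns=tfidf.get_feature_names_out())
y = data['tokenize'][2:4]
X_new = SelectKBest(chi2, k=5).fit_transform(X, y)
X_new = pd.DataFrame(X_new.toarray(), y)  # no columns given, so the headers default to 0..4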
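In this code, X_new only gives me numeric column headers, so I can't see which features were selected. I want the numbers in the header row (0, 1, 2, 3, 4) to be replaced by the feature names.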
| tokenize | 0 | 1 | 2 | 3 | 4 |
|:-----------------------------:|------|------|------|------|------|
| it was so bad the car cant on | 1.40 | 1.40 | 1.40 | 0 | 1.40 |
| car is very good indeed | 0 | 0 | 0 | 1.40 | 0 |