How do I check feature importance for text features?

Time: 2019-07-26 14:09:48

Tags: python machine-learning scikit-learn classification sentiment-analysis

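I am still working on a comparison of classifiers for sentiment analysis. Next, I would like to know the importance of each feature for each classifier.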

I have already tried model.feature_importances_, but because I vectorized the text data, I cannot tell which feature each importance value refers to.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

line = pd.read_csv('line_label.csv', encoding = "ISO-8859-1")

x = line.Berita
y = line.Sentimen

xcv = x
xtf = x

# lowercase must be a bool; None was falsy, so False keeps the same behavior (no lowercasing)
countvect = CountVectorizer(analyzer = "word", tokenizer = None, lowercase = False)
xcv = countvect.fit_transform(x).toarray()

X_train, X_test, y_train, y_test = train_test_split(xcv, y, test_size=0.01, random_state=42)

from sklearn.ensemble import RandomForestClassifier 

rf = RandomForestClassifier() 

rf.fit(X_train, y_train) 

rf.score(X_test, y_test)

rf.feature_importances_

It shows:

array([2.20854745e-04, 1.24760561e-04, 3.14268988e-03, ...,
   1.71782391e-04, 5.15755286e-05, 2.13065348e-08])

1 answer:

Answer 0 (score: 0)

Use the following code to pair each vocabulary word with its importance value:

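# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it (available since 1.0)
for feature, importance in zip(countvect.get_feature_names_out(), rf.feature_importances_):
    print('{}: {}'.format(feature, importance))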
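With a large vocabulary, printing every word is hard to read, so it may help to sort the pairs and keep only the largest values. The following is a minimal sketch of that idea, not part of the original answer. The helper name `rank_features` and the toy vocabulary and importance values are made up for illustration.

```python
def rank_features(names, importances, top_n=10):
    """Return the top_n (name, importance) pairs, most important first."""
    # Pair each feature name with the importance at the same position.
    pairs = zip(names, importances)
    # Sort by the importance value, largest first.
    ranked = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Made-up vocabulary and importances, for illustration only.
toy_names = ["bad", "good", "movie", "the"]
toy_importances = [0.35, 0.40, 0.20, 0.05]

for name, importance in rank_features(toy_names, toy_importances, top_n=3):
    print("{}: {:.2f}".format(name, importance))
```

With the objects from the question, the call would presumably be `rank_features(countvect.get_feature_names_out(), rf.feature_importances_)`, assuming scikit-learn 1.0 or later (older versions expose `get_feature_names()` instead). This works because the importances are listed in the same column order as the vectorizer's feature names.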