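I'm doing some sentiment analysis and am trying to use logistic regression to get feature importances. I found a reference on how to do this here (How to get feature importance in logistic regression using weights?), but I get an error when I implement it, and I don't know why it happens or how to fix it.
Can someone help me?
Here is my code.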
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler
## Creating Training data
Independent_var = df_final.tweet # the features
Dependent_var = df_final.sent_binary # the sentiment (positive, negative, neutral)
# Logistic regression
cv = CountVectorizer(min_df=2, max_df=0.50, ngram_range = (1,2), max_features=50)
text_count_vector = cv.fit_transform(Independent_var)
#standardized_data = StandardScaler(with_mean=False).fit_transform(text_count_vector)
feature_names = np.array(cv.get_feature_names())
#feature_names
## Splitting the data into training and test sets
X_tr, X_test, y_tr, y_test = train_test_split(text_count_vector, Dependent_var, test_size=0.3, random_state=225)
LogReg = LogisticRegression(solver='lbfgs', multi_class='multinomial')
LogReg_clf = LogReg.fit(X_tr, y_tr)
#coefs = np.abs(LogReg_clf.coef_)
coefs = LogReg_clf.coef_
#get the sorting indices
sorted_index = np.argsort(coefs)[::-1]
# check if the sorting indices are correct
print(coefs[sorted_index])
#get the index of the top-20 features
top_20 = sorted_index[:20]
#get the names of the top 20 most important features
print(feature_names[top_20])
The error I get:
IndexError Traceback (most recent call last)
<ipython-input-103-b566f1c5a21c> in <module>
22 print(sorted_index)
23 # check if the sorting indices are correct
---> 24 print(coefs[sorted_index])
25
26 #get the index of the top-20 features
IndexError: index 23 is out of bounds for axis 0 with size 3
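A possible explanation, assuming the target has three classes (which the "size 3" in the error suggests): coef_ is then a 2-D array of shape (n_classes, n_features), so np.argsort(coefs) returns a 2-D array of column indices per row, and coefs[sorted_index] ends up using those column indices as row indices. Below is a minimal sketch of ranking the features one class at a time. The random matrix stands in for LogReg_clf.coef_ and the "term_i" names stand in for the CountVectorizer vocabulary; both are placeholders, as is the helper name top_features_per_class.

```python
import numpy as np

# Placeholder for LogReg_clf.coef_: 3 classes x 50 features (random values).
rng = np.random.default_rng(0)
coefs = rng.normal(size=(3, 50))
# Placeholder for the vocabulary returned by the vectorizer (made-up names).
feature_names = np.array([f"term_{i}" for i in range(50)])

def top_features_per_class(coefs, feature_names, k=20):
    """For each class row, return the k feature names with the largest coefficients."""
    coefs = np.atleast_2d(coefs)  # also handles a single 1-D row of coefficients
    top = {}
    for class_idx, row in enumerate(coefs):
        # argsort on a 1-D row gives valid indices into feature_names;
        # reversing puts the largest coefficient first.
        order = np.argsort(row)[::-1]
        top[class_idx] = feature_names[order[:k]]
    return top

top = top_features_per_class(coefs, feature_names, k=20)
for class_idx, names in top.items():
    print(class_idx, names[:5])
```

With the real model, LogReg_clf.classes_ gives the label that each row of coef_ belongs to. Sorting np.abs(row) instead of row would rank by magnitude rather than by signed value, if that is what "importance" should mean here.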