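IndexError when getting feature importance from logistic regression weights

Date: 2021-04-14 19:36:36

Tags: python

I'm doing some sentiment analysis and trying to use logistic regression to get feature importance. I found a reference on how to do this here (How to get feature importance in logistic regression using weights?), but when I implement it I get an error, and I don't understand why or how to fix it.

Can someone help me?

Here is my code: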

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler  # only needed for the commented-out scaling line below


## Creating Training data
Independent_var = df_final.tweet # the features (raw tweet text)
Dependent_var = df_final.sent_binary # the sentiment label: positive, negative or neutral (3 classes despite the column name)

# Logistic regression
cv = CountVectorizer(min_df=2, max_df=0.50, ngram_range = (1,2), max_features=50)
text_count_vector = cv.fit_transform(Independent_var)
#standardized_data = StandardScaler(with_mean=False).fit_transform(text_count_vector)

feature_names = np.array(cv.get_feature_names_out())  # scikit-learn >= 1.0; older versions use cv.get_feature_names()
#feature_names

## Splitting in the given training data for our training and testing
X_tr, X_test, y_tr, y_test = train_test_split(text_count_vector, Dependent_var, test_size=0.3, random_state=225)


LogReg = LogisticRegression(solver='lbfgs', multi_class='multinomial')
LogReg_clf = LogReg.fit(X_tr, y_tr)

#coefs = np.abs(LogReg_clf.coef_)
coefs = LogReg_clf.coef_


#get the sorting indices
sorted_index = np.argsort(coefs)[::-1]
# check if the sorting indices are correct
print(coefs[sorted_index])

#get the index of the top-20 features
top_20 = sorted_index[:20]

#get the names of the top 20 most important features
print(feature_names[top_20])
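For context, here is a small sketch of what `coef_` looks like for a multiclass model. It uses synthetic data from `make_classification` as a stand-in for the tweet counts (an assumption: 50 features and 3 classes, mirroring `max_features=50` and the three sentiment labels). A three-class `LogisticRegression` stores one row of weights per class, so `coef_` is two-dimensional; only the binary case yields a single row.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the count matrix: 300 samples, 50 features, 3 classes
# (assumption: chosen to mirror max_features=50 and the 3 sentiment labels).
X, y = make_classification(n_samples=300, n_features=50, n_informative=10,
                           n_classes=3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# One row of weights per class, one column per feature.
print(clf.coef_.shape)      # (3, 50)
print(clf.classes_)         # [0 1 2]; row i of coef_ belongs to classes_[i]

# Binary case for comparison: coef_ has a single row.
y_bin = (y == 0).astype(int)
clf_bin = LogisticRegression(max_iter=1000).fit(X, y_bin)
print(clf_bin.coef_.shape)  # (1, 50)
```

Any ranking of features therefore has to pick a class row (or aggregate over rows) before sorting.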

The error I get:

IndexError                                Traceback (most recent call last)
<ipython-input-103-b566f1c5a21c> in <module>
     22 print(sorted_index)
     23 # check if the sorting indices are correct
---> 24 print(coefs[sorted_index])
     25 
     26 #get the index of the top-20 features

IndexError: index 23 is out of bounds for axis 0 with size 3
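A likely explanation, sketched with plain NumPy: with three sentiment classes, `coefs` has shape `(3, 50)`. `np.argsort(coefs)` sorts each row along the last axis and returns column indices in the range 0 to 49, and `[::-1]` then reverses the order of the rows, not the sort order within each row. Indexing `coefs[sorted_index]` treats those column indices as row indices on axis 0, which only has size 3, hence "index 23 is out of bounds for axis 0 with size 3". The sketch below uses random weights and hypothetical feature names in place of `LogReg_clf.coef_` and the vectorizer vocabulary; it reproduces the error and then ranks features separately for each class.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_features = 3, 50
# Stand-in for LogReg_clf.coef_ (assumption: random weights with the same shape).
coefs = rng.normal(size=(n_classes, n_features))
# Hypothetical names; in the question these come from the CountVectorizer.
feature_names = np.array([f"feat_{i}" for i in range(n_features)])

# The question's approach: column indices 0..49 end up used as row indices.
sorted_index = np.argsort(coefs)[::-1]
raised = False
try:
    coefs[sorted_index]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)

# Per-class ranking: sort each row in descending order, keep the first k columns.
k = 20
top_idx = np.argsort(coefs, axis=1)[:, ::-1][:, :k]       # shape (n_classes, k)
top_weights = np.take_along_axis(coefs, top_idx, axis=1)   # matching weights

for cls, idx, w in zip(range(n_classes), top_idx, top_weights):
    # With the real model, LogReg_clf.classes_[cls] gives the class label.
    print(cls, list(zip(feature_names[idx][:5], np.round(w[:5], 3))))

# One possible single global ranking (an assumption, not the only choice):
# average absolute weight across classes.
overall_top = np.argsort(np.abs(coefs).mean(axis=0))[::-1][:k]
print(feature_names[overall_top][:5])
```

Sorting the raw weights ranks features by how strongly they push toward a given class; sorting `np.abs(...)` (as in the commented-out line of the question) ranks by magnitude regardless of direction. Since the inputs are unscaled counts, the weights are only roughly comparable across features.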
