Important features for Naive Bayes

Time: 2019-12-30 17:31:50

Tags: python pandas numpy naivebayes

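I am trying to select the top 20 features for a Naive Bayes implementation. I take the indices of the top 20 features and use them to look up the feature names. Code: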

# Feature indices for the negative class, sorted by ascending log probability
neg_class_prob_sorted = multi_nb_best_alpha_tf_idf.feature_log_prob_[0].argsort()
# Log probabilities for the negative class (in original feature order, not sorted)
neg_class_prob_sorted_prob = multi_nb_best_alpha_tf_idf.feature_log_prob_[0]
# Same for the positive class
pos_class_prob_sorted = multi_nb_best_alpha_tf_idf.feature_log_prob_[1].argsort()
pos_class_prob_sorted_prob = multi_nb_best_alpha_tf_idf.feature_log_prob_[1]

import numpy as np

feature_names = multi_nb_best_alpha_tf_idf_vec.get_feature_names()
# Names of the 20 features with the highest log probability in each class
neg_class_features = np.take(feature_names, neg_class_prob_sorted[-20:])
neg_class_features_list = neg_class_features.tolist()
pos_class_features = np.take(feature_names, pos_class_prob_sorted[-20:])
pos_class_features_list = pos_class_features.tolist()
# Combine the top features of both classes and drop the duplicates
top_class_features = list(set(neg_class_features_list + pos_class_features_list))

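But I get an error saying that index x is out of bounds for the range 0 to y. countvectorizer.feature_names() gives me fewer feature names than the actual number of features in the dataset. This is the error I get: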

IndexError: index 1287 is out of bounds for axis 0 with size 1282

I get this error at both neg_class_features and pos_class_features. Please help, thanks.
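
The error message itself says that argsort() produced index 1287 while only 1282 names were available, so the row of feature_log_prob_ must have at least 1288 columns. One possible explanation (an assumption, nothing in the question confirms it) is that the classifier was fitted on a matrix with more columns than this vectorizer produced, for example because extra features were stacked onto the text features or a different vectorizer instance was used. Below is a minimal sketch, using synthetic NumPy arrays in place of the real model and vectorizer, of a lookup that checks the two lengths before indexing. The helper name top_k_feature_names and all of the data are hypothetical.

```python
import numpy as np


def top_k_feature_names(log_prob_row, feature_names, k=20):
    """Return the k names with the highest log probability, best first."""
    log_prob_row = np.asarray(log_prob_row)
    feature_names = np.asarray(feature_names)
    # argsort() yields indices over the model's columns, so the name array
    # needs exactly one entry per column or the lookup can go out of bounds.
    if log_prob_row.shape[0] != feature_names.shape[0]:
        raise ValueError(
            f"model has {log_prob_row.shape[0]} features but "
            f"{feature_names.shape[0]} names were supplied"
        )
    top_idx = np.argsort(log_prob_row)[::-1][:k]
    return feature_names[top_idx].tolist()


# Synthetic stand-ins (hypothetical data, not taken from the question).
rng = np.random.default_rng(0)
names = [f"word_{i}" for i in range(1282)]
matching_row = rng.normal(size=1282)  # one value per name
longer_row = rng.normal(size=1290)    # more columns than names

top_matching = top_k_feature_names(matching_row, names, k=20)

try:
    top_k_feature_names(longer_row, names, k=20)
    mismatch_message = None
except ValueError as exc:
    mismatch_message = str(exc)
```

With the real objects, log_prob_row would be one row of multi_nb_best_alpha_tf_idf.feature_log_prob_ and feature_names the list returned by the vectorizer. If the two lengths differ, the thing to check is which matrix the classifier was actually fitted on.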

0 answers:

No answers yet