How do I calculate information gain?

Time: 2019-05-14 14:51:26

Tags: python pandas scikit-learn information-gain

I want to calculate the information gain of each word in my dataset, but my research only led me to one solution, mutual information gain, so I applied it:

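import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import mutual_info_classif

dataset = pd.read_csv("labelled_text.txt", delimiter="\t")

vectorizer = TfidfVectorizer(stop_words='english')
X = vectorizer.fit_transform(dataset.Sentence)
Y = dataset['Class']

# Map each vocabulary term to its estimated mutual information with the class.
# get_feature_names_out() replaces get_feature_names(), removed in scikit-learn 1.2.
res_mi = dict(zip(vectorizer.get_feature_names_out(), mutual_info_classif(X, Y, discrete_features=True)))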

Are information gain and mutual information computed the same way in sklearn?
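
For reference, here is a minimal sketch of the textbook definition, IG(Y; X) = H(Y) - H(Y | X), for a single discrete feature. It uses only the standard library. The helper names `entropy` and `information_gain` and the toy data are hypothetical, not part of scikit-learn. For a discrete feature this quantity is, by definition, the mutual information I(X; Y). As far as I know, `mutual_info_classif` with `discrete_features=True` estimates the same quantity in nats (natural logarithm), which is why natural log is used here, but treat that as something to verify against your installed version.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * log(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    # Group the class labels by the feature value they co-occur with.
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    # H(Y | X): entropy of each group, weighted by the group's share of rows.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: whether a word occurs in each sentence, and the class.
word_present = [1, 1, 0, 0, 1, 0]
classes = ["pos", "pos", "neg", "neg", "neg", "pos"]
ig = information_gain(word_present, classes)
```

Note that this definition treats every distinct feature value as its own category. With TF-IDF weights, nearly every nonzero value is distinct, so a presence/absence encoding (for example `CountVectorizer(binary=True)`) may be closer to the usual per-word information gain.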

0 answers:

No answers