Question

I'm working on a tutorial book about Scikit-learn, and one section has this code:

from sklearn.feature_extraction.text import CountVectorizer
corpus = ['The dog ate a sandwich, the wizard transfigured a sandwich, and I ate a sandwich']
vectorizer = CountVectorizer(stop_words='english')
print(vectorizer.fit_transform(corpus).todense())

When I run it, I get:

[[2 1 3 1 1]]

How do I change my code to get the actual words and the count of each word that was vectorized, not just the vector itself?

Answer 1

After fitting the model, access the .vocabulary_ attribute:

>>> vectorizer.vocabulary_
{'ate': 0, 'dog': 1, 'sandwich': 2, 'transfigured': 3, 'wizard': 4}
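The numbers in vocabulary_ are column indices into the count matrix, not counts. One way to get each word together with its count is to look up every word's column in the document's row. A minimal sketch, assuming a single document in corpus; the name word_counts is only illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ['The dog ate a sandwich, the wizard transfigured a sandwich, and I ate a sandwich']
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(corpus)

# vocabulary_ maps term -> column index; row[col] is that term's count in document 0
row = X.toarray()[0]
word_counts = {word: int(row[col]) for word, col in sorted(vectorizer.vocabulary_.items())}
print(word_counts)
```

For a corpus with several documents, using np.asarray(X.sum(axis=0)).ravel() in place of row (after import numpy as np) gives each word's total count across all documents.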
