Question

I am using CountVectorizer like this:

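from nltk.corpus import stopwords  # assumes the NLTK stopwords corpus is installed
from sklearn.feature_extraction.text import CountVectorizer

# Keep at most 200 terms that occur in at least 2 documents and in at most 70% of them
vectorizer = CountVectorizer(max_features=200, min_df=2, max_df=0.7,
                             stop_words=stopwords.words('arabic'))
X = vectorizer.fit_transform(X).toarray()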

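This code converts the strings into numeric (count) vectors, which I then use to train a model. Now I have a small set of test data. How do I convert it into the same form so that the results can actually be compared?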

Answer 1

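Just use the vectorizer that was already fitted on the training data to convert the test text into the format the trained model expects. Call transform, not fit_transform, so the vocabulary learned during training is reused: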

test_vectors = vectorizer.transform(test_text_data)
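As a minimal sketch of why this works, the toy example below fits a vectorizer on a few training sentences and then transforms unseen test sentences. The sentences and the names train_texts and test_texts are made up for illustration and are not from the question; default CountVectorizer settings are assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data, for illustration only
train_texts = ["the cat sat", "the dog sat", "the cat ran"]
test_texts = ["the dog ran", "a bird flew"]

vectorizer = CountVectorizer()

# fit_transform learns the vocabulary from the training data
X_train = vectorizer.fit_transform(train_texts).toarray()

# transform reuses that vocabulary; words unseen in training are ignored
X_test = vectorizer.transform(test_texts).toarray()

print(X_train.shape, X_test.shape)
print(X_test)
```

Both matrices have the same number of columns, so the test vectors can be passed straight to a model trained on X_train. Calling fit_transform on the test data instead would build a different vocabulary, and the columns would no longer line up.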

Answer 2

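To see which feature name each integer column index corresponds to, use the fitted vectorizer's feature-name array:

vectorizer.get_feature_names_out()  # get_feature_names() in scikit-learn versions before 1.0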

How to use CountVectorizer to test new data after some training
