I am trying to do sentiment analysis on some review data. The response variable is 'positive' or 'negative'. When I run my model, my coefficients only have one row, but I think there should be two because there are two response classes. Any help understanding why this happens would be appreciated.
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
import numpy as np

# toy review data and labels
comments = ['happy', 'sad', 'this is negative', 'this is positive', 'i like this', 'why do i hate this']
classes = ['positive', 'negative', 'negative', 'positive', 'positive', 'negative']

# preprocess: build the term-frequency matrix for the review data set
stop = stopwords.words('english')
count_vectorizer = CountVectorizer(analyzer='word', stop_words=stop, ngram_range=(1, 3))
counts = count_vectorizer.fit_transform(comments)
tfidf_comments = TfidfTransformer(use_idf=True).fit_transform(counts)

# split validation: 80% training, 20% test
data_train, data_test, target_train, target_test = train_test_split(
    tfidf_comments, classes, test_size=0.2, random_state=43)

classifier = BernoulliNB().fit(data_train, target_train)
classifier.coef_.shape
The last line prints (1L, 6L). I am trying to extract the most informative features for the negative and the positive class, but because coef_ has only one row it gives me the same informative features for both responses.
Thanks!
Answer 0 (score: 0)
If you look at the source of scikit-learn's preprocessing module, the LabelBinarizer class implements the one-vs-all scheme used for multiclass and multi-label problems. You can see there that when only two classes are present it produces a single indicator column, so the classifier learns just one set of coefficients: they predict whether a sample belongs to class "1", and if not, the classifier predicts class "0".
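Here is a minimal sketch of that behavior. The labels are taken from the question; the feature matrix `X` is a made-up 6x6 binary matrix used only to illustrate the shapes, not the question's tf-idf data:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer
from sklearn.naive_bayes import BernoulliNB

labels = ['positive', 'negative', 'negative', 'positive', 'positive', 'negative']

# With exactly two classes, LabelBinarizer emits a single indicator column:
# 1 marks classes_[1] ('positive' here), 0 marks classes_[0] ('negative').
lb = LabelBinarizer()
print(lb.fit_transform(labels).ravel())  # [1 0 0 1 1 0]
print(lb.classes_)                       # ['negative' 'positive']

# Hypothetical binary feature matrix, only to show the resulting shapes.
X = np.random.RandomState(0).randint(0, 2, size=(6, 6))
clf = BernoulliNB().fit(X, labels)

# feature_log_prob_ always has one row per class, so it is a more direct
# place to look for per-class informative features than coef_.
print(clf.feature_log_prob_.shape)  # (2, 6)
```

As far as I can tell from the source, the single coef_ row in the binary case corresponds to classes_[1] (the second label in sorted order), and newer scikit-learn releases have deprecated coef_ on the naive Bayes estimators anyway. Sorting `feature_log_prob_[i]` for each class index i should give you candidate informative features for 'negative' and 'positive' separately.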