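I want to get the total number of positive and negative sentences in the dataset I ran my test on. How can I count the total number of positive and negative sentences?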
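import sklearn
from sklearn.datasets import load_files
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

moviedirt = r'C:\\Users\\premier\\Downloads\\Reviews\\test'
movie_test = load_files(moviedirt, shuffle=True)
movie_test.target_names
movie_test.data[0:10000]

# NOTE: movie_train is not defined in this snippet; it is presumably loaded
# with load_files from the training folder in the same way as movie_test.

# Pipeline: bag-of-words counts -> TF-IDF weighting -> Naive Bayes classifier
pipeline = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB(fit_prior=False))])
clf = pipeline.fit(movie_train.data, movie_train.target)  # train the classifier
predict1 = clf.predict(movie_test.data)
# Print each review together with its predicted category
for review, category in zip(movie_test.data, predict1):
    print('%r => %s' % (review, movie_train.target_names[category]))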
This is the complete test code. Here is the output:
b"Don't hate Heather Graham because she's beautiful, hate her because she's
fun to watch in this movie. Like the hip clothing and funky surroundings, the
actors in this flick work well together. Casey Affleck is hysterical and
Heather Graham literally lights up the screen. The minor characters - Goran
Visnjic {sigh} and Patricia Velazquez are as TALENTED as they are gorgeous.
Congratulations Miramax & Director Lisa Krueger!" => pos
b'I don\'t know how this movie has received so many positive comments. One
can call it "artistic" and "beautifully filmed", but those things don\'t make
up for the empty plot that was filled with sexual innuendos. I wish I had not
wasted my time to watch this movie. Rather than being biographical, it was a
poor excuse for promoting strange and lewd behavior. It was just another
Hollywood attempt to convince us that that kind of life is normal and OK.
From the very beginning I asked my self what was the point of this movie,and
I continued watching, hoping that it would change and was quite disappointed
that it continued in the same vein. I am so glad I did not spend the money to
see this in a theater!' => neg
Answer 0 (score: 0)
import numpy as np
# Number of pos/neg samples in your training set
print(np.unique(movie_train.target, return_counts=True))
# Number of pos/neg samples in your predictions
print(np.unique(predict1, return_counts=True))
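np.unique returns two parallel arrays (the label values and their counts), so you still have to match them up with the class names yourself. If you would rather get the counts keyed by name, something like the following might help. It is only a minimal sketch: count_by_label is a hypothetical helper (not part of scikit-learn), and target_names and predictions are made-up stand-ins for movie_test.target_names and predict1.

```python
import numpy as np

def count_by_label(labels, target_names):
    """Return {class name: count}, including classes that never occur."""
    # bincount counts how often each integer label appears; minlength keeps
    # a zero entry for classes that are absent from `labels`.
    counts = np.bincount(np.asarray(labels), minlength=len(target_names))
    return {name: int(n) for name, n in zip(target_names, counts)}

# Stand-in data (assumption): load_files sorts the folder names,
# so label 0 maps to 'neg' and label 1 maps to 'pos'.
target_names = ['neg', 'pos']
predictions = np.array([1, 0, 0, 1, 1, 0, 1])

print(count_by_label(predictions, target_names))  # {'neg': 3, 'pos': 4}
```

With your own variables the call would be count_by_label(predict1, movie_test.target_names) for the predicted labels, or count_by_label(movie_test.target, movie_test.target_names) for the true labels of the test set.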