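I have 3 sentences that contain positive and negative words, and I have already applied the necessary/standard preprocessing techniques. When I feed all three sentences/lists, together with their corresponding lists of sentence tokens and the TF-IDF weighted word2vec features, to the predict function, about 60% of the predictions for positive and negative sentences are correct.
However, when I send the reviews for a product ID to the predict function individually (one by one), their polarity is predicted as negative, even though each sentence/list contains words such as "smile", "nice" and "amazing".
What puzzles me is this: when all 3 reviews are sent to the predict function in one shot, it predicts both positive and negative reviews, but when the same 3 reviews are sent one at a time, every one of them is predicted as negative.
Can someone tell me what I am missing here?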
Actual reviews
smile
Did amazing on my husband. but the medication test was inappropriate.
Overall experience is wonderful.
Formatted reviews stored in a list of lists
[['smile'],
['amazing', 'husband', 'medication', 'test', 'inappropriate'],
['overall', 'experience', 'good']]
Output when all 3 reviews are passed together.
polarity_cnt_logistic: [1 0 1]
-> Here the first and third values ('1') refer to the first and third formatted reviews; the '0' refers to the second review.
Predicted output when each review is passed individually
First Review:
polarity_cnt_logistic :[0]
The values below are printed just to verify the inputs to the predict function.
length of lst_sent: 1
Review: ['smile']
List of Sentence: ['smile']
Second Review:
polarity_cnt_logistic: [0 0 0 0 0 0]
The values below are printed just to verify the inputs to the predict function.
length of lst_of_sentance: 5
Actual Review: ['Did amazing on my husband. but the medication test was inappropriate.']
Formatted Review: ['amazing', 'husband', 'medication', 'test', 'inappropriate']
for idx, row in test_revs_df.iterrows():
    # Vectorize only the current review
    tf_idf = vectorizer_tst.transform(row[["formated_reviews"]])
    # Note: get_feature_names() was removed in scikit-learn 1.2; newer versions use get_feature_names_out()
    tfidf_features = vectorizer_tst.get_feature_names()
    feature_counts = tf_idf.sum(axis=0).A1
    feature_dict = dict(zip(list(tfidf_features), feature_counts))
    # call avg_tf_idf_word2vec function
    text_feature_avg_tf_idf_w2v = avg_tf_idf_w2vec(tfidf_features, list_of_sentance_tst[idx])
    # predict polarity for each review
    polarity_cnt_logistic = trained_model.predict(text_feature_avg_tf_idf_w2v)
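The body of avg_tf_idf_w2vec is not shown in this post, so the following is only a minimal sketch of one common way to build a TF-IDF weighted average word2vec vector, not the actual function. The name avg_tfidf_w2v_sketch, the toy embeddings and the toy IDF values are all made up for illustration. It shows that, as long as the embeddings and IDF weights stay the same, a review gets the same feature vector whether it is processed alone or inside a batch; if the predictions differ between the two cases, comparing the feature vectors in this way may help narrow down where they diverge.

```python
from typing import Dict, List, Sequence


def avg_tfidf_w2v_sketch(
    sentences: Sequence[Sequence[str]],
    word_vectors: Dict[str, List[float]],
    idf: Dict[str, float],
    dim: int,
) -> List[List[float]]:
    """Return one TF-IDF weighted average vector per tokenized sentence.

    Hypothetical stand-in for avg_tf_idf_w2vec; the real function is not shown.
    """
    rows = []
    for tokens in sentences:
        weighted_sum = [0.0] * dim
        weight_total = 0.0
        for word in tokens:
            # Skip words missing from either the embeddings or the IDF table.
            if word not in word_vectors or word not in idf:
                continue
            # TF is the word's relative frequency inside this sentence.
            tf = tokens.count(word) / len(tokens)
            weight = tf * idf[word]
            vec = word_vectors[word]
            for i in range(dim):
                weighted_sum[i] += weight * vec[i]
            weight_total += weight
        # Normalize by the total weight; sentences with no known words stay all-zero.
        if weight_total > 0:
            weighted_sum = [v / weight_total for v in weighted_sum]
        rows.append(weighted_sum)
    return rows


# Toy, made-up values purely for illustration (assumptions, not real model data).
toy_vectors = {"smile": [1.0, 0.0], "amazing": [0.5, 0.5], "inappropriate": [0.0, 1.0]}
toy_idf = {"smile": 2.0, "amazing": 1.0, "inappropriate": 3.0}

# The same review processed inside a batch and on its own.
batch = avg_tfidf_w2v_sketch(
    [["smile"], ["amazing", "inappropriate"]], toy_vectors, toy_idf, dim=2
)
single = avg_tfidf_w2v_sketch([["smile"]], toy_vectors, toy_idf, dim=2)
print(batch[0] == single[0])
```

Note that the sketch expects a list of token lists, so a single review is passed as [["smile"]] rather than ["smile"].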
Expected predicted output for each formatted review.
Formatted Review: ['amazing', 'husband', 'medication', 'test', 'inappropriate']
polarity_cnt_logistic = [1 1 1 1 0] <- expected predicted output
Formatted Review: ['smile']
polarity_cnt_logistic = [1] <- expected predicted output
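The expected output above has one label per word, while the batch output earlier has one label per review. A predict call returns one label per input row, so which of the two you get depends on how the rows are built. The sketch below illustrates this with a toy linear classifier; predict_rows, the weights and the word vectors are hypothetical stand-ins, not trained_model or the real embeddings.

```python
from typing import List, Sequence


def predict_rows(rows: Sequence[Sequence[float]], weights: Sequence[float], bias: float = 0.0) -> List[int]:
    """Toy stand-in for a linear classifier: returns one 0/1 label per input row."""
    labels = []
    for row in rows:
        score = sum(w * x for w, x in zip(weights, row)) + bias
        labels.append(1 if score > 0 else 0)
    return labels


# Made-up 2-d word vectors and weights, for illustration only.
toy_word_vecs = {
    "amazing": [1.0, 0.0],
    "husband": [0.2, 0.1],
    "inappropriate": [0.0, 0.5],
}
toy_weights = [1.0, -1.0]
review = ["amazing", "husband", "inappropriate"]

# One row per word -> one label per word.
per_word_rows = [toy_word_vecs[w] for w in review]
per_word = predict_rows(per_word_rows, toy_weights)

# One averaged row per review -> a single label for the whole review.
dim = 2
mean_row = [sum(toy_word_vecs[w][i] for w in review) / len(review) for i in range(dim)]
per_review = predict_rows([mean_row], toy_weights)

print(len(per_word), len(per_review))
```

With these toy values the per-word call yields three labels and the per-review call yields one, which is why the length of polarity_cnt_logistic is worth checking against the number of rows actually passed to predict.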