Classifying comments as positive or negative with Scikit-Learn and Python

Time: 2019-07-12 11:23:15

Tags: python machine-learning scikit-learn

I am trying to write code that classifies comments as positive or negative (0 for negative, 1 for positive).

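I have a pandas DataFrame with two columns, comments and results. I used Logistic Regression from the Scikit-Learn library (I will also try other classifiers such as decision trees, SVM, KNN ...), but it gives me an error (I want to do this without sentiment analysis). I think the problem is that I am passing in strings instead of numbers. My program should take a comment (a string value) and evaluate whether it is 0 or 1. Here is the code: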

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model



full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

cv = CountVectorizer()  
features = cv.fit_transform(features)


logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"] #This value should be evaluated

prediction = logistic_regression.predict([input_values]) #adding values for prediction
prediction = prediction[0]
print(prediction)

This is the error I get:

ValueError: X has 1 features per sample; expecting 5155

I also tried this:

input_values = ["I love this comment"]

prediction = logistic_regression.predict(cv.fit_transform(input_values)) #adding values for prediction
prediction = prediction[0]

I get this error:

ValueError: X has 3 features per sample; expecting ...

1 Answer:

Answer 0: (score: 1)

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

# Learn the vocabulary from the training comments and turn them into word counts
cv = CountVectorizer()
features = cv.fit_transform(features)

logistic_regression = LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"] #This value should be evaluated

# Use transform (not fit_transform) so the new comment is encoded with the
# vocabulary learned from the training data, and pass the list itself
# instead of wrapping it in another list
prediction = logistic_regression.predict(cv.transform(input_values))
prediction = prediction[0]
print(prediction)

Output: 0
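
To make the difference between the two calls concrete, here is a small sketch (not part of the original answer) that compares the number of feature columns produced by transform and by fit_transform on the new comment. It uses plain Python lists instead of the DataFrame to stay short, and the names train_comments, new_comment, reused, refit and classifier are made up for illustration. The last part shows one possible alternative, a scikit-learn Pipeline, which bundles the vectorizer and the classifier so that raw strings can be passed to predict directly.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Same toy data as in the question, as plain lists (illustrative names)
train_comments = ["This is bad", "Good job", "I do not like this"]
train_results = [0, 1, 0]
new_comment = ["I love this comment"]

# Fitting on the training comments learns one column per distinct word
cv = CountVectorizer()
train_matrix = cv.fit_transform(train_comments)
n_train_features = train_matrix.shape[1]

# transform() reuses the learned vocabulary, so the column count matches
reused = cv.transform(new_comment)

# fit_transform() on a fresh vectorizer learns a new vocabulary from the
# new comment alone, so the column count no longer matches the model
refit = CountVectorizer().fit_transform(new_comment)

print("training columns:", n_train_features)
print("transform columns:", reused.shape[1])
print("fit_transform columns:", refit.shape[1])

# Alternative sketch: a Pipeline applies the fitted vectorizer automatically
classifier = make_pipeline(CountVectorizer(), LogisticRegression(solver="lbfgs"))
classifier.fit(train_comments, train_results)
pipeline_prediction = classifier.predict(new_comment)[0]
print("pipeline prediction:", pipeline_prediction)
```

With the pipeline, the vectorizer cannot be refit by accident at prediction time, which is the mistake behind the second error in the question. Note that with only three training comments the prediction itself is not meaningful; words such as "love" never appear in the training data, so the model has nothing to go on for them.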