Question

I am trying to write code that classifies comments as positive or negative (0 for negative, 1 for positive).

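I have a pandas DataFrame with two columns, comments and results. I used Logistic Regression from the Scikit-Learn library in Python (I will also try other classifiers, such as decision trees, SVM, KNN ...), but it gives me an error (I want to do this without sentiment analysis). I think the problem is that I am passing strings instead of numbers. My program should take a comment (a string value) and evaluate whether it is 0 or 1. Here is the code: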

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model



full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

cv = CountVectorizer()  
features = cv.fit_transform(features)


logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"] #This value should be evaluated

prediction = logistic_regression.predict([input_values]) #adding values for prediction
prediction = prediction[0]
print(prediction)

This is the error I get:

ValueError: X has 1 features per sample; expecting 5155

I also tried this:

input_values = ["I love this comment"]

prediction = logistic_regression.predict(cv.fit_transform(input_values)) #adding values for prediction
prediction = prediction[0]

I get this error:

ValueError: X has 3 features per sample; expecting ...

Answer 1

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model

full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
                                  "Result":[0,1,0]})

features = full_comment_data["Comment"]
results = full_comment_data["Result"]

cv = CountVectorizer()  
features = cv.fit_transform(features)


logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

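input_values = ["I love this comment"]  # the comment to classify

# Reuse the vectorizer fitted on the training data: call transform(), not fit_transform(),
# so the new comment is encoded with the same vocabulary (same number of columns).
prediction = logistic_regression.predict(cv.transform(input_values))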
prediction = prediction[0]
print(prediction)

Output: 0
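To make the difference between the two calls visible, here is a minimal sketch built on the same three training comments. It compares the matrix shapes produced by `transform()` on the fitted vectorizer with those from `fit_transform()` on a fresh one; refitting on the single new comment yields 3 columns, which matches the "X has 3 features per sample" error in the question. The second half is a suggestion of mine rather than part of the original answer: it wraps the vectorizer and classifier in a scikit-learn pipeline (`make_pipeline`), so raw strings can be passed to `predict()` directly. The variable names (`comments`, `labels`, `text_clf`, and so on) are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["This is bad", "Good job", "I do not like this"]
labels = [0, 1, 0]
new_comments = ["I love this comment"]

# Learn the vocabulary once, from the training comments only.
cv = CountVectorizer()
train_matrix = cv.fit_transform(comments)

# transform() reuses that vocabulary, so the column count matches training.
same_vocab = cv.transform(new_comments)

# fit_transform() on a fresh vectorizer learns a new vocabulary from the
# new comment alone, so the column count no longer matches the model.
refit = CountVectorizer().fit_transform(new_comments)

print("training:", train_matrix.shape)
print("transform:", same_vocab.shape)
print("fit_transform:", refit.shape)

# Assumed alternative: a pipeline keeps the vectorizer and the classifier
# together, so predict() accepts raw strings and always reuses the vocabulary.
text_clf = make_pipeline(CountVectorizer(), LogisticRegression(solver="lbfgs"))
text_clf.fit(comments, labels)
prediction = text_clf.predict(new_comments)
print(prediction[0])
```

With a pipeline, trying the other classifiers mentioned in the question should only require replacing the `LogisticRegression(...)` step. Three training comments are far too few for meaningful predictions, so treat the printed label as a smoke test only.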

Classifying comments as positive or negative with Scikit-Learn and Python
