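I am trying to write code that classifies comments as positive or negative (0 for negative, 1 for positive).
I have a pandas DataFrame with two columns, comments and results. I used LogisticRegression from the Python library Scikit-Learn (I will also try other classifiers such as decision trees, SVM, KNN ...), but it gives me an error (I want to do this without sentiment analysis). I think the problem is that I am passing in strings instead of numbers.
My program should take a comment (a string value) and evaluate whether it is 0 or 1.
Here is the code: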
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model
full_comment_data = pd.DataFrame({"Comment":["This is bad", "Good job", "I do not like this"],
"Result":[0,1,0]})
features = full_comment_data["Comment"]
results = full_comment_data["Result"]
cv = CountVectorizer()
features = cv.fit_transform(features)
logistic_regression = linear_model.LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)
input_values = ["I love this comment"] #This value should be evaluated
prediction = logistic_regression.predict([input_values]) #adding values for prediction
prediction = prediction[0]
print(prediction)
This is the error I get:
ValueError: X has 1 features per sample; expecting 5155
I also tried this:
input_values = ["I love this comment"]
prediction = logistic_regression.predict(cv.fit_transform(input_values)) #adding values for prediction
prediction = prediction[0]
I get this error:
ValueError: X has 3 features per sample; expecting ...
Answer 0: (score: 1)
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

full_comment_data = pd.DataFrame({"Comment": ["This is bad", "Good job", "I do not like this"],
                                  "Result": [0, 1, 0]})
features = full_comment_data["Comment"]
results = full_comment_data["Result"]

# Learn the vocabulary from the training comments and turn them into word counts
cv = CountVectorizer()
features = cv.fit_transform(features)

logistic_regression = LogisticRegression(solver="lbfgs")
model = logistic_regression.fit(features, results)

input_values = ["I love this comment"]  # This value should be evaluated
# Use transform (not fit_transform) so the new comment is encoded with the
# vocabulary learned during training and has the same number of features
prediction = logistic_regression.predict(cv.transform(input_values))
prediction = prediction[0]
print(prediction)
Output: 0
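Both errors in the question seem to come from the same place: the model was trained on a matrix with one column per vocabulary word, so anything passed to predict needs the same columns. Passing the raw string list gives one "feature", and calling fit_transform again builds a new, smaller vocabulary. The following is a minimal sketch, not part of the original answer, that first compares the two matrix shapes and then wraps the vectorizer and classifier in a scikit-learn pipeline so the same vocabulary is reused automatically. The names train_matrix, new_matrix and pipeline are made up for illustration.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

full_comment_data = pd.DataFrame({
    "Comment": ["This is bad", "Good job", "I do not like this"],
    "Result": [0, 1, 0],
})

# fit_transform learns the vocabulary; transform reuses it for new text,
# so both matrices end up with the same number of columns
cv = CountVectorizer()
train_matrix = cv.fit_transform(full_comment_data["Comment"])
new_matrix = cv.transform(["I love this comment"])
print(train_matrix.shape, new_matrix.shape)

# A pipeline keeps the fitted vectorizer and the classifier together,
# so predict() can be called directly on raw strings
pipeline = make_pipeline(CountVectorizer(), LogisticRegression(solver="lbfgs"))
pipeline.fit(full_comment_data["Comment"], full_comment_data["Result"])

prediction = pipeline.predict(["I love this comment"])[0]
print(prediction)
```

With a pipeline there is no separate vectorizer object to forget about at prediction time, which may make it easier to swap in the other classifiers mentioned in the question.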