I have a well-labeled dataset for sentiment analysis, and I used logistic regression to classify it. Here is my code.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
xl = pd.ExcelFile('d:/data.xlsx')
df3 = xl.parse("Sheet1")
cl_data, sent = df3['Clean-Reviews'].fillna(' '), df3['Sentiment']
sent_train, sent_test, y_train, y_test = train_test_split(
    cl_data, sent, test_size=0.25, random_state=1000)
vectorizer = CountVectorizer()
vectorizer.fit(sent_train)
X_train = vectorizer.transform(sent_train)
X_test = vectorizer.transform(sent_test)
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
When I try to compute precision, recall, and the F-measure:
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
print(f1_score(X_test, y_test, average="macro"))
print(precision_score(X_test, y_test, average="macro"))
print(recall_score(X_test, y_test, average="macro"))
I get an error:
TypeError: len() of unsized object
Can anyone tell me what the problem is here? Thanks in advance.
Answer 0: (score: 0)
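These metrics are computed between the true labels and the predicted labels, and in your code X_test is not a set of predictions: it is the vectorized feature matrix. Call the classifier's predict method first, then pass the true labels as the first argument and the predictions as the second. It should be: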
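# Predict labels for the test features (the variable is X_test, with a capital X)
y_pred = classifier.predict(X_test)
# Argument order is (y_true, y_pred)
print(f1_score(y_test, y_pred, average="macro"))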
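
For reference, here is a minimal end-to-end sketch of the same fix applied to all three metrics from the question. The Excel file is not available here, so the `texts` and `labels` lists are hypothetical stand-ins for the 'Clean-Reviews' and 'Sentiment' columns, and `stratify=labels` is an added assumption to keep both classes in the tiny test split.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for df3['Clean-Reviews'] and df3['Sentiment'].
texts = ["great product", "terrible service", "really great value",
         "awful and terrible", "great great experience", "terrible awful quality",
         "love it great", "hate it awful"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Split raw texts and labels; stratify is an assumption for this small sample.
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=1000, stratify=labels)

# Fit the vocabulary on the training texts only, then transform both splits.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)

classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# Metrics compare true labels with predicted labels, never with features.
y_pred = classifier.predict(X_test)

macro_f1 = f1_score(y_test, y_pred, average="macro")
macro_precision = precision_score(y_test, y_pred, average="macro")
macro_recall = recall_score(y_test, y_pred, average="macro")

print(macro_f1, macro_precision, macro_recall)
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

With the real data, only the two lines that define `texts` and `labels` would change; everything from `train_test_split` onward stays the same.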