Computing precision, recall, and F-measure for a logistic regression classifier

Asked: 2019-07-18 06:51:04

Tags: python-3.x sentiment-analysis precision-recall

I have a cleanly labelled dataset for sentiment analysis, and I have trained a logistic regression classifier on it. Here is my code.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    # Load the labelled reviews from the first sheet of the workbook.
    xl = pd.ExcelFile('d:/data.xlsx')
    df3 = xl.parse("Sheet1")

    # Review text (missing values replaced by a blank) and sentiment labels.
    cl_data, sent = df3['Clean-Reviews'].fillna(' '), df3['Sentiment']
    sent_train, sent_test, y_train, y_test = train_test_split(
        cl_data, sent, test_size=0.25, random_state=1000)

    # Build the vocabulary from the training text only.
    vectorizer = CountVectorizer()
    vectorizer.fit(sent_train)

    X_train = vectorizer.transform(sent_train)
    X_test  = vectorizer.transform(sent_test)

    classifier = LogisticRegression()
    classifier.fit(X_train, y_train)

When I try to compute precision, recall, and F-measure:

    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix

    print(f1_score(X_test, y_test, average="macro"))
    print(precision_score(X_test, y_test, average="macro"))
    print(recall_score(X_test, y_test, average="macro"))

I get an error:

    TypeError: len() of unsized object

Can anyone tell me what the problem is here? Thanks in advance.

1 answer:

Answer 0 (score: 0)

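These scores are measured between the true labels and the predicted labels, and in your code X_test is not a prediction: it is the feature matrix for the test reviews. Get the predictions from the classifier first, then pass the true labels as the first argument and the predictions as the second. It should be: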

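    y_pred = classifier.predict(X_test)
    print(f1_score(y_test, y_pred, average="macro"))
    print(precision_score(y_test, y_pred, average="macro"))
    print(recall_score(y_test, y_pred, average="macro"))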
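To see why the metric functions need two label vectors of the same length, here is a minimal pure-Python sketch of what macro averaging computes. The function name `macro_scores` and the toy labels are made up for illustration and are not part of scikit-learn; for inputs like these I would expect `precision_score`, `recall_score`, and `f1_score` with `average="macro"` to give the same numbers, but treat that as an assumption and check it on your own data.

```python
from collections import Counter

def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, f1) for two label sequences.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but the true class was t
            fn[t] += 1  # class t was the truth, but it was missed

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in labels:
        predicted = tp[c] + fp[c]  # how often class c was predicted
        actual = tp[c] + fn[c]     # how often class c really occurred
        prec = tp[c] / predicted if predicted else 0.0
        rec = tp[c] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    # Macro average: unweighted mean of the per-class scores.
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy labels (invented): true sentiment vs. what a classifier predicted.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "neg"]

precision, recall, f1 = macro_scores(y_true, y_pred)
print(precision, recall, f1)
```

Every count in the loop comes from comparing one true label with one predicted label, which is why passing the feature matrix `X_test` in place of either argument cannot work. Swapping the two arguments also matters: it exchanges precision and recall.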