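I asked this question before, but it was closed, so I am asking again. I am a master's student in computer engineering and I am trying to learn label propagation, which is what this question is about.
The code below uses LabelPropagation together with TfidfVectorizer, and it gets a very low score. I cannot figure out where the problem is.
The accuracy is only about 28%. There are just four categories, so I expected a much higher accuracy. Is that expectation reasonable?
Can anyone help me?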
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.semi_supervised import LabelPropagation
ratiolabeled = 0.5
categories = [
    'alt.atheism',
    'talk.religion.misc',
    'comp.graphics',
    'sci.space',
]
data_train = fetch_20newsgroups(subset='train', shuffle=True, random_state=42,
                                remove=('headers', 'footers', 'quotes'),
                                categories=categories)
data_test = fetch_20newsgroups(subset='test', shuffle=True, random_state=42,
                               remove=('headers', 'footers', 'quotes'),
                               categories=categories)
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.8, stop_words='english')
y_train, y_test = data_train.target, data_test.target
X_train = vectorizer.fit_transform(data_train.data)
X_test = vectorizer.transform(data_test.data)
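# Split the training indices into a labeled and an unlabeled part;
# -1 marks a sample as unlabeled for LabelPropagation.
labeled_indices, unlabeled_indices = train_test_split(
    np.arange(len(y_train)), test_size=1 - ratiolabeled,
    random_state=43, stratify=y_train)
y_train[unlabeled_indices] = -1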
lp_model = LabelPropagation(kernel='knn', n_neighbors=21, n_jobs=-1, max_iter=20)
lp_model.fit(X_train.toarray(), y_train)
print("Accuracy = ", lp_model.score(X_train.toarray(), y_train))
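
One thing that may be worth checking: the final line scores the model against y_train after half of its entries have been overwritten with -1. As far as I understand, LabelPropagation never predicts -1, so every masked sample would count as an error and the reported accuracy could not exceed the labeled fraction (0.5 here). The snippet below is only a toy sketch of that effect in plain Python. The label values and the "perfect classifier" are made up for illustration and are not taken from the 20 newsgroups data.

```python
# Toy illustration in plain Python; the labels below are made up.
true_labels = [0, 1, 2, 3, 0, 1, 2, 3]

# Mask every second sample as unlabeled (-1), i.e. 50% labeled,
# mirroring ratiolabeled = 0.5 in the question.
masked_labels = [y if i % 2 == 0 else -1 for i, y in enumerate(true_labels)]


def accuracy(predicted, reference):
    """Fraction of positions where predicted equals reference."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# A hypothetical perfect classifier: it outputs the true class everywhere
# and never outputs -1.
perfect_predictions = list(true_labels)

acc_vs_masked = accuracy(perfect_predictions, masked_labels)
acc_vs_true = accuracy(perfect_predictions, true_labels)

print("Accuracy vs masked labels:", acc_vs_masked)  # 0.5
print("Accuracy vs true labels:  ", acc_vs_true)    # 1.0
```

If this is what is happening in the script above, a possible fix is to keep an untouched copy of the labels before masking (for example a hypothetical y_true = data_train.target.copy()) and to score against that copy, ideally only on unlabeled_indices or on the separate test set.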