Question

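I have a dataset like the one below (only 6 rows are shown here). It has 700 rows, and I want to classify them into categories such as 9887, 8413, and so on. The problem is that the model also assigns completely unrelated input, for example "christmas tree", "new year" or "nike is a good sports brand", to one of the given categories such as 9887 or 8413. When the input is completely unrelated or empty (""), I want it to be classified as 0000.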

personInfo                               personID

alicia is from unitedStates              9887
alicia likes to do Yoga                  9887
cooking is one of the hobby of alicia    9887
sam is from Brazil                       8413
sam father is a doctor                   8413
In free time, sam prefers hiking         8413
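To run the code below on the sample alone, the six rows can be loaded into the `df` it expects. This is a minimal sketch assuming pandas; storing the labels as strings is an assumption (the question does not say), made so that a reject label such as `0000` keeps its leading zeros instead of becoming the integer 0.

```python
import pandas as pd

# The six sample rows from the question; the real dataset has about 700 rows.
# Labels are kept as strings (assumption) so '0000' would not collapse to 0.
texts = ['alicia is from unitedStates', 'alicia likes to do Yoga',
         'cooking is one of the hobby of alicia', 'sam is from Brazil',
         'sam father is a doctor', 'In free time, sam prefers hiking']
df = pd.DataFrame({'personInfo': texts, 'personID': ['9887'] * 3 + ['8413'] * 3})

print(df['personID'].value_counts())
```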

My code:

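import pickle

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# df: pandas DataFrame with the columns 'personInfo' (text) and 'personID' (label)
X_train, X_test, y_train, y_test = train_test_split(df['personInfo'], df['personID'], random_state=0)
count_vect = CountVectorizer().fit(X_train)
X_train_counts = count_vect.transform(X_train)
tfidf_transformer = TfidfTransformer().fit(X_train_counts)
X_train_tfidf = tfidf_transformer.transform(X_train_counts)
classificationModel = LinearSVC().fit(X_train_tfidf, y_train)

# Save the trained model
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(classificationModel, f)

# Load it again and predict; new text needs the same count + TF-IDF transformation as the training data
data_to_be_predicted = "alicia has a sister in texas"
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)
result = loaded_model.predict(tfidf_transformer.transform(count_vect.transform([data_to_be_predicted])))
print(result)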
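When the vectorizer and the classifier are saved separately, it is easy to transform new text differently from the training data (for example, applying `count_vect` without `tfidf_transformer`). A minimal sketch that avoids this wraps both steps in a scikit-learn pipeline, so a single pickle holds the vocabulary, the IDF weights and the SVM. The six-row `df` here is a stand-in for the real 700-row dataset, and string labels are an assumption.

```python
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the question's df: only the six sample rows
texts = ['alicia is from unitedStates', 'alicia likes to do Yoga',
         'cooking is one of the hobby of alicia', 'sam is from Brazil',
         'sam father is a doctor', 'In free time, sam prefers hiking']
df = pd.DataFrame({'personInfo': texts, 'personID': ['9887'] * 3 + ['8413'] * 3})

# TfidfVectorizer does the work of CountVectorizer + TfidfTransformer in one step
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(df['personInfo'], df['personID'])

# One pickle now holds the vocabulary, the IDF weights and the SVM
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:
    pickle.dump(model, f)
with open(filename, 'rb') as f:
    loaded_model = pickle.load(f)

# Raw text goes straight in; the pipeline applies the training-time transformation
result = loaded_model.predict(["alicia has a sister in texas"])
print(result)
```

With the pipeline, prediction takes raw strings, so there is no separate transformation step to forget.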

Here is an example input and output:

input: "alicia has a sister in texas"
output: 9887

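For the inputs below, I want the model to return 0000 because they are unrelated, but it classifies them as 9887, 8413 or one of the other given categories: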

input: "christmas tree"
expected output: 0000

input: " "
expected output: 0000
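A classifier such as `LinearSVC` always picks one of the classes it was trained on, so `0000` has to come from an extra rule. One common option is a reject rule on top of the classifier. The sketch below is an assumption, not the asker's code: it returns `0000` when the text contains no word from the training vocabulary (this covers `" "` and, on the six sample rows, "christmas tree"), or when the best `decision_function` score falls below a threshold. `predict_with_reject`, `REJECT_LABEL` and the default `threshold=0.0` are hypothetical; a real threshold would need tuning on held-out data.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

REJECT_LABEL = '0000'  # hypothetical label for unrelated or empty input

# Hypothetical stand-in for the question's df: only the six sample rows
texts = ['alicia is from unitedStates', 'alicia likes to do Yoga',
         'cooking is one of the hobby of alicia', 'sam is from Brazil',
         'sam father is a doctor', 'In free time, sam prefers hiking']
df = pd.DataFrame({'personInfo': texts, 'personID': ['9887'] * 3 + ['8413'] * 3})

vectorizer = TfidfVectorizer().fit(df['personInfo'])
clf = LinearSVC().fit(vectorizer.transform(df['personInfo']), df['personID'])


def predict_with_reject(texts, threshold=0.0):
    """Return a personID per text, or REJECT_LABEL when the input looks unrelated."""
    X = vectorizer.transform(texts)
    scores = clf.decision_function(X)
    if scores.ndim == 1:
        # Binary case: one score per row, positive means clf.classes_[1]
        scores = np.column_stack([-scores, scores])
    labels = clf.classes_[scores.argmax(axis=1)]
    best = scores.max(axis=1)
    # Rows with no token from the training vocabulary (empty or unknown words)
    known = X.getnnz(axis=1) > 0
    return [label if ok and s >= threshold else REJECT_LABEL
            for label, ok, s in zip(labels, known, best)]


print(predict_with_reject(["alicia has a sister in texas", "christmas tree", " "]))
```

Common words such as "is" or "in" make the vocabulary check pass for a sentence like "nike is a good sports brand", so the score threshold carries most of the work there. With 9 one-vs-rest classes, a best score below 0 means no class claims the input; in the two-class demo the best score is never negative, so only the vocabulary check rejects anything.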

Training a machine learning model not to classify invalid or empty input
