RidgeClassifier: interpreting the model's answer

Time: 2018-12-12 10:18:07

Tags: python scikit-learn

I have trained a model, and in some cases I cannot explain its answer.

I created a toy training sample:

    CourtIDGAS  Addr_upd
    03MS0001    usa, new-york, times square, 1
    03MS0001    usa, new-york, times square, 3
    03MS0001    usa, new-york, times square, 5
    03MS0001    usa, new-york, times square, 7
    03MS0001    usa, new-york, times square, 9
    03MS0001    usa, new-york, times square, 2
    03MS0001    usa, new-york, times square, 4
    03MS0001    usa, new-york, times square, 6
    03MS0001    usa, new-york, times square, 8
    03MS0001    usa, new-york, times square, 10
    03MS0001    usa, new-york, times square, 12
    03MS0002    usa, new-york, times square, 11
    03MS0002    usa, new-york, times square, 13
    03MS0002    usa, new-york, times square, 14
    03MS0002    usa, new-york, times square, 16

I use CountVectorizer to convert the text into vectors and RidgeClassifier to predict the class of an address:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import RidgeClassifier

    # df holds the toy sample above (columns CourtIDGAS and Addr_upd)
    vec = CountVectorizer(token_pattern=r'(?u)\b[а-яё0-9\/\-]+\b', min_df=1)
    X = vec.fit_transform(df.Addr_upd)
    y = df["CourtIDGAS"]
    clf = RidgeClassifier(random_state=42)
    clf.fit(X, y)

When I predict on the training sample, I get the correct answers. But when I predict on other data (for example, usa, new-york, times square, 18), I get the class 03MS0001.

I cannot explain this: the largest number in the vocabulary is 16, yet in my view this example is closer to 03MS0002.

How can I interpret this classifier's answer? What is the right way to handle data like this?

0 answers:

No answers