Loading svmlight-format files when the training set has fewer features than the test set

Time: 2014-10-18 16:46:23

Tags: python machine-learning scikit-learn

I have updated scikit-learn to the latest version, 0.15.2 (more precisely, I created a new anaconda environment). It appears that this version defines a new ValueError in the load_svmlight_files() function in sklearn/datasets/svmlight_format.py (line 238):

elif n_features < n_f:
    raise ValueError("n_features was set to {},"
                     " but input file contains {} features"
                     .format(n_features, n_f))

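My problem is that I load a model and then want to load my test data, so when loading the test data I use the shape of the model's coef_ attribute (passed as the n_features parameter of load_svmlight_file()). But if the model has fewer features than the test data, loading fails. What is a good way to handle this setup? I am not sure when this exception was added, but it does not seem to be present in version 0.14.1. As a further question, why was this exception added?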

>>> from sklearn.externals import joblib
>>> from sklearn.datasets import load_svmlight_file
>>> clf = joblib.load('mymodel')
>>> print clf.coef_.shape
(11, 9862)
>>> X,y = load_svmlight_file('test_data', n_features=clf.coef_.shape[1] )
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../anaconda/envs/test/lib/python2.7/site-packages/sklearn/datasets/svmlight_format.py", line 113, in load_svmlight_file
    zero_based, query_id))
  File ".../anaconda/envs/test/lib/python2.7/site-packages/sklearn/datasets/svmlight_format.py", line 248, in load_svmlight_files
    .format(n_features, n_f))
ValueError: n_features was set to 9862, but input file contains 34912 features
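
One workaround I can think of, as a rough sketch only: strip the feature indices the model never saw from the test file before handing it to load_svmlight_file(). This assumes that features absent from training can simply be ignored at prediction time, which seems reasonable for a linear model with a coef_ matrix but is my assumption. The helper names drop_unseen_features and filtered_svmlight_buffer are hypothetical, not part of scikit-learn, and the sketch assumes I know whether the file's indices are zero-based or one-based.

```python
import io


def drop_unseen_features(lines, n_features, zero_based=False):
    """Yield svmlight lines without feature indices beyond n_features.

    Hypothetical helper, not a scikit-learn function.
    """
    # Largest feature index that still fits into n_features columns.
    max_index = n_features - 1 if zero_based else n_features
    for line in lines:
        # Strip trailing comments and skip blank lines.
        body = line.split("#", 1)[0].strip()
        if not body:
            continue
        tokens = body.split()
        kept = [tokens[0]]  # the first token is the label
        for token in tokens[1:]:
            key = token.partition(":")[0]
            # Keep qid:... tokens and in-range feature indices only.
            if key == "qid" or int(key) <= max_index:
                kept.append(token)
        yield " ".join(kept)


def filtered_svmlight_buffer(path, n_features, zero_based=False):
    """Return a binary in-memory file holding the filtered data."""
    with open(path, "r") as fh:
        text = "\n".join(drop_unseen_features(fh, n_features, zero_based))
    return io.BytesIO((text + "\n").encode("ascii"))
```

The returned buffer could then be passed in place of the file name, for example load_svmlight_file(filtered_svmlight_buffer('test_data', clf.coef_.shape[1]), n_features=clf.coef_.shape[1], zero_based=False), since load_svmlight_file() accepts a file-like object opened in binary mode. Passing zero_based explicitly avoids relying on the "auto" heuristic after filtering.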

0 answers:

No answers yet