'bad input shape' when using scikit-learn SVM with optunity

Posted: 2015-05-29 21:30:59

Tags: python machine-learning scikit-learn svm cross-validation

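I'm trying to tune my SVM model with the optunity package. I copied and pasted its latest example code directly, only adding code to import my feature arrays and data arrays: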

import optunity
import optunity.metrics
import sklearn.svm
import numpy as np

data_path = '/python/Feature'
files = ['A.npy', 'B.npy', 'C.npy']

array = []
labels = []

for i,name in enumerate(files):
    data = np.load('{}/{}'.format(data_path, name))
    for j in range(0,len(data)):
        labels.append(data[j])
        array.append(data)

print len(array)   #=> 1247
print len(labels)  #=> 1247

# score function: twice iterated 10-fold cross-validated ROC AUC
@optunity.cross_validated(x=data, y=labels, num_folds=10, num_iter=2)
def svm_auc(x_train, y_train, x_test, y_test, C, gamma):
    model = sklearn.svm.SVC(C=C, gamma=gamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    return optunity.metrics.roc_auc(y_test, decision_values)

# perform tuning
optimal_pars, _, _ = optunity.maximize(svm_auc, num_evals=200, C=[0, 10], gamma=[0, 1])

# train model on the full training set with tuned hyperparameters
optimal_model = sklearn.svm.SVC(**optimal_pars).fit(data, labels)

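However, Python doesn't seem happy with it. I read the SVM class documentation to double-check the input format, but I don't understand optunity's syntax. Can anyone help me figure out what is wrong? I'd really appreciate it. (I'm using the 'rbf' kernel. I tried adding it, but got a syntax error; oddly, optunity's example has no kernel selection at all.)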

Traceback (most recent call last):
  File "python/SVM_turning.py", line 26, in <module>
    optimal_pars, _, _ = optunity.maximize(svm_auc, num_evals=200, C=[0, 10], gamma=[0, 1])
  File "/lib/python2.7/site-packages/optunity/api.py", line 181, in maximize
    pmap=pmap)
  File "/lib/python2.7/site-packages/optunity/api.py", line 245, in optimize
    solution, report = solver.optimize(f, maximize, pmap=pmap)
  File "/lib/python2.7/site-packages/optunity/solvers/ParticleSwarm.py", line 257, in optimize
    fitnesses = pmap(evaluate, list(map(self.particle2dict, pop)))
  File "/lib/python2.7/site-packages/optunity/solvers/ParticleSwarm.py", line 246, in evaluate
    return f(**d)
  File "/lib/python2.7/site-packages/optunity/functions.py", line 286, in wrapped_f
    value = f(*args, **kwargs)
  File "/lib/python2.7/site-packages/optunity/functions.py", line 341, in wrapped_f
    return f(*args, **kwargs)
  File "/lib/python2.7/site-packages/optunity/constraints.py", line 150, in wrapped_f
    return f(*args, **kwargs)
  File "/lib/python2.7/site-packages/optunity/constraints.py", line 128, in wrapped_f
    return f(*args, **kwargs)
  File "/lib/python2.7/site-packages/optunity/constraints.py", line 265, in func
    return f(*args, **kwargs)
  File "/lib/python2.7/site-packages/optunity/cross_validation.py", line 386, in __call__
    scores.append(self.f(**kwargs))
  File "/python/SVM_turning.py", line 21, in svm_auc
    model = sklearn.svm.SVC(C=C, gamma=gamma).fit(x_train, y_train)
  File "/lib/python2.7/site-packages/sklearn/svm/base.py", line 138, in fit
    y = self._validate_targets(y)
  File "/lib/python2.7/site-packages/sklearn/svm/base.py", line 441, in _validate_targets
    y_ = column_or_1d(y, warn=True)
  File "/lib/python2.7/site-packages/sklearn/utils/validation.py", line 319, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape (428, 600)
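On the side question about the kernel: scikit-learn's SVC already defaults to kernel='rbf', so the example uses an RBF kernel even without naming it. If you want to be explicit, the kernel is a fixed SVC option, so it belongs in the SVC(...) call inside the scored function; as far as I can tell, the keyword arguments to optunity.maximize describe search ranges rather than fixed settings. The sketch below uses scikit-learn only: roc_auc_score stands in for optunity.metrics.roc_auc, and the function name svm_auc_rbf and the synthetic data are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC


def svm_auc_rbf(x_train, y_train, x_test, y_test, C, gamma):
    # Fixed options such as the kernel go into the SVC constructor;
    # only C and gamma are the tuned hyperparameters.
    model = SVC(kernel='rbf', C=C, gamma=gamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    # roc_auc_score is a stand-in for optunity.metrics.roc_auc in this sketch
    return roc_auc_score(y_test, decision_values)


# Synthetic binary data as a hypothetical stand-in for the real features/labels
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
score = svm_auc_rbf(X[:150], y[:150], X[150:], y[150:], C=1.0, gamma=0.1)
print(round(score, 3))

print(SVC().kernel)  # 'rbf' is already the default kernel
```

With optunity, the same function body would sit under the @optunity.cross_validated decorator, and only C and gamma would be passed to optunity.maximize.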

3 answers:

Answer 0 (score: 1)

I think I found the problem. While reading the files, you fill the lists labels and array in turn. However, later on you do this:

@optunity.cross_validated(x=data, y=labels, num_folds=10, num_iter=2)

and this:

optimal_model = sklearn.svm.SVC(**optimal_pars).fit(data, labels)

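So you are using data as your dataset instead of the array you prepared. I don't know the format of what you read from the files, so I can't say for sure what is going on. However, the dimensions of labels and data almost certainly won't match.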

Here is a toy example with labels and array that works fine:

import optunity
import optunity.metrics
import sklearn.svm
import numpy as np

#print len(array) #=> 1247
#print len(labels) #=> 1247

# make dummy data
array = np.array([[i] for i in range(1247)])
labels = [True] * 100 + [False] * 1147

# score function: twice iterated 10-fold cross-validated ROC AUC
@optunity.cross_validated(x=array, y=labels, num_folds=10, num_iter=2)
def svm_auc(x_train, y_train, x_test, y_test, C, gamma):
    model = sklearn.svm.SVC(C=C, gamma=gamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    return optunity.metrics.roc_auc(y_test, decision_values)

# perform tuning
optimal_pars, _, _ = optunity.maximize(svm_auc, num_evals=200, C=[0, 10], gamma=[0, 1])

# train model on the full training set with tuned hyperparameters
optimal_model = sklearn.svm.SVC(**optimal_pars).fit(array, labels)
print(optimal_pars)


Which outputs (for example):

{'C': 8.0126953125, 'gamma': 0.35791015625}

Sorry for taking so long to reply.
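Following up on this answer: the loading loop in the question has a second problem. labels.append(data[j]) stores a whole feature row as each "label", and array.append(data) stores the entire file once per row, so the labels end up two-dimensional, which appears to match the (428, 600) shape in the traceback. Below is a minimal sketch of a corrected loader, assuming (hypothetically) that each .npy file holds an (n_samples, n_features) matrix for a single class and that a file's position in files should be its label. The helper name load_features and the small synthetic files are illustrative only.

```python
import os
import tempfile

import numpy as np


def load_features(data_path, files):
    """Hypothetical loader: assumes each .npy file is an (n_samples, n_features)
    matrix for one class and uses the file's index in `files` as its label."""
    rows, labels = [], []
    for class_index, name in enumerate(files):
        data = np.load(os.path.join(data_path, name))
        for row in data:
            rows.append(row)            # one feature vector per sample
            labels.append(class_index)  # one scalar label per sample
    return np.array(rows), np.array(labels)


# Demo with small synthetic files instead of the real A.npy / B.npy / C.npy
with tempfile.TemporaryDirectory() as tmp:
    rng = np.random.default_rng(0)
    files = ['A.npy', 'B.npy', 'C.npy']
    for n_samples, name in zip([4, 5, 6], files):
        np.save(os.path.join(tmp, name), rng.normal(size=(n_samples, 600)))
    X, y = load_features(tmp, files)

print(X.shape, y.shape)  # (15, 600) (15,)
```

X and y would then replace data and labels in both the decorator and the final fit. Note that ROC AUC is defined for binary problems, so with three classes the question's score function would need adapting.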

Answer 1 (score: 0)

I can't tell what optunity's default optimizer is, but if you are only doing a grid search, you can use GridSearchCV from scikit-learn instead.
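A sketch of that suggestion, assuming a recent scikit-learn where GridSearchCV and RepeatedStratifiedKFold live in sklearn.model_selection (in 2015-era releases GridSearchCV was in sklearn.grid_search). The grid values are arbitrary examples, make_classification stands in for the real data, and RepeatedStratifiedKFold(n_splits=10, n_repeats=2) is my stand-in for optunity's num_folds=10, num_iter=2.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

# Synthetic binary problem standing in for the real features and labels
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Arbitrary example grid; every combination is evaluated
param_grid = {
    'C': [0.1, 1, 10],
    'gamma': [0.001, 0.01, 0.1],
}

# Twice-repeated 10-fold stratified CV, scored by ROC AUC
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=2, random_state=0)
search = GridSearchCV(SVC(kernel='rbf'), param_grid, scoring='roc_auc', cv=cv)
search.fit(X, y)

print(search.best_params_)
# With the default refit=True, best_estimator_ is refit on all of X, y
best_model = search.best_estimator_
```

Unlike optunity's particle swarm solver (the one visible in the traceback), this only evaluates the listed grid points, so the grid has to cover the range you care about.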

Your example is very similar to the one in the optunity documentation. Have you tried running that exact example?

Answer 2 (score: 0)

I'm not at all sure this is your problem, but I got this error when I used regular Python lists where I should have used NumPy arrays.
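A minimal sketch of that check. The helper as_label_vector is hypothetical, not part of scikit-learn or optunity: it converts the labels to a NumPy array and verifies there is exactly one label per sample, which is what SVC.fit expects. scikit-learn's own check (column_or_1d in the traceback) accepts a 1-D array or a single column and, in the version shown there, rejects anything wider with "bad input shape". So converting a list to NumPy alone won't help if each label is itself a 600-element row.

```python
import numpy as np


def as_label_vector(labels):
    """Hypothetical helper: return labels as a 1-D NumPy array or fail early."""
    y = np.asarray(labels)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()  # a single column of labels can be flattened safely
    if y.ndim != 1:
        raise ValueError("expected one label per sample, got shape {}".format(y.shape))
    return y


print(as_label_vector([0, 1, 1]).shape)        # (3,)
print(as_label_vector([[0], [1], [1]]).shape)  # (3,)

try:
    as_label_vector(np.zeros((428, 600)))  # same shape as in the traceback
except ValueError as err:
    print(err)  # expected one label per sample, got shape (428, 600)
```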